RafaD5 opened this issue 3 years ago
Hey @RafaD5! Looks like a bug. I'm pretty sure that data between different folds and models should be cleared. Do you observe the same behavior in Compete mode? You can set `validation_strategy={"validation_type": "kfold", "k_folds": 5, "shuffle": True}` to get the same CV as in Perform mode.
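That configuration can be written out as a plain dict; a minimal sketch, with the `AutoML` call shown only as a hypothetical usage note (it assumes the standard `from supervised import AutoML` entry point):

```python
# The 5-fold shuffled CV configuration suggested above, as a plain dict.
validation_strategy = {
    "validation_type": "kfold",
    "k_folds": 5,
    "shuffle": True,
}

# Hypothetical usage with mljar-supervised (not executed here):
# from supervised import AutoML
# automl = AutoML(mode="Compete", validation_strategy=validation_strategy)
# automl.fit(X, y)
```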
@pplonski I met the same situation several times. There is a memory leak.
@xuzhang5788 was it for Perform mode or another?
Compete and Optuna modes. In my case, within one notebook, memory accumulated after I ran `automl.fit` several times, until the kernel got killed. I have to restart my kernel for every new training.
@xuzhang5788 thank you, I will work on it. Any help appreciated! :)
@RafaD5 @xuzhang5788 I made a few changes: `del` statements on datasets, followed by `gc.collect()` after each `del`. All changes are in the `dev` branch. You can install it:

`pip install -q -U git+https://github.com/mljar/mljar-supervised.git@dev`

I'm looking forward to your feedback! Thank you!
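The fix described above, dropping dataset references and forcing a garbage-collection pass between folds, can be sketched with stdlib-only stand-ins (the loop body below is a hypothetical placeholder, not the actual mljar code):

```python
import gc

def train_all_folds(n_folds):
    """Sketch: free each fold's data before starting the next one."""
    results = []
    for fold in range(n_folds):
        X = list(range(100_000))   # stands in for a large per-fold dataset
        results.append(sum(X))     # stands in for training on the fold
        del X                      # drop the reference as soon as possible
        gc.collect()               # reclaim memory before the next fold
    return results
```

This helps for pure-Python objects, but, as noted below, memory held by native (non-sklearn) libraries may still not be returned.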
It doesn't look like it improved much. I can still see memory being occupied gradually.
@xuzhang5788 yes, it is not 100% fixed. It should be slightly better and may no longer cause crashes. It looks like algorithms from outside the sklearn package don't release memory properly.
I will try running ML training in separate processes; maybe that will help, but on the other hand I don't want to make the code over-complex.
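The separate-process idea can be sketched like this: run each training in a child process so that all of its memory, including anything a native library holds on to, is returned to the OS when the process exits. `fit_one_model` is a hypothetical stand-in for a single training run, not mljar's API:

```python
import multiprocessing as mp

def fit_one_model(params, queue):
    """Placeholder for one training run; sends its score back via a queue."""
    score = sum(params)  # stands in for the real training work
    queue.put(score)

def train_in_subprocess(params):
    """Run the training in a child process so its memory is freed on exit."""
    queue = mp.Queue()
    proc = mp.Process(target=fit_one_model, args=(params, queue))
    proc.start()
    score = queue.get()  # read the result before joining to avoid blocking
    proc.join()          # child exits here; the OS reclaims all its memory
    return score
```

The trade-off is exactly the one mentioned above: process management and result passing add complexity to the codebase.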
This is still an issue, correct? I'm curious since I've been tinkering with mljar for Numerai competitions. I seem to run out of memory: a run would go for 14 hours overnight and I'd wake up to a stalled computer (I have 64 GB).
@BrickFrog yes, it is still an issue.
@BrickFrog have you used a custom `eval_metric` when using AutoML on Numerai data? It is possible to pass a custom `eval_metric`, such as the Sharpe ratio, to be optimized. There is also a built-in Spearman correlation `eval_metric` in MLJAR. Sorry if you couldn't find it in the docs; please open a GitHub issue and I will fix the docs.
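A custom metric in the shape mentioned above could look like the sketch below. The `(y_true, y_pred) -> float` signature and the `eval_metric=` keyword are assumptions; check the mljar-supervised docs for the exact contract. The metric itself is a Sharpe-like ratio over per-sample differences, stdlib only:

```python
import statistics

def sharpe_like_metric(y_true, y_pred):
    """Hypothetical custom metric: mean over std of per-sample differences."""
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    mean = statistics.fmean(diffs)
    std = statistics.pstdev(diffs)
    return mean / std if std else 0.0

# Hypothetical usage (not executed here):
# automl = AutoML(mode="Compete", eval_metric=sharpe_like_metric)
```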
It is also possible to set up a custom validation strategy by passing defined train/validation indices for each fold.
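Building such per-fold index pairs can be done with the stdlib alone; a sketch follows. The `validation_type: "custom"` value and the `cv=` keyword in the usage note are assumptions based on this comment, so verify them against the mljar-supervised docs:

```python
import random

def make_folds(n_samples, n_folds, seed=42):
    """Return a list of (train_indices, validation_indices) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = []
    for k in range(n_folds):
        val = indices[k::n_folds]                        # every n_folds-th index
        val_set = set(val)
        train = [i for i in indices if i not in val_set]  # the remainder
        folds.append((train, val))
    return folds

# Hypothetical usage (not executed here):
# automl = AutoML(validation_strategy={"validation_type": "custom"})
# automl.fit(X, y, cv=make_folds(len(X), 5))
```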
I have a plan to add a tutorial/examples showing how `mljar-supervised` can be used with Numerai data.
What is more, we are working on a visual notebook. It will be a desktop application for data science where the user can click together a solution without heavy coding. I'm attaching a screenshot (a very early development version). I would add blocks for Numerai there (get the latest data, upload a submission).
While using "Compete" mode, a similar issue is still being faced. With `AutoML_class_obj = AutoML(data=data, mode="Compete", eval_metric="r2")` on around 9,998 training samples/records, it either crashes or keeps running with too many Python processes visible in Task Manager.

1. `UserWarning: MiniBatchKMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can prevent it by setting batch_size >= 9216 or by setting the environment variable OMP_NUM_THREADS=1`
2. `OSError: [WinError 1455] The paging file is too small for this operation to complete`
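The first warning suggests its own mitigation. A minimal sketch of applying it from Python: cap the OpenMP/MKL thread count before numpy/sklearn are imported (the variable must be set before those libraries initialize). Whether this also resolves the WinError 1455 crash is situational:

```python
import os

# Cap OpenMP/MKL threads to avoid the MiniBatchKMeans memory leak on
# Windows with MKL, per the UserWarning above. This must run before
# numpy / sklearn / supervised are imported for the first time.
os.environ["OMP_NUM_THREADS"] = "1"

# import numpy, sklearn, supervised ... only after the variable is set
```

Alternatively, the warning's other suggestion (`batch_size >= 9216`) applies when you control the `MiniBatchKMeans` call directly.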
@sumanttyagi thank you for reporting. I understand that you are on the Windows system. Could you please post the full code with the data sample to reproduce the issue? Is it possible?
Hi,
I've trained several models with mode="Perform", and when the training gets to a certain point the Python process is killed because of memory usage (I'm using a computer with 16 GB). What I do is rerun the script and change the model_name to the name of the model just created, to resume training. A couple of times I've had to repeat this process twice. It is not due to a single model but to data from previous (already trained) models that is not released from memory.