Error message while trying to train a classifier refer to the memory

briki1234 commented 12 months ago

When trying to train the classifier, I get this error:

Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\tkinter\__init__.py", line 1705, in __call__
    return self.func(*args)
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\simba\SimBA.py", line 365, in <lambda>
    button_trainmachinemodel = Button(label_trainmachinemodel,text='TRAIN SINGLE MODEL (GLOBAL ENVIRONMENT)',fg='blue',command = lambda: threading.Thread(target=self.train_single_model(config_path=self.config_path)).start())
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\simba\SimBA.py", line 580, in train_single_model
    model_trainer = TrainRandomForestClassifier(config_path=config_path)
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\simba\model\train_rf.py", line 65, in __init__
    self.data_df = self.check_raw_dataset_integrity(df=self.data_df, logs_path=self.logs_path)
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\simba\mixins\train_model_mixin.py", line 1144, in check_raw_dataset_integrity
    nan_cols = df.reset_index(drop=True).replace([np.inf, -np.inf, None], np.nan).columns[df.isna().any()].tolist()
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\frame.py", line 4278, in replace
    method=method,
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\generic.py", line 6741, in replace
    to_replace=to_replace, value=value, inplace=inplace, regex=regex
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\managers.py", line 588, in replace
    return self.apply("replace", value=value, **kwargs)
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\managers.py", line 438, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 804, in replace
    convert=convert,
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 2953, in replace
    regex=regex,
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 3024, in _replace_single
    to_replace, value, inplace=inplace, filter=filter, regex=regex
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 832, in replace
    b.convert(by_item=True, numeric=False, copy=not inplace) for b in blocks
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 832, in <listcomp>
    b.convert(by_item=True, numeric=False, copy=not inplace) for b in blocks
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 2840, in convert
    blocks = self.split_and_operate(None, f, False)
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 493, in split_and_operate
    nv = f(m, v, i)
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 2831, in f
    values = fn(v.ravel(), **fn_kwargs)
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\dtypes\cast.py", line 846, in soft_convert_objects
    values = lib.maybe_convert_objects(values, convert_datetime=datetime)
  File "pandas/_libs/lib.pyx", line 1990, in pandas._libs.lib.maybe_convert_objects
MemoryError: Unable to allocate 10.7 MiB for an array with shape (1397128,) and data type datetime64[ns]
Exception in Tkinter callback

Desktop

OS: Windows
Python version: 3.6.13
Using Anaconda

Additional context: I am using the 64-bit version of Python and have enough memory on my computer.

sronilsson commented 11 months ago

Hi @briki1234! I haven't seen this one before. This error happens when reading in all the files in the project_folder/csv/targets_inserted directory. Before starting to train the classifier, a check is run to make sure that all the columns and rows contain values. However, SimBA seems unable to run this check, and for some reason, you end up with a single field of datetime values.

The data in the project_folder/csv/targets_inserted directory should not be very large. Is there any chance you could share it so I can take a look?

briki1234 commented 11 months ago

Actually, there are 4.41 GB of files in this folder (80 files, 50-70 MB each), and even uploading a single file fails. The odd thing is that two classifiers in this project finished training successfully, but the error occurred when training the third one.

sronilsson commented 11 months ago

Got it. Is there anything odd about the annotation column for the third classifier? Is there any way this column could have been mistaken for a datetime column in any of the files in the project_folder/csv/targets_inserted directory? For example, could this column have been mistakenly converted to a date format in any of the files, or could a value that is not a 0 or a 1 have sneaked in?

sronilsson commented 11 months ago

PS. If you need it, I have a python script somewhere that could help

briki1234 commented 11 months ago

> PS. If you need it, I have a python script somewhere that could help

I would love to get it and use it to check the issue.

sronilsson commented 11 months ago

Open this file and edit the two rows near the top. Change DATA_DIRECTORY to the full path to your project_folder/csv/targets_inserted directory, and CLASSIFIER_NAME to the name of your classifier.

In your SimBA environment, navigate to the folder where you stored the file and run python catch_error_annotation_field.py. Let me know what you see printed out. It should print an error for any odd values it finds, along with the files the errors are found in:

catch_error_annotation_field.py.zip
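
For readers without the attachment, a check of this kind can be sketched as follows. This is not the attached script. It is a minimal stand-in that assumes the files in the directory are plain CSV files with a header row and that a valid annotation is exactly 0 or 1. The function name `find_bad_annotations` and its return format are made up for illustration.

```python
import csv
import glob
import os


def find_bad_annotations(data_directory, classifier_name):
    """Return (file name, row number, value) for every annotation that is not 0 or 1.

    Hypothetical helper: assumes plain CSV files with a header row.
    Row numbers count data rows, starting at 1.
    """
    problems = []
    for path in sorted(glob.glob(os.path.join(data_directory, "*.csv"))):
        file_name = os.path.basename(path)
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            if classifier_name not in (reader.fieldnames or []):
                problems.append((file_name, None, "column missing"))
                continue
            for row_number, row in enumerate(reader, start=1):
                value = (row[classifier_name] or "").strip()
                try:
                    is_valid = float(value) in (0.0, 1.0)
                except ValueError:
                    # Dates, text, and empty cells all end up here.
                    is_valid = False
                if not is_valid:
                    problems.append((file_name, row_number, value))
    return problems
```

Calling `find_bad_annotations(DATA_DIRECTORY, CLASSIFIER_NAME)` with the two values described above returns an empty list when every file is clean, which corresponds to a "0 error(s) found" result.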

briki1234 commented 11 months ago

"COMPLETE: 0 error(s) found"

It seems like all files are valid.

sronilsson commented 11 months ago

I see. What is the classifier name? Could there be any oddities in the classifier name, such as hyphens, that the code struggles with?

briki1234 commented 11 months ago

The classifier name is "On_Restrainer_Half". I already successfully trained the classifier "On_Restrainer_Full", so there shouldn't be any oddities.

sronilsson commented 11 months ago

Thanks @briki1234 - my guess then, considering the data size, is that it is a memory error, as it says. How much RAM do you have on the machine?

briki1234 commented 11 months ago

16 GB of RAM, but it is using the same RAM and memory as for the other classifiers I trained, which is a bit strange.

sronilsson commented 11 months ago

Yes. Perhaps, if you train one or two classifiers, the data read into memory for those classifiers is not completely cleared, or some other processes are not terminated completely. By the time you get to the third classifier, you hit the 16 GB limit. If you kill all Python processes (or restart, if possible) and train only the third classifier, does it still fail?
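
As a quick sanity check on the first traceback, the size in the MemoryError message can be reproduced by hand. The sketch below uses only the numbers from that message and the fact that a datetime64[ns] value occupies 8 bytes. The variable names are made up for illustration.

```python
# Values copied from the MemoryError message in the first traceback.
n_elements = 1397128
bytes_per_element = 8  # a datetime64[ns] value is stored as a 64-bit integer

size_mib = n_elements * bytes_per_element / 2**20
print(f"{size_mib:.1f} MiB")  # prints "10.7 MiB"
```

The failed request is only about 10.7 MiB, which fits the idea that memory was already nearly exhausted before this call, rather than this single allocation being unusually large.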

briki1234 commented 11 months ago

It's a good question, I will try and update!

briki1234 commented 11 months ago

It worked, and the training ran until the end, where it says "Saving model meta data file...", but then an error appeared in the cmd window:

Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\tkinter\__init__.py", line 1705, in __call__
    return self.func(*args)
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\simba\SimBA.py", line 365, in <lambda>
    button_trainmachinemodel = Button(label_trainmachinemodel,text='TRAIN SINGLE MODEL (GLOBAL ENVIRONMENT)',fg='blue',command = lambda: threading.Thread(target=self.train_single_model(config_path=self.config_path)).start())
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\simba\SimBA.py", line 583, in train_single_model
    model_trainer.save_model()
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\simba\model\train_rf.py", line 236, in save_model
    self.save_rf_model(self.rf_clf, self.clf_name, self.model_dir_out)
  File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\simba\mixins\train_model_mixin.py", line 729, in save_rf_model
    pickle.dump(rf_clf, open(save_path, 'wb'))
MemoryError

Notice that I have more than 100GB of memory available.

Do you recognize this error?

sronilsson commented 11 months ago

@briki1234 This is still a RAM issue.

When converting the model to something that can be stored on the hard drive, available RAM runs out. There are a few settings that affect the size of a model, the biggest probably being the number of estimators (trees). Often, using, say, 500 trees won't affect performance too much relative to 2k trees, but it will save you a lot of space and memory. See if you can get it working with fewer estimators.

I'm not sure why this model is more difficult to fit in memory than the others. If they use the same number of estimators, it could be related to the amount of data, for example if the other models use greater undersampling.
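
The relationship between the number of trees and the serialized model size can be illustrated with a small sketch. It assumes scikit-learn's RandomForestClassifier and a tiny synthetic dataset. It does not use SimBA's code or data, and the helper name `pickled_size_bytes` is made up for illustration.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset; a stand-in, not SimBA data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)


def pickled_size_bytes(n_estimators):
    """Fit a forest with the given number of trees and return its pickled size."""
    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    clf.fit(X, y)
    return len(pickle.dumps(clf))


sizes = {n: pickled_size_bytes(n) for n in (10, 40)}
for n, size in sizes.items():
    print(f"{n} trees: {size / 1024:.0f} KiB")
```

Because every tree is stored in the pickle, the forest with more trees serializes to a larger object. The same reasoning is why dropping from 2k to 500 estimators reduces the memory needed at save time.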

briki1234 commented 11 months ago

I used 500 estimators and the error message didn't appear! I appreciate your help, thank you!

sronilsson commented 11 months ago

Thanks for letting me know!

sgoldenlab / simba

Error message while trying to train a classifier refer to the memory #273