briki1234 closed this 11 months ago
Hi @briki1234! I haven't seen this one before. This error happens when reading in all the files in the project_folder/csv/targets_inserted directory. Before starting to train the classifier, a check is run to make sure that all the columns and rows contain values. However, SimBA seems unable to run this check, and for some reason you end up with a single field of datetime values.
The data in the project_folder/csv/targets_inserted directory does not look very large. Is there any chance you could share it so I can take a look?
Actually there are 4.41 GB of files in this folder (80 files), and even uploading a single file (50-70 MB each) fails. The issue is that two classifiers in this project finished training successfully, but the error occurred when training the third (the one I'm talking about).
Got it. Is there anything odd with the third classifier's annotation column? Is there any way this column could have been mistaken for a datetime column in any of the files in the project_folder/csv/targets_inserted directory? For example, could this column have been mistakenly transformed to date format in any of the files, or has any value sneaked in that is not a 0 or a 1?
PS. If you need it, I have a python script somewhere that could help
I would love to get it and use it to check the issue.
Open this file and edit two rows near the top. Change DATA_DIRECTORY to the full path to your project_folder/csv/targets_inserted directory, and CLASSIFIER_NAME to the name of your classifier.
In your SimBA environment, navigate to the folder where you stored the file and run python catch_error_annotation_field.py. Let me know what you see printed out. It should print an error for any odd values it finds, along with the files the errors are found in.
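For readers without access to the attached script, the kind of check described above could look roughly like the sketch below. This is not the actual catch_error_annotation_field.py. It is a minimal stand-in that assumes the files are CSVs with a header row and that valid annotations are written as 0, 1, 0.0 or 1.0. The function name find_annotation_errors and the example values of DATA_DIRECTORY and CLASSIFIER_NAME are placeholders.

```python
import csv
from pathlib import Path

# Placeholder values (assumptions): point these at your own project and classifier.
DATA_DIRECTORY = Path("project_folder/csv/targets_inserted")
CLASSIFIER_NAME = "On_Restrainer_Half"

# Assumption: annotations are stored as 0/1, possibly written as floats.
VALID_VALUES = {"0", "1", "0.0", "1.0"}


def find_annotation_errors(data_dir, classifier_name):
    """Return a list of (file name, row number, offending value) tuples."""
    errors = []
    for file_path in sorted(Path(data_dir).glob("*.csv")):
        with open(file_path, newline="") as f:
            reader = csv.DictReader(f)
            # Flag files where the annotation column is missing altogether.
            if classifier_name not in (reader.fieldnames or []):
                errors.append((file_path.name, None, "column missing"))
                continue
            for row_number, row in enumerate(reader, start=1):
                value = (row[classifier_name] or "").strip()
                if value not in VALID_VALUES:
                    errors.append((file_path.name, row_number, value))
    return errors


if __name__ == "__main__":
    found = find_annotation_errors(DATA_DIRECTORY, CLASSIFIER_NAME)
    for file_name, row_number, value in found:
        print(f"ERROR: {file_name}, row {row_number}: unexpected value {value!r}")
    print(f"COMPLETE: {len(found)} error(s) found")
```

Reading row by row with the csv module keeps memory use low, which matters here because the folder holds several gigabytes of files.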
"COMPLETE: 0 error(s) found"
It seems like all files are valid.
I see. What is the classifier name? Could there be any oddities in the classifier name, such as hyphens, that the code struggles with?
The classifier name is "On_Restrainer_Half". I already successfully trained the classifier "On_Restrainer_Full", so there shouldn't be any oddities.
Thanks @briki1234. My guess then, considering the data size, is that it is a memory error, as the message says. How much RAM do you have on the machine?
16 GB of RAM, but it is using the same RAM and memory as for the other classifiers I trained, which is a bit strange.
Yes. Perhaps, if you train one or two classifiers, data is read into memory for those classifiers and not completely cleared, or some other processes are not terminated completely. By the time you get to the third classifier, you hit the 16 GB threshold. If you kill all Python processes, or restart if possible, and only train the third classifier, does it still fail?
It's a good question, I will try and update!
It worked, and the training ran to the end, where it printed "Saving model meta data file...", but then an error appeared in the cmd window:
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\simba-test\lib\tkinter\__init__.py", line 1705, in __call__
return self.func(*args)
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\simba\SimBA.py", line 365, in
Notice that I have more than 100GB of memory available.
Do you recognize this error?
@briki1234 This is still a RAM issue.
When converting the model to something that can be stored on the hard drive, available RAM runs out. There are a few settings that affect the size of a model, the biggest probably being the number of estimators, or trees. Often, using say 500 trees won't affect performance too much relative to 2k trees, but it will save you a lot of space and memory. See if you can get it working with fewer estimators.
Not sure why this model is more difficult to fit in memory than the others. If it uses the same number of estimators, it could be related to the amount of data, for example if the other models use greater undersampling.
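To illustrate how the number of trees drives the size of the saved model, here is a small sketch using scikit-learn's RandomForestClassifier on synthetic data. It is not SimBA's training code. The dataset, the tree counts, and the helper name pickled_size_bytes are assumptions chosen only to keep the example fast.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def pickled_size_bytes(n_estimators, X, y):
    """Fit a forest with the given number of trees and return its pickled size."""
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    model.fit(X, y)
    # Serializing the model builds the full byte string in memory,
    # which is the step that can exhaust RAM for very large forests.
    return len(pickle.dumps(model))


# Small synthetic dataset (assumption: real annotation data is far larger).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

small = pickled_size_bytes(5, X, y)
large = pickled_size_bytes(20, X, y)
print(f"5 trees: {small} bytes, 20 trees: {large} bytes")
```

Every tree is stored in full, so the serialized size grows roughly in proportion to the number of estimators. Deeper trees trained on more rows make each one larger still.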
I used 500 estimators and the error message didn't appear! I appreciate your help, thank you!
Thanks for letting me know!
When trying to train the classifier I get this error:
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\simba-test\lib\tkinter\__init__.py", line 1705, in __call__
return self.func(*args)
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\simba\SimBA.py", line 365, in
button_trainmachinemodel = Button(label_trainmachinemodel,text='TRAIN SINGLE MODEL (GLOBAL ENVIRONMENT)',fg='blue',command = lambda: threading.Thread(target=self.train_single_model(config_path=self.config_path)).start())
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\simba\SimBA.py", line 580, in train_single_model
model_trainer = TrainRandomForestClassifier(config_path=config_path)
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\simba\model\train_rf.py", line 65, in __init__
self.data_df = self.check_raw_dataset_integrity(df=self.data_df, logs_path=self.logs_path)
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\simba\mixins\train_model_mixin.py", line 1144, in check_raw_dataset_integrity
nan_cols = df.reset_index(drop=True).replace([np.inf, -np.inf, None], np.nan).columns[df.isna().any()].tolist()
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\frame.py", line 4278, in replace
method=method,
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\generic.py", line 6741, in replace
to_replace=to_replace, value=value, inplace=inplace, regex=regex
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\managers.py", line 588, in replace
return self.apply("replace", value=value, **kwargs)
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\managers.py", line 438, in apply
applied = getattr(b, f)(**kwargs)
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 804, in replace
convert=convert,
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 2953, in replace
regex=regex,
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 3024, in _replace_single
to_replace, value, inplace=inplace, filter=filter, regex=regex
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 832, in replace
b.convert(by_item=True, numeric=False, copy=not inplace) for b in blocks
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 832, in
b.convert(by_item=True, numeric=False, copy=not inplace) for b in blocks
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 2840, in convert
blocks = self.split_and_operate(None, f, False)
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 493, in split_and_operate
nv = f(m, v, i)
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\internals\blocks.py", line 2831, in f
values = fn(v.ravel(), **fn_kwargs)
File "C:\ProgramData\anaconda3\envs\simba-test\lib\site-packages\pandas\core\dtypes\cast.py", line 846, in soft_convert_objects
values = lib.maybe_convert_objects(values, convert_datetime=datetime)
File "pandas/_libs/lib.pyx", line 1990, in pandas._libs.lib.maybe_convert_objects
MemoryError: Unable to allocate 10.7 MiB for an array with shape (1397128,) and data type datetime64[ns]
Exception in Tkinter callback
Additional context: I am using a 64-bit version of Python and have enough memory on my computer.