Closed laqua-stack closed 2 years ago
Hi!
I completely agree with you that the concatenation was unnecessary; in fact, I believe `.get_features_for_single_case()` should rather return a simpler dict instead of a `pandas.Series`.
I worked on it a bit and have now merged the changes into main. Take a look and let me know in case something can be further improved.
I'm sorry in advance that the API changed: `num_threads` is now `n_jobs` and is specified in the constructor, and `.run()` replaced `extract_features()`.
Solved in #5
In the feature extraction, we concat a `pd.DataFrame` for every case. AFAIK, this construction of a `pd.DataFrame` leads to a new memory allocation (and copying) every time, which is highly memory-inefficient. Especially when parallelized on many CPUs, combined with the already memory-intensive forking in joblib, this can lead to OOM events (and is slow, of course). Wouldn't it be more convenient to return only the feature set that is currently being processed?

https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L109-L115

These are subsequently collected in `results` anyway:

https://github.com/pwoznicki/AutoRadiomics/blob/e475893c566de057d742f32da5cb9ece23a44eb0/autorad/feature_extraction/extractor.py#L135-L144
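To make the memory argument concrete, here is a minimal, self-contained sketch of the two patterns (the per-case extractor and its feature names are made up for illustration and are not the actual AutoRadiomics API): growing a `DataFrame` via `pd.concat` inside the loop copies the accumulated frame on every iteration, whereas returning a plain dict per case and building the `DataFrame` once at the end allocates only once.

```python
import pandas as pd

# Hypothetical per-case extractor returning one row of features as a dict
# (names and values are assumptions for this sketch).
def get_features_for_single_case(case_id):
    return {"case_id": case_id, "mean": case_id * 1.0, "entropy": case_id * 0.5}

# Anti-pattern: pd.concat inside the loop reallocates and copies the
# accumulated DataFrame on every iteration (quadratic total copying).
def extract_quadratic(case_ids):
    df = pd.DataFrame()
    for cid in case_ids:
        row = pd.DataFrame([get_features_for_single_case(cid)])
        df = pd.concat([df, row], ignore_index=True)
    return df

# Preferred pattern: collect plain dicts and build the DataFrame once.
# Dicts are also cheap to pickle back from joblib worker processes.
def extract_linear(case_ids):
    results = [get_features_for_single_case(cid) for cid in case_ids]
    return pd.DataFrame(results)
```

Both functions produce the same table; only the allocation behavior differs, which matters most when many joblib workers each hold partial results in memory.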