wri / plantation_classifier

This research aims to spatially differentiate planted trees from natural trees using a transfer learning approach for image segmentation.

Backward selection killed due to segfault #9

Closed jessicarose00 closed 5 months ago

jessicarose00 commented 5 months ago

Subject of the issue

Unable to complete feature analysis with backward selection due to a segmentation fault. My assumption is that the process is running out of memory. I'd like to confirm this by running on another system. Some potential areas of concern are flagged directly in feature_selection.py.

Expected behaviour

stage_select_and_tune.py should perform backward selection and save a JSON file of the top features.

Actual behaviour

When I try dvc exp run with select_features set to true and max_features set to 40, the program crashes with 53 features remaining and the following error: ERROR: failed to reproduce 'select_features_hyperparams': failed to run: python src/stage_select_and_tune.py --params=params.yaml, exited with -11.

When the script is run directly with python3 src/stage_select_and_tune.py --params=params.yaml rather than through dvc, the program crashes with 65 features remaining and a slightly more informative error: zsh: segmentation fault python3 src/stage_select_and_tune.py --params=params.yaml
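The two error messages above appear to describe the same event. Assuming dvc reports the stage's return code using the convention of Python's subprocess module, a negative code -N means the child process was terminated by signal N. A minimal sketch for decoding such a code follows; describe_exit is a hypothetical helper, not part of this repository.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Translate a subprocess-style return code into a readable description."""
    if returncode < 0:
        # A negative code -N means the child was terminated by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # The signal number has no name on this platform.
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe_exit(-11))
```

On Linux and macOS, signal 11 is SIGSEGV, which matches the segmentation fault reported by zsh.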

I was able to complete backward selection successfully when max_features was set to 80, so that fewer features had to be eliminated. This gives me the impression that we are dealing with a memory error.
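One way to test the memory hypothesis without moving to another system is to log peak memory after each elimination round and to ask Python for a traceback when the crash happens. The sketch below uses only the standard library and runs on Unix-like systems. The helper names (enable_diagnostics, peak_rss_mib, log_memory) are hypothetical, and where they would be called inside feature_selection.py is an assumption.

```python
import faulthandler
import resource
import sys


def enable_diagnostics() -> None:
    """Dump the Python traceback to stderr if the process receives SIGSEGV."""
    faulthandler.enable()


def peak_rss_mib() -> float:
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(n_remaining: int) -> str:
    """Hypothetical hook: call once per elimination round."""
    line = f"{n_remaining} features remaining, peak RSS {peak_rss_mib():.1f} MiB"
    print(line, file=sys.stderr)
    return line


if __name__ == "__main__":
    enable_diagnostics()
    log_memory(53)
```

If the logged peak stays well below the machine's available memory right up to the crash, memory exhaustion becomes less likely, and the faulthandler traceback should show which call was active when the segfault occurred. The same traceback can be obtained without code changes by running python3 -X faulthandler src/stage_select_and_tune.py --params=params.yaml.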

jessicarose00 commented 5 months ago

Please see the v2 functions for an attempted fix.

jessicarose00 commented 5 months ago

The segfault is likely caused by a bug in the shap explainer. Pivoted to using feature importance as the metric for backward selection.
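For readers unfamiliar with the approach, here is a minimal sketch of backward selection driven by feature importance. It is not the code in feature_selection.py: backward_select and importance_fn are hypothetical names, and the importance function is left pluggable because the thread does not say which model or importance measure is used.

```python
from typing import Callable, Dict, List, Sequence

# Maps the currently remaining feature names to an importance score each.
ImportanceFn = Callable[[Sequence[str]], Dict[str, float]]


def backward_select(
    features: Sequence[str],
    importance_fn: ImportanceFn,
    max_features: int,
) -> List[str]:
    """Drop the least important feature until max_features remain."""
    if max_features < 1:
        raise ValueError("max_features must be at least 1")
    remaining = list(features)
    while len(remaining) > max_features:
        # Re-score every round, since importances shift as features are removed.
        scores = importance_fn(remaining)
        weakest = min(remaining, key=lambda name: scores[name])
        remaining.remove(weakest)
    return remaining


if __name__ == "__main__":
    # Toy importances standing in for a refit model's scores (assumption).
    weights = {"ndvi": 0.5, "slope": 0.1, "red": 0.3, "blue": 0.05}
    toy_fn = lambda names: {name: weights[name] for name in names}
    print(backward_select(list(weights), toy_fn, max_features=2))
```

In practice importance_fn would refit the model on the remaining features and return its importances, for example the feature_importances_ attribute that tree-based scikit-learn estimators expose. Unlike the shap explainer, this does not need a separate explanation step per round.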