informatics-lab / precip_rediagnosis

Project to use ML to re-diagnose precipitation fields from ensemble model fields

Adding notebook with permutation importance examples #27

Closed hannahbrown7 closed 2 years ago

hannahbrown7 commented 2 years ago

This notebook contains examples of how to calculate the Breiman (2001) and Lakshmanan (2015) interpretations of permutation importance. Breiman PI works both with models trained only on vertical profile features and with models trained on a mix of vertical profile and single-level features. Lakshmanan PI currently only works with models trained on vertical profile features.
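For readers unfamiliar with the Breiman-style approach, here is a minimal numpy sketch of single-pass permutation importance: shuffle one feature column at a time and measure the increase in error. This is an illustrative standalone example, not the notebook's actual code; the `predict` callable, feature shapes, and MSE scoring are assumptions for the demo.

```python
import numpy as np

def breiman_permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Breiman-style permutation importance: the increase in error when a
    single feature column is shuffled, averaged over several repeats."""
    rng = np.random.default_rng(seed)
    base_error = np.mean((predict(X) - y) ** 2)  # baseline MSE on unshuffled data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle column j only, breaking its link to the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            errors.append(np.mean((predict(X_perm) - y) ** 2))
        importances[j] = np.mean(errors) - base_error  # error increase = importance
    return importances

# Toy check: y depends only on feature 0, so feature 0 should dominate.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=500)
imp = breiman_permutation_importance(lambda X: 3.0 * X[:, 0], X, y)
```

The Lakshmanan (2015) variant extends this with a multi-pass (sequential) selection loop, permuting the most important feature found so far and repeating on the remainder, which this single-pass sketch does not show.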

review-notebook-app[bot] commented 2 years ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


stevehadd commented 2 years ago

That looks good to me. Some thoughts/questions:

hannahbrown7 commented 2 years ago

Having looked it up, the general advice is to use tf.keras rather than standalone keras, so I have updated the code in cell 24 to use tf.keras.

hannahbrown7 commented 2 years ago

Regarding how multi-level features are treated during permutation importance: currently, multi-level features are permuted based on their order, so the structure of the height levels within the profile is maintained. However, I agree it would be really interesting to dig down and assess the impact of the different height levels. From the data exploration we saw quite a lot of correlation between neighbouring height levels, and one limitation of permutation importance is that if the permuted feature is highly correlated with another feature, permuting it appears to have little impact, because the model can recover the information from the correlated feature; this makes the permuted feature look less important than it perhaps actually is. So it may need some consideration as to whether to take a different approach, or just careful interpretation.
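Permuting a whole vertical profile as a unit can be sketched as shuffling which sample each profile belongs to, so that the correlations between adjacent height levels within a profile stay intact. This is a hypothetical illustration (the column layout and shapes are invented), not the notebook's implementation:

```python
import numpy as np

def permute_profile_block(X, cols, rng):
    """Permute a multi-level feature as one block: reassign whole profiles
    to other samples, preserving the height structure within each profile."""
    X_perm = X.copy()
    idx = rng.permutation(X.shape[0])
    X_perm[:, cols] = X[idx][:, cols]  # intact profiles, shuffled across samples
    return X_perm

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))       # assumed layout: 6 samples, 8 feature columns
profile_cols = np.arange(0, 4)    # columns holding one 4-level vertical profile
Xp = permute_profile_block(X, profile_cols, rng)

# Every permuted row should still contain an intact profile from the original data
rows_match = [
    any(np.allclose(Xp[i, profile_cols], X[k, profile_cols]) for k in range(6))
    for i in range(6)
]
```

Permuting individual height levels independently would instead destroy the intra-profile correlation structure, which is part of why level-by-level importance needs careful interpretation.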

hannahbrown7 commented 2 years ago

Thank you for flagging that different permutation importance plots are being produced! A random seed is set for splitting the data, so I don't think that is a factor. It may be partly due to the lack of a random seed in the ML model, which, as you say, results in different network weight initialisations on each run. Also, as it is a small dataset and the features are being randomly permuted, this may cause some variation; it is probably a combination of factors. It does seem to vary quite a lot, though, and it is concerning that the features are not always ranked in the same order.
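One common way to make the permutation-driven part of this variability visible is to repeat the permutation many times and report the mean and spread of the error increase, rather than a single value. A minimal numpy sketch (illustrative only; the model, scoring, and repeat count are assumptions):

```python
import numpy as np

def permutation_importance_with_spread(predict, X, y, j, n_repeats=30, seed=0):
    """Repeat the permutation of feature j many times and return the mean and
    standard deviation of the error increase, exposing run-to-run variability."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)  # baseline MSE
    deltas = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        deltas.append(np.mean((predict(Xp) - y) ** 2) - base)
    return np.mean(deltas), np.std(deltas)

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 2))
y = X[:, 0] + X[:, 1]
mean_d, std_d = permutation_importance_with_spread(lambda X: X[:, 0] + X[:, 1], X, y, j=0)
```

A large spread relative to the mean would suggest that feature-ranking differences between runs are expected noise; the remaining variation would then point at the unseeded weight initialisation.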

hannahbrown7 commented 2 years ago

Have removed the cells at the bottom of the notebook which are not relevant to the feature importance assessment.

hannahbrown7 commented 2 years ago

Agree assessments of different regimes and/or weather events will be interesting