related-sciences / nxontology-ml

Machine learning to classify ontology nodes
Apache License 2.0

Add subsets features #27

Closed: yonromai closed this 11 months ago

yonromai commented 11 months ago

This PR adds subset features logic + experiment. More context about these features can be found in the issue "Include EFO subsets as features for model" #14.
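For readers unfamiliar with the idea, here is a minimal sketch of one way subset membership could be turned into model features: a multi-hot (0/1) vector per node. This is an illustration, not necessarily how this PR implements it; the function name `subsets_to_multi_hot`, the node IDs, and the subset names are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def subsets_to_multi_hot(
    node_subsets: Dict[str, Set[str]],
    vocabulary: Optional[List[str]] = None,
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Encode each node's subset memberships as a 0/1 vector."""
    if vocabulary is None:
        # Sort the observed subset names so the column order is stable.
        vocabulary = sorted(set().union(*node_subsets.values()))
    features = {
        node: [1 if subset in subsets else 0 for subset in vocabulary]
        for node, subsets in node_subsets.items()
    }
    return vocabulary, features


# Hypothetical node IDs and subset names, for illustration only.
example = {
    "node_a": {"subset_x", "subset_y"},
    "node_b": {"subset_y"},
    "node_c": set(),
}
columns, encoded = subsets_to_multi_hot(example)
```

Passing an explicit `vocabulary` keeps the columns identical between training and inference, even if some subsets are absent from one of the two sets.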

The updated notebook now includes the pca64_subsets_mae experiment.

Gist: Adding the subset features hasn't changed much; performance is in the same ballpark as the other high-performing models.

=> I suspect that some additional model tuning (iterations, learning rate & regularization) might slightly improve the models with a lot of features. My plan is to add the GPT-4 tags as features and then, time permitting, run some experiments overnight to tune these parameters.
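As a rough sketch of what such an overnight sweep could enumerate, the snippet below expands a small grid over the three parameters mentioned above. The candidate values and the `expand_grid` helper are assumptions for illustration only; they are not taken from this PR or from any particular library.

```python
from itertools import product
from typing import Any, Dict, List

# Hypothetical candidate values; the real ranges would need to be chosen.
search_space: Dict[str, List[Any]] = {
    "iterations": [500, 1000],
    "learning_rate": [0.03, 0.1],
    "regularization": [1.0, 3.0, 10.0],
}


def expand_grid(space: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one parameter dict per combination of candidate values."""
    keys = list(space)
    return [
        dict(zip(keys, values))
        for values in product(*(space[key] for key in keys))
    ]


candidates = expand_grid(search_space)
# Each entry in `candidates` would be passed to one training run.
```

A full grid grows multiplicatively with each parameter, so a random sample of `candidates` may be a better fit if the time budget is a single night.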

cc @eric-czech

yonromai commented 11 months ago

> Don't think this PR shows the feature importance of the subset-based features, but okay to wait till later to see that.

Indeed, the feature importance work isn't done yet. I'm trying to get (1) the subset features and (2) the GPT-4 features merged into main, then (3) re-run the experiments. After that I'll need to (4) add code for feature importance and (5) run inference on the whole ontology for whichever model we choose. Last, I need to work on the slides 😅

yonromai commented 11 months ago

(Will merge to unblock upcoming work; @dhimmel, feel free to add comments to this PR.)