nch-igm / snvstory

Rapid and accurate ancestry inference using SNVs.
BSD 3-Clause "New" or "Revised" License

Feature importance? #13

Open timchu90 opened 8 months ago

timchu90 commented 8 months ago

Hi, the SNVstory paper has this line: “SNVstory includes a feature-importance scheme, unique among open-source ancestral tools, which allows the user to track the ancestral signal broadcast by a given gene or locus.” I can't find how to do this with the snvstory package. Is there a separate package for this gene/locus analysis? Thanks!

audrey-bollas commented 8 months ago

Hi! The feature importance is an analysis the user can run on their own input data with the gnomAD model provided in the resource directory. Since the models are built with XGBoost, you can use SHAP's TreeExplainer to do this. In our paper we highlighted a gene-based and a cytolocation-based feature importance analysis. This is not part of the Docker package, but I will add the scripts and package environment so you can run it. I’ll get this up in a day or two. Thanks!

audrey-bollas commented 7 months ago

Okay, I've added scripts to do the analysis we performed in the paper. The program should export the gene/locus features as a NumPy array for you to work with. It can optionally generate the plots from the paper as well. Let me know if you get it working. :)
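As a rough illustration of what a gene-level feature array could look like, the sketch below rolls per-variant SHAP values up to genes with NumPy. It is not the repository's script: the SHAP values, the gene names, and the variant-to-gene mapping are hypothetical, and summing signed values within a gene before taking the mean absolute value is one possible aggregation choice, not necessarily the one used in the paper.

```python
import numpy as np

# Hypothetical per-sample SHAP values: 3 samples x 5 variants.
shap_values = np.array([
    [ 0.4, -0.1,  0.0,  0.2, -0.3],
    [-0.2,  0.3,  0.1, -0.4,  0.0],
    [ 0.1,  0.0, -0.2,  0.3,  0.1],
])

# Hypothetical annotation: the gene each variant (column) falls in.
variant_gene = np.array(["GENE_A", "GENE_A", "GENE_B", "GENE_C", "GENE_C"])

# Sorted unique gene names, plus each variant's index into that list.
genes, gene_idx = np.unique(variant_gene, return_inverse=True)

# Sum the signed SHAP values of each gene's variants, per sample.
gene_shap = np.zeros((shap_values.shape[0], genes.size))
for g in range(genes.size):
    gene_shap[:, g] = shap_values[:, gene_idx == g].sum(axis=1)

# Global gene importance: mean absolute gene-level contribution.
gene_importance = np.abs(gene_shap).mean(axis=0)

# Genes ordered from most to least important.
ranked = genes[np.argsort(gene_importance)[::-1]].tolist()
print(dict(zip(genes.tolist(), gene_importance.round(3))), ranked)
```

The same pattern would apply to a cytolocation-based grouping by swapping the gene labels for cytoband labels.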