saeyslab / nichenetr

NicheNet: predict active ligand-target links between interacting cells

assess_rf_class_probabilities() and calculate_fraction_top_predicted_fisher() #193

Closed anidj closed 11 months ago

anidj commented 1 year ago

Hi,

In the tutorial "Assess how well top-ranked ligands can predict a gene set of interest", the function assess_rf_class_probabilities() is used to predict whether a gene belongs to a pathway, and it requires a value for k. How should k be chosen? I get a warning when the length of the gene set or the length of background_expressed_genes is not a multiple of k.

Also, I get this error when I apply calculate_fraction_top_predicted_fisher(): `Error in matrix(c(tp, fp, fn, tn), nrow = 2, dimnames = list(c("geneset", : length of 'dimnames' [2] not equal to array extent`. It seems to happen because no gene from the gene set has a prediction greater than or equal to the 0.95 quantile.

Best,

csangara commented 1 year ago

Hi,

k is the number of cross-validation folds you want to use. This is more of a general machine learning question, and there are many resources that can help you make up your mind. In general, k = 5 is a good starting point, but there is no formal rule for it.

The warning is not a big issue. If the length of your gene set is a multiple of k, the genes are divided equally across the folds; if not, some folds will simply contain fewer genes.
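To make the fold sizes concrete, here is a small base R sketch. It assumes (this is not confirmed in the thread) that the folds are formed by splitting a shuffled gene vector with `split()`, which is where a "data length is not a multiple of split variable" warning comes from; the 23-gene set is hypothetical.

```r
# Hypothetical example: 23 genes divided into k = 5 cross-validation folds
geneset <- paste0("gene", 1:23)
k <- 5

set.seed(1)
shuffled <- sample(geneset)  # shuffle before assigning genes to folds

# split() recycles the labels 1:k along the genes; because 23 is not a
# multiple of 5, base R warns "data length is not a multiple of split variable"
folds_warn <- suppressWarnings(split(shuffled, 1:k))

# Recycling the fold labels explicitly yields the same folds without a warning
folds <- split(shuffled, rep_len(1:k, length(shuffled)))

lengths(folds)
#> 1 2 3 4 5
#> 5 5 5 4 4
```

Every gene still ends up in exactly one fold; the last two folds just hold one gene fewer, which is why the warning is harmless here.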

As for the error, there was a bug in the way we calculated the contingency table: it failed when none of the genes in one group (for example, no background genes) scored above the given quantile. I've updated the code of calculate_fraction_top_predicted_fisher(), so try updating the package and see whether that fixes the error.

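You can also fix it locally by replacing the following line in the function (it is in R/application_prediction). The problem is that inner_join() drops the row for a group with no top-predicted genes, so matrix() receives fewer than four counts.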

```r
results_df = inner_join(all,predicted_positive, by = "response") %>% mutate(fraction_positive_predicted = positive_prediction/n)
```

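with this:

```r
# left_join() keeps both groups (gene set and background) even when one has no
# top-predicted genes; replace_na() (from tidyr) then sets that missing count to 0
results_df = left_join(all, predicted_positive, by="response") %>% mutate(positive_prediction = replace_na(positive_prediction, 0))
```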
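The failure and the fix can be reproduced without nichenetr. The base R sketch below uses made-up predictions in which every gene-set gene scores low; `merge()` stands in for dplyr's `inner_join()`/`left_join()`, and `count_by_response()` and `build_table()` are hypothetical helpers that only approximate the package's counting logic.

```r
# Toy data (hypothetical): 3 gene-set genes with low scores, 17 background genes
preds <- data.frame(
  prediction = c(0.10, 0.20, 0.15, seq(0.30, 0.95, length.out = 17)),
  response   = c(rep(TRUE, 3), rep(FALSE, 17))
)

# Genes at or above the 0.95 quantile count as "top-predicted"
top <- preds[preds$prediction >= quantile(preds$prediction, 0.95), ]

# Count genes per response group; groups absent from df get no row at all
count_by_response <- function(df, col) {
  counts <- table(df$response)
  out <- data.frame(response = names(counts), count = as.vector(counts))
  names(out)[2] <- col
  out
}
all_counts <- count_by_response(preds, "n")
top_counts <- count_by_response(top, "positive_prediction")

inner <- merge(all_counts, top_counts, by = "response")                # like inner_join()
left  <- merge(all_counts, top_counts, by = "response", all.x = TRUE)  # like left_join()
left$positive_prediction[is.na(left$positive_prediction)] <- 0         # like replace_na(.., 0)

# Build the 2 x 2 table in the same column-major order as the original code
build_table <- function(res) {
  tp <- res$positive_prediction[res$response == "TRUE"]
  fp <- res$positive_prediction[res$response == "FALSE"]
  fn <- res$n[res$response == "TRUE"] - tp
  tn <- res$n[res$response == "FALSE"] - fp
  matrix(c(tp, fp, fn, tn), nrow = 2,
         dimnames = list(c("geneset", "background"),
                         c("top-predicted", "no-top-predicted")))
}

# With the inner join, tp and fn are empty, so matrix() gets only 2 values:
try(build_table(inner))  # length of 'dimnames' [2] not equal to array extent

tab <- build_table(left)  # a proper 2 x 2 table with a 0 in the gene-set row
fisher.test(tab)$p.value
```

Keeping the empty group as an explicit zero is what lets the Fisher test run; dropping the row changes the shape of the table rather than just its values.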