Closed kevinrue closed 9 years ago
Implemented permutation-based P-values for GO terms.
Double ordering by p-value and either ave_rank or ave_score would be messy to implement: the order function can take multiple filters, but then all of them need to be sorted in the same direction, either decreasing or increasing. This is a problem here, as the score is ranked DECR while the p-value and rank are ranked INCR. Instead, it is much easier to rank first by the tie-breaker and then by the main filter. The result is the same, provided the second sort is stable, i.e. it leaves tied entries in the order produced by the first sort.
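To illustrate the two-pass idea, here is a minimal sketch in Python rather than the package's own code. The term identifiers, values and function names are all made up for the example. It relies on Python's `sorted` being stable, and it checks the result against a single sort on a composite key.

```python
# Hypothetical GO-term results: (term, p_value, ave_score).
# p-value is ranked increasing, ave_score is ranked decreasing.
results = [
    ("GO:A", 0.00, 1.2),
    ("GO:B", 0.05, 3.4),
    ("GO:C", 0.00, 2.8),
    ("GO:D", 0.05, 0.9),
]


def two_pass_sort(rows):
    """Sort by the tie-breaker first, then by the main filter."""
    # Pass 1: tie-breaker, highest ave_score first.
    by_tiebreaker = sorted(rows, key=lambda r: r[2], reverse=True)
    # Pass 2: main filter, lowest p-value first. sorted() is stable,
    # so rows with equal p-values keep the order set by pass 1.
    return sorted(by_tiebreaker, key=lambda r: r[1])


def single_pass_sort(rows):
    """Reference: one sort on (p_value ascending, ave_score descending)."""
    return sorted(rows, key=lambda r: (r[1], -r[2]))


ranked = two_pass_sort(results)
```

The negated score in `single_pass_sort` only works for numeric tie-breakers. The two-pass version needs no such trick, which is the appeal of the approach described above.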
No P-value on genes. The random forest algorithm does not provide inference-based statistics, and it would take far too long to run the random forest algorithm on thousands of expression datasets with randomised sample group labels. With enough decision trees and variables sampled, the ranking of genes by importance should be trustworthy without the need for p-values.
I just finished implementing a bootstrap function randomising the gene ranking/scoring. It is slow, but seems to give decent results. There are many ties because of the limited number of bootstrap iterations (especially at p-values of 0 and 1), so it could be a good idea to allow double sorting by p-value, breaking the ties using either rank or score. Also include a slot in the output object stating the number of iterations, to give crucial context to the p-values. Update subset_scores to allow filtering on p-value.
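For readers unfamiliar with the idea, the following Python sketch shows one way a randomisation-based p-value for a term can be computed. It is an assumption-laden illustration, not the function implemented here. It assumes a term's score is the average score of its genes, it shuffles gene scores across genes, and it counts how often the shuffled term score is at least as high as the observed one. The function name, arguments and data are hypothetical.

```python
import random


def permutation_p_values(gene_scores, term_genes, n_perm=100, seed=0):
    """Return ({term: p_value}, n_perm) from shuffled gene scores.

    gene_scores: dict gene -> score (hypothetical input format)
    term_genes:  dict term -> list of member genes
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    genes = list(gene_scores)
    scores = [gene_scores[g] for g in genes]

    def term_averages(mapping):
        # Assumed term statistic: mean score of the term's genes.
        return {
            term: sum(mapping[g] for g in members) / len(members)
            for term, members in term_genes.items()
        }

    observed = term_averages(gene_scores)
    hits = {term: 0 for term in term_genes}

    for _ in range(n_perm):
        shuffled = scores[:]
        rng.shuffle(shuffled)  # break the gene-to-score association
        permuted = term_averages(dict(zip(genes, shuffled)))
        for term, avg in permuted.items():
            if avg >= observed[term]:
                hits[term] += 1

    # p-values can only be multiples of 1/n_perm, hence the many ties.
    # n_perm is returned too, since it gives context to the p-values.
    return {t: hits[t] / n_perm for t in term_genes}, n_perm


# Hypothetical data: ten genes with integer scores 1..10.
toy_scores = {f"gene{i}": i for i in range(1, 11)}
toy_terms = {
    "top": ["gene10"],          # contains only the best-scoring gene
    "bottom": ["gene1"],        # contains only the worst-scoring gene
    "all": list(toy_scores),    # contains every gene
}
p_values, iterations = permutation_p_values(toy_scores, toy_terms)
```

With only `n_perm` iterations, every p-value is a multiple of `1 / n_perm`, which is why so many terms tie and why reporting the iteration count alongside the p-values matters.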