Closed sridhar0605 closed 6 months ago
The minimum number of genes is 1, but the maximum is a bit tricky, since you only know that you have too many hits once the graph is complete.
I should make it clear in some way that this approach was not intended for putting in 1000s of genes and looking at unfiltered results.
The challenge is how to automatically figure out the default mincount. Even a huge gene list can be processed easily if one sets a minimum count (that is, how many genes must share a function).
Maybe setting this at, say, 10% (so the default mincount is 10% of the gene list) would massively reduce the tree size. That said, I dislike setting tacit parameters, because I want to avoid misleading the end user.
So there is no maximum gene list size, but a large gene list must be filtered if the underlying graph explodes in size.
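To make the 10% idea concrete, here is a rough sketch. It is not GeneScape's actual code: `default_mincount`, `filter_terms`, the gene names, and the term names are all hypothetical and only illustrate how a minimum count derived from the list size would prune functions shared by too few genes.

```python
from collections import Counter


def default_mincount(n_genes, percent=10):
    """Hypothetical heuristic: a function must be shared by `percent`% of the genes.

    Integer arithmetic rounds up and never returns less than 1.
    """
    return max(1, (n_genes * percent + 99) // 100)


def filter_terms(annotations, mincount):
    """Count how many genes carry each term and keep terms with count >= mincount.

    `annotations` maps a gene name to the list of terms annotated to it.
    """
    counts = Counter(
        term for terms in annotations.values() for term in set(terms)
    )
    return {term: n for term, n in counts.items() if n >= mincount}


# Toy input, made up for illustration only.
annotations = {
    "gene1": ["term_a", "term_b"],
    "gene2": ["term_a", "term_b"],
    "gene3": ["term_a", "term_c"],
    "gene4": ["term_a"],
}

kept = filter_terms(annotations, mincount=2)
print(kept)
print(default_mincount(1000))
```

With a 1000-gene list this heuristic would require at least 100 genes per function, which is exactly the kind of tacit cutoff that would need to be stated clearly in the output.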
This tool is really good at showing people how gene ontologies actually represent biological knowledge.
If we take a step back and consider how people will generate the gene lists they input into GeneScape: AFAIK they would come up with gene lists from gene expression (DEG), ChIP-seq occupancy data, etc. On average these lists would contain at least hundreds, if not thousands, of genes.
I think it's fair enough to just mention in the docs something along the lines of "Hey, we can process X rows of gene list", followed by the things to consider with huge lists.
I agree the tool is good; highlighting open-ended limitations is in no way going to undermine the main deliverables of this tool.
This issue has the same root as the issue reported here:
https://github.com/ialbert/genescape-central/issues/15
Graphs can get very large, and at that point generating them pushes the limits not of my software but of the other libraries I rely on.
Filtering by words or minimum count can quickly collapse these graphs by removing the clutter. So the problem is more about documenting the limitations.
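As an illustration of the word-based filtering, here is a minimal sketch. It assumes each function has a human-readable name and keeps only those whose name matches a pattern; `filter_by_words`, the term IDs, and the sample names are hypothetical and are not taken from GeneScape's code.

```python
import re


def filter_by_words(term_names, pattern):
    """Keep only the terms whose name matches `pattern` (case-insensitive).

    `term_names` maps a term ID to its descriptive name.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return {tid: name for tid, name in term_names.items() if regex.search(name)}


# Toy input, made up for illustration only.
term_names = {
    "T1": "DNA repair",
    "T2": "lipid transport",
    "T3": "response to DNA damage",
    "T4": "ion transport",
}

selected = filter_by_words(term_names, r"dna|repair")
print(selected)
```

Either filter shrinks the set of nodes before the graph is drawn, so the rendering libraries never see the full, cluttered tree.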
I have updated the documentation with more details in the section titled "Reducing the tree size":
https://github.com/ialbert/genescape-central/tree/main?tab=readme-ov-file#reducing-the-tree-size
Fixed in version 0.9.4, in the documentation.
Please describe max and min number of genes that the user needs to input.