ialbert / genescape-central

Gene Ontology subgraph visualizations
MIT License

Stress test min and max number of genes #14

Closed sridhar0605 closed 3 months ago

sridhar0605 commented 3 months ago

Please describe max and min number of genes that the user needs to input.

ialbert commented 3 months ago

The minimum number of genes is 1, but the maximum is a bit tricky, since you only know that you have too many hits once the graph is complete.

I should make it clear in some way that this approach was not intended for putting in 1000s of genes and looking at unfiltered results.

The challenge is how to automatically figure out the default mincount. Even a huge gene list can be processed easily if one sets a minimum count (that is, how many genes must share a function).

Maybe setting this at, say, 10% (a default mincount of 10% of the gene list) would massively reduce the tree size. However, I dislike the idea of setting tacit parameters, since they can mislead the end user.
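A minimal sketch of the 10% idea, for illustration only. The function name `default_mincount` and its `percent` parameter are hypothetical and are not part of GeneScape; the sketch only shows how such a default could be derived from the size of the gene list.

```python
def default_mincount(gene_count, percent=10):
    """Hypothetical helper: derive a default mincount from the gene list size.

    Returns the number of genes that must share a function for that
    function to be kept, computed as `percent` of the list, rounded up.
    """
    if gene_count < 1:
        raise ValueError("the gene list must contain at least one gene")
    # Integer ceiling division avoids floating-point rounding surprises.
    threshold = (gene_count * percent + 99) // 100
    # Never go below 1, so a single-gene list still produces a graph.
    return max(1, threshold)


if __name__ == "__main__":
    for size in (1, 30, 50, 1000):
        print(size, default_mincount(size))
```

Rounding up keeps the threshold at 1 for short lists, so the filter only starts to bite once the list grows beyond ten genes.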

So there is no maximum gene list size, but a large number of genes must be filtered if the underlying graph explodes in size.

This tool is really good at showing people gene ontologies actually represent biological knowledge.

sridhar0605 commented 3 months ago

If we take a step back and consider how people will generate the gene lists they input into GeneScape: AFAIK they would come up with gene lists from gene expression (DEG), ChIP-seq occupancy data, etc. On average these lists would contain at least 100s, if not 1000s, of genes.

I think it's fair enough to just mention in the docs something along the lines of "Hey, we can process X rows of gene list", but below are the things to consider with huge lists.

  1. The user may need to use pattern matching to make sense of things.
  2. Please toggle mincount to make sense of the data, since there is no consensus on an optimal mincount.
  3. The resulting tree is going to be huge; it might break things w.r.t. saving plots etc.?

I agree the tool is good; highlighting open-ended limitations is in no way going to undermine the main deliverables of this tool.

ialbert commented 3 months ago

This issue has the same root as the issue reported here:

https://github.com/ialbert/genescape-central/issues/15

Graphs can get very large, and at that point generating them pushes the limits not of my software but of the other libraries I am relying on.

Filtering by words or minimum count can quickly collapse these graphs by removing the clutter. So the problem is more about documenting the limitations.
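As a rough illustration of the two filters mentioned above, the sketch below keeps only the functions that are shared by at least `mincount` genes and whose name matches a word pattern. The function `filter_terms`, the dictionary layout, and the toy data are all assumptions made for this example; they do not reflect GeneScape's internals or real annotations.

```python
import re


def filter_terms(term_genes, mincount=1, pattern=None):
    """Hypothetical filter over a mapping of function name -> set of genes.

    Keeps a function only if at least `mincount` genes share it and,
    when `pattern` is given, its name matches the pattern (case-insensitive).
    """
    regex = re.compile(pattern, re.IGNORECASE) if pattern else None
    kept = {}
    for name, genes in term_genes.items():
        if len(genes) < mincount:
            continue  # too few genes share this function
        if regex is not None and not regex.search(name):
            continue  # name does not match the word filter
        kept[name] = genes
    return kept


if __name__ == "__main__":
    # Toy data with placeholder gene names, not real annotations.
    terms = {
        "DNA repair": {"g1", "g2", "g3"},
        "DNA replication": {"g4"},
        "cell cycle": {"g3", "g5"},
    }
    print(sorted(filter_terms(terms, mincount=2)))
    print(sorted(filter_terms(terms, pattern="dna")))
    print(sorted(filter_terms(terms, mincount=2, pattern="dna")))
```

Either filter alone removes part of the clutter; combining them shrinks the set of nodes further before any graph is drawn.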

I have updated the documentation with more details added into the section titled: Reducing the tree size

https://github.com/ialbert/genescape-central/tree/main?tab=readme-ov-file#reducing-the-tree-size

ialbert commented 3 months ago

Fixed in version 0.9.4, in the documentation.