ialbert / genescape-central

Gene Ontology subgraph visualizations
MIT License

Bug: duplicate inputs counted multiple times #8

Closed j-andrews7 closed 6 months ago

j-andrews7 commented 6 months ago

What it says on the tin. Slapping the same gene in multiple times results in the final count for each term being misleading:

image

Uniquify the inputs on submission before anything else is done.

ialbert commented 6 months ago

This is not a bug but a feature :-)

Though it is debatable whether it is appropriate, and whether imposing uniqueness would be better.

The tool also accepts GO terms as input, so you could put in terms such as

GO:0016023

One can even mix symbols and GO terms in the same input list. Adding a GO term can be useful if one wants to see that term regardless of whether any gene in the input list is annotated with it, and wants to see where the genes fall relative to that term.

Also, some enrichment tools, like g:Profiler, may produce GO terms as output.

In those cases, the tool will add up the GO terms and produce counts according to how many times a term was present. Thus, it can be used to summarize GO terms that come off a list and lets the user visualize the frequency of those terms.

Thus, this behavior has utility and is not a mere oversight.
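To make the counting described above concrete, here is a minimal sketch. It is not genescape's actual code: the `ANNOTATIONS` table, the gene names, the GO identifiers, and the `count_terms` function are all hypothetical and made up for illustration. It only shows how repeated inputs, whether gene symbols or GO terms, add to the per-term counts.

```python
from collections import Counter

# Hypothetical gene-to-GO annotations, invented for this example only.
ANNOTATIONS = {
    "GENE_A": ["GO:0000001", "GO:0000002"],
    "GENE_B": ["GO:0000002"],
}

def count_terms(inputs):
    """Tally GO terms for a mixed list of gene symbols and GO terms.

    Every occurrence counts, so a repeated input is counted repeatedly.
    """
    counts = Counter()
    for item in inputs:
        item = item.strip()
        if not item:
            continue  # skip blank entries
        if item.startswith("GO:"):
            # A GO term given directly counts once per occurrence.
            counts[item] += 1
        else:
            # A gene symbol contributes one count to each annotated term.
            counts.update(ANNOTATIONS.get(item, []))
    return counts

# GENE_A is listed twice, so its terms are counted twice.
counts = count_terms(["GENE_A", "GENE_A", "GENE_B", "GO:0000002"])
```

With this input, `GO:0000001` ends up with a count of 2 and `GO:0000002` with a count of 4, which is the kind of inflated number the original report describes and also the frequency summary that is useful for lists of GO terms.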

j-andrews7 commented 6 months ago

Hmm, perhaps a toggle then to enforce uniqueness? As a user, I don't think this is behavior I'd expect by default, so at least making it a little more clear in the documentation would be helpful.
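A toggle like this could look roughly as follows. This is only a sketch of the idea: the `prepare_inputs` function and its `unique` parameter are hypothetical names, not part of genescape. The default keeps duplicates, matching the current behavior.

```python
def prepare_inputs(items, unique=False):
    """Clean an input list, optionally dropping repeated entries.

    `unique` is a hypothetical flag: when True, only the first occurrence
    of each entry is kept and the original order is preserved.
    """
    cleaned = [item.strip() for item in items if item.strip()]
    if unique:
        # dict.fromkeys keeps the first occurrence of each key, in order.
        cleaned = list(dict.fromkeys(cleaned))
    return cleaned

raw = ["GENE_A", "GENE_A", "GO:0016023", "GENE_A"]
kept = prepare_inputs(raw)                  # duplicates preserved
deduped = prepare_inputs(raw, unique=True)  # duplicates removed
```

Users who already know their list is clean, or who want frequency counts, would leave the flag off; everyone else could switch it on before submission.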

ialbert commented 6 months ago

The behavior has utility so it has been kept as such, but a note is made in the docs about counting repeated elements.