sanderlab / alignmentviewer

Multiple sequence alignment visualizer
http://alignmentviewer.org
MIT License

Describe Limitations for UMAP Plot #46

Open cannin opened 4 years ago

cannin commented 4 years ago

@MercifulCode Are there limitations to the annotations that are displayed in the UMAP plot? I think HUMAN does not appear for these example files:

https://github.com/dfci/alignmentviewer/tree/master/example

drew-diamantoukos commented 4 years ago

I think the example annotation file is a bit out of date - there should not be a header, and the last character for each sequence name is missing.

That said, after digging some more into the bioblocks-viz code, I found that indeed there is a limitation to the number of annotations displayed!

Internally, UMAPSequenceContainerClass sets the color associated with each label using a predefined array of colors. If there are more labels than colors, the remaining labels are classified as "unannotated."
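To make the limitation concrete, here is a minimal sketch of that behavior. It is not the actual bioblocks-viz code: the palette values, the `assignColors` helper, and the first-seen ordering of labels are all assumptions made for illustration.

```typescript
// Hypothetical fixed palette; the real array in bioblocks-viz may differ
// in both length and values.
const DEFAULT_COLORS: readonly string[] = ["#e41a1c", "#377eb8", "#4daf4a"];

interface ColorAssignment {
  colored: Map<string, string>; // label -> color
  unannotated: Set<string>;     // labels left over once colors run out
}

// Assign one color per distinct label, in first-seen order (an assumption).
// Once the palette is exhausted, remaining labels fall into "unannotated".
function assignColors(
  labels: readonly string[],
  palette: readonly string[] = DEFAULT_COLORS,
): ColorAssignment {
  const colored = new Map<string, string>();
  const unannotated = new Set<string>();
  for (const label of labels) {
    if (colored.has(label) || unannotated.has(label)) {
      continue; // already handled this label
    }
    if (colored.size < palette.length) {
      colored.set(label, palette[colored.size]);
    } else {
      unannotated.add(label);
    }
  }
  return { colored, unannotated };
}
```

With this three-color palette and the labels MOUSE, YEAST, ECOLI, HUMAN, the fourth label (HUMAN) ends up in the unannotated set, which would explain a label silently missing from the plot legend.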

Going forward, I see a few possible solutions:

  1. Allowing the set of colors to be passed to the component.
  2. Adding a flag to re-use colors if the limit is hit.
  3. In the case of too many labels, renaming "unannotated" to something better.
  4. Paginating the labels - for example, if there are 40 unique labels but only 10 colors, we'd have 4 pages of labels.

I'm leaning towards both options 1 & 2, as that way all labels can be viewable at once. That said, if we increase the number of labels to display, the labels would need to be placed inside a scrollable list to prevent overflow.
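A rough sketch of how options 1 and 2 could fit together, plus the page count from option 4. The `ColorOptions` shape, the `reuseColors` flag name, and both helper functions are hypothetical names for illustration, not existing bioblocks-viz props or APIs.

```typescript
// Hypothetical options object; neither field exists in bioblocks-viz today.
interface ColorOptions {
  palette: readonly string[]; // option 1: caller-supplied colors
  reuseColors: boolean;       // option 2: wrap around when colors run out
}

// Return the color for the label at `index`, or undefined if the label
// should fall back to the "unannotated" bucket.
function colorForIndex(index: number, options: ColorOptions): string | undefined {
  const { palette, reuseColors } = options;
  if (palette.length === 0) {
    return undefined;
  }
  if (index < palette.length) {
    return palette[index];
  }
  return reuseColors ? palette[index % palette.length] : undefined;
}

// Option 4: number of legend pages needed when each page shows at most
// one label per color.
function pageCount(labelCount: number, colorCount: number): number {
  if (colorCount <= 0) {
    return 0;
  }
  return Math.ceil(labelCount / colorCount);
}
```

With `reuseColors` enabled, every label gets a color but colors are no longer unique, so the legend would need another cue (ordering, grouping, or a marker shape) to tell re-used colors apart. For the example above, `pageCount(40, 10)` gives 4 pages.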

What do you think?

cannin commented 4 years ago

@MercifulCode we talked a bit yesterday. Some of the suggestions were: