Open yeli7068 opened 2 years ago
The line plot requires x_col to have sequential unbroken numbers, because the line plot draws a value for every site. The logo plot does not require this because it can break the axis to show only certain sites of interest.
The x_col (or isite) column can be any index that goes 1, 2, 3, and so on. If you are using a protein that is already numbered that way, then it is just the site. But some proteins are no longer sequentially numbered. For instance, Omicron has some indels in the NTD but is still normally numbered using Wuhan-Hu-1 site numbering.
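A minimal sketch of one way to build such a sequential index, in case it helps. It uses only plain Python; the helper names `sequential_index` and `is_sequential_unbroken` and the site labels are hypothetical and made up for illustration, not part of dmslogo or taken from any real dataset.

```python
def sequential_index(site_labels):
    """Map each distinct site label to 1, 2, 3, ... in order of first appearance."""
    mapping = {}
    for label in site_labels:
        if label not in mapping:
            mapping[label] = len(mapping) + 1
    return mapping


def is_sequential_unbroken(values):
    """Return True if the distinct values are consecutive integers with no gaps."""
    distinct = sorted(set(values))
    if not distinct or not all(isinstance(v, int) for v in distinct):
        return False
    return distinct == list(range(distinct[0], distinct[0] + len(distinct)))


# Hypothetical reference-numbered sites with a gap (69 and 70 are missing),
# listed in sequence order; a site may repeat, e.g. one row per mutation.
sites = [67, 68, 68, 71, 72]

mapping = sequential_index(sites)
isite = [mapping[s] for s in sites]

print(is_sequential_unbroken(sites))  # False: the gap breaks the sequence
print(isite)                          # [1, 2, 2, 3, 4]
print(is_sequential_unbroken(isite))  # True
```

The labels must already be in sequence order for first-appearance numbering to make sense. With a pandas data frame, the same mapping could be applied with something like `df["isite"] = df["site"].map(mapping)`, and the resulting column used as x_col for the line plot, while the original site column stays available for labeling.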
Dear Dr. Bloom,
I tried the line plot in dmslogo with toydata.csv. The error says "not sequential unbroken integers".
Then I turned to the example. Even after reading the instructions, I am still confused, especially because there is a gap between original and new in BG505_to_HXB2.csv (e.g. site: 141, 142l isite:142, 151).
What does "not sequential unbroken integers" mean? How do I get the isite for SARS2?
Thx in advance.
Code here:
OS: macOS Catalina 10.15.7
Python: 3.8.12
dmslogo: 0.6.2