Closed cx994 closed 1 year ago
MSA without gaps cannot be called MSA because it lacks alignment information, right? As the package name suggests, pyMSAviz is a tool to visualize MSA, so I do not plan to implement any function to handle non-MSAs.
Sorry if I have misunderstood the meaning of your proposal.
Sorry, I may not have made it clear~ As shown in the figure below, at some sites every amino acid sequence has a gap. So is it possible to omit these sites but keep the position information, to get a more concise MSA visualization? I think it would preserve the valid information and reduce drawing time.
Are you saying that if there is a gap-only position in the MSA, you want to determine that position as unnecessary and exclude it from the visualization?
Personally, I don't quite understand the effectiveness of the proposed functionality, as it seems to me that there are very few cases (or there shouldn't be any) where a gap-only position is included in the alignment results.
Could you please tell me the following to help me understand?
If I have misunderstood something, I am sorry.
I have spent some time thinking about how to handle this issue.
It is not realistic to exclude gap-only positions one by one, as it would also shift the xticklabel and would not represent the proper visualization results.
Personally, I think it would be reasonable to add an option to automatically exclude areas containing only gaps from the visualization on an MSA wrap block basis.
Below is an experimental implementation (adding an ignore_all_gaps option) of the visualization demo.
```python
from pymsaviz import MsaViz
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

gap_num = 50
test_msa = MultipleSeqAlignment(
    [
        SeqRecord(Seq("M-AT----ALLCRGRI" + "-" * gap_num + "AITFR---RGRI--"), id="01"),
        SeqRecord(Seq("M-TI-------TRGVI" + "-" * gap_num + "AITFR---RGRI--"), id="02"),
    ]
)

mv = MsaViz(test_msa, wrap_length=30, show_grid=True)
mv.set_plot_params(ignore_all_gaps=True, ticks_interval=5)  # <= Newly added!!
fig = mv.plotfig()
```
Option: ignore_all_gaps=False => Gap-only MSA wrap blocks are drawn
Option: ignore_all_gaps=True => Gap-only MSA wrap blocks are omitted
I think this is a realistic and easy implementation. What do you think?
Also, this is just a question out of personal interest, but in what situations or with what tools is a sparse MSA generated? I don't see it with common multiple alignment tools like muscle or mafft, so can you tell me for reference?
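The wrap-block idea described above can be sketched in plain Python. This is not the pyMSAviz implementation; `gap_only_blocks` is a hypothetical helper that only illustrates how a block of `wrap_length` columns might be classified as gap-only, assuming the alignment is given as equal-length strings with `-` as the gap character.

```python
def gap_only_blocks(seqs, wrap_length, gap_char="-"):
    """Return (start, end) column ranges of wrap blocks that contain only gaps.

    Hypothetical helper for illustration; ranges are 0-based, end-exclusive.
    """
    if not seqs:
        return []
    aln_len = len(seqs[0])
    if any(len(s) != aln_len for s in seqs):
        raise ValueError("all sequences must have the same aligned length")

    blocks = []
    for start in range(0, aln_len, wrap_length):
        end = min(start + wrap_length, aln_len)
        # A block is skippable only if every sequence is all gaps within it.
        if all(set(s[start:end]) <= {gap_char} for s in seqs):
            blocks.append((start, end))
    return blocks


# Same toy alignment as in the demo above (80 columns in total).
gap_num = 50
seqs = [
    "M-AT----ALLCRGRI" + "-" * gap_num + "AITFR---RGRI--",
    "M-TI-------TRGVI" + "-" * gap_num + "AITFR---RGRI--",
]
skipped = gap_only_blocks(seqs, wrap_length=30)
```

Because whole blocks are skipped rather than single columns, the tick labels of the remaining blocks can keep their original column numbers.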
Oh, great! I think it will solve my problem to some extent. I've tried excluding gap-only positions one by one, but found it really cumbersome if I want to keep the true xticklabel~
Besides, I don't quite understand why there are sparse MSA results either, but most of the MSA files in the results I downloaded from the database below are sparse!
TreeFam database
All in all, thank you for your kind help! I will continue to think about how to solve this problem in my spare time :)
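For reference, excluding gap-only positions one by one while remembering the true positions could look roughly like the following. This is a minimal sketch on plain strings, not part of pyMSAviz; `drop_gap_only_columns` is a hypothetical name, and `-` is assumed to be the only gap character.

```python
def drop_gap_only_columns(seqs, gap_char="-"):
    """Remove columns where every sequence has a gap.

    Returns the trimmed sequences and the original 1-based position of
    each kept column, which could be used as tick labels.
    """
    if not seqs:
        return [], []
    aln_len = len(seqs[0])
    if any(len(s) != aln_len for s in seqs):
        raise ValueError("all sequences must have the same aligned length")

    # Keep a column if at least one sequence has a residue there.
    kept = [i for i in range(aln_len) if any(s[i] != gap_char for s in seqs)]
    trimmed = ["".join(s[i] for i in kept) for s in seqs]
    positions = [i + 1 for i in kept]
    return trimmed, positions


trimmed, positions = drop_gap_only_columns(["M-A--T", "M-T--I"])
```

The returned `positions` list is what makes this cumbersome in a plot: the labels are no longer evenly spaced, so they have to be set per column by hand.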
To add, I think there is a convenient way:
I did some checking on TreeFam.
Your MSA is based on extracting some data from the MSA of 400 TRK genes, correct? If so, it is not surprising that the gap-only positions are included.
If you are interested only in the extracted gene sequences, I suggest you remove the gaps from the extracted sequences yourself and align them again with mafft or muscle. You will get more accurate alignment results that way.
If you don't necessarily need to rely on TreeFam alignment results, it seems to me that people generally process their data that way.
Also, if you do that, you will not have the problem you presented here.
These are my personal opinions. It may be superfluous, but I hope it will be helpful.
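The suggestion above (remove the gaps, then realign) might be sketched as follows. The helper names and the sample records are hypothetical; the realignment step assumes a `mafft` executable is available on PATH and uses its `--auto` mode, which writes the alignment in FASTA format to standard output.

```python
import os
import subprocess
import tempfile


def strip_gaps(records, gap_char="-"):
    """Return a copy of {id: aligned_seq} with all gap characters removed."""
    return {name: seq.replace(gap_char, "") for name, seq in records.items()}


def to_fasta(records):
    """Serialize {id: seq} to FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


def realign_with_mafft(records, mafft_bin="mafft"):
    """Realign ungapped sequences with MAFFT and return aligned FASTA text.

    Assumes MAFFT is installed; not executed in this sketch.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as fh:
        fh.write(to_fasta(records))
        path = fh.name
    try:
        result = subprocess.run(
            [mafft_bin, "--auto", path],
            capture_output=True, text=True, check=True,
        )
    finally:
        os.unlink(path)
    return result.stdout


# Hypothetical sequences extracted from a larger alignment.
extracted = {"01": "M-AT----ALLCRGRI", "02": "M-TI-------TRGVI"}
ungapped = strip_gaps(extracted)
fasta_text = to_fasta(ungapped)
# aligned_fasta = realign_with_mafft(ungapped)  # requires mafft on PATH
```

The realigned FASTA could then be loaded and visualized as usual, with no gap-only columns left over from the larger alignment.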
Gap-only sites essentially never occur in an MSA under normal use. Even if a gap-only site were to exist for some reason, it would not be meaningful for data analysis and should be removed in a preprocessing step before visualization.
Therefore, I am inclined not to implement handling of gap-only sites in pyMSAviz.
I think it is necessary to provide a function to draw an MSA without gaps.