Open AndreaGuarracino opened 2 years ago
I don't fully get your idea. Do all the aligned sequences have flanking nucleotides? Or only a part of the sequences have?
Then, I guess what you want is to obtain the alignment/consensus for only a subset of all the input aligned sequences that have no flanking nucleotides, am I right?
Yan (I know Yao is very famous:) )
All the aligned sequences have flanking nucleotides.
Let me try with an example.
These are the original sequences.
example_abpoa.fa
>Sequence1
AAAAAAAATTTT
>Sequence2
TTGGTTTTTT
abpoa --match 1 --mismatch 9 --gap-open 16,41 --gap-ext 2,1 example_abpoa.fa -r 2
>Multiple_sequence_alignment
------AAAAAAAATTTT
TTGGTT--------TTTT
------AAAAAAAATTTT
Instead of aligning the original sequences, I am extending them with few flanking nucleotides upstream and downstream, and then aligning the expanded sequences. For example, adding 4 flanking nucleotides:
example_abpoa_padded.fa
>Sequence1
GGGGAAAAAAAATTTTGGGG
>Sequence2
TTTTTTGGTTTTTTGGGG
abpoa --match 1 --mismatch 9 --gap-open 16,41 --gap-ext 2,1 example_abpoa_padded.fa -r 2
>Multiple_sequence_alignment
------GGGGAAAAAAAATTTTGGGG
TTTTTTGG--------TTTTTTGGGG
TTTTTTGGGGAAAAAAAATTTTGGGG
Concerning the sequences, I am taking this alignment and removing the added flanking nucleotides. However, regarding the consensus sequence, I can't just trim the flanking nucleotides out (this leads to rare, but blocking bugs).
I would like to trim the expanded sequences in the abPOA graph (where the alignment was already computed with the expanded sequences) and then re-ask for the consensus sequence generation using the trimmed graph.
Yan (I know Yao is very famous:) )
LOL, mutations, you've got me!
>Multiple_sequence_alignment
------GGGGAAAAAAAATTTTGGGG
TTTTTTGG--------TTTTTTGGGG
TTTTTTGGGGAAAAAAAATTTTGGGG
I guess, from this new graph, what you want is the consensus of the original (unexpanded) sequences, right? Then the graph should be transformed to:
>Multiple_sequence_alignment
----------AAAAAAAATTTT----
----TTGG--------TTTTTT----
>Consensus_sequence
----TTGG--AAAAAAAATTTT----
I think this is possible. We need to record all the flanking-base nodes and then manipulate their weight/base before calling the consensus:
-
if w == 1Edit: this manipulation may not be correct, I need to think about it.
Then for the final consensus, we just need to remove all the -
symbols.
Hi @yangao07, any update on this?
See above, is that what you want?
I apologize, I didn't realize you were waiting for confirmation from me. Yes, I want the consensus of the original (unexpanded) sequences, as you wrote.
>Multiple_sequence_alignment
----------AAAAAAAATTTT----
----TTGG--------TTTTTT----
>Consensus_sequence
----TTGG--AAAAAAAATTTT---- <---- This one, generated from the trimmed (unexpanded) sequences
As I said, the consensus sequence can't be simply trimmed (this leads to rare bugs) but has to be generated starting from the original (unexpanded) sequences, using the alignment that we've got when we aligned the same sequences together with the flanking nucleotides.
After your considerations, do you think is it still doable?
Hi Ya
on,I've expanded few sequences with flanking nucleotides, then I've aligned them, obtaining the MSA and consensus sequence.
However, after aligning the expanded sequences, I would like to first delete the additional nucleotides and then calculate the consensus sequence with only the unexpanded sequences, without having to recalculate the whole alignment.
Is it possible?