lavibig opened this issue 1 year ago
This should work; however, the output format here is A3M, which uses lower-case letters for insertions relative to the query, i.e. positions where all other sequences would have gaps. This reduces the file size of MSAs tremendously, but it might be confusing if you have never seen it before.
You can either drop the lower-case letters with something like the following:
awk '/^>/ { print; next; } { gsub(/[a-z]/, "", $0); print; }' asd.a3m
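To check that an A3M file really is aligned, count only the characters that occupy query columns (upper-case letters and '-') in each record. Every record should then report the query length, because lower-case insertions do not occupy columns. This is an illustrative sketch: the `match_columns` helper and the toy alignment are made up, and it assumes each sequence sits on a single line.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print each header name with the number of match-state
# characters (upper-case letters or '-') in its sequence line.
# Assumes one sequence line per record; lower-case insertions are not counted.
match_columns() {
  awk '/^>/ { name = substr($0, 2); next }
       { n = gsub(/[A-Z-]/, "&"); print name, n }' "$1"
}

# Toy A3M (made-up data): hit1 carries a two-residue insertion "gk",
# hit2 has two gaps in query columns.
toy="$(mktemp)"
cat > "$toy" <<'EOF'
>query
MKVLAT
>hit1
MKgkVLAT
>hit2
MK--AT
EOF

match_columns "$toy"
```

For this toy file every record reports 6, the query length, even though the raw sequence lines are 6, 8 and 6 characters long.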
Or use --msa-format-mode 2 to generate aligned FASTA.
You can also use the reformat.pl script from HH-suite to convert from A3M to FASTA.
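Converting A3M to aligned FASTA while keeping the insertions means padding every sequence with gaps wherever any other sequence has an insertion. The awk sketch below illustrates that idea only; it is not the reformat.pl implementation. The `a3m_to_padded_fasta` name and the toy data are hypothetical, it assumes the A3M contains only upper-case letters, lower-case letters and '-', insertion residues are left in lower case, and padding gaps are written as '-' after the inserted residues.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: expand an A3M file into aligned FASTA that keeps the
# insertions. Lower-case letters are insertions; every other character
# (upper-case letter or '-') occupies one query column.
a3m_to_padded_fasta() {
  awk '
    function pad(str, width) {
      # Append "-" until the insertion run fills its padded width.
      while (length(str) < width) str = str "-"
      return str
    }
    /^>/ { hdr[++n] = $0; next }
    { seq[n] = seq[n] $0 }                 # join multi-line records
    END {
      # Pass 1: longest insertion before each query column (index 0..L,
      # where index L holds insertions after the last column).
      for (k = 1; k <= n; k++) {
        s = seq[k]; col = 0; run = 0
        for (p = 1; p <= length(s); p++) {
          if (substr(s, p, 1) ~ /[a-z]/) { run++ }
          else { if (run > maxins[col] + 0) maxins[col] = run; run = 0; col++ }
        }
        if (run > maxins[col] + 0) maxins[col] = run
      }
      # Pass 2: rebuild each row, padding every insertion slot to its maximum.
      for (k = 1; k <= n; k++) {
        s = seq[k]; out = ""; ins = ""; col = 0
        for (p = 1; p <= length(s); p++) {
          c = substr(s, p, 1)
          if (c ~ /[a-z]/) { ins = ins c }
          else { out = out pad(ins, maxins[col] + 0) c; ins = ""; col++ }
        }
        print hdr[k]
        print out pad(ins, maxins[col] + 0)
      }
    }' "$1"
}

# Toy A3M (made-up data).
toy="$(mktemp)"
cat > "$toy" <<'EOF'
>query
MKVLAT
>hit1
MKgkVLAT
>hit2
MK--AT
EOF

a3m_to_padded_fasta "$toy"
```

On this toy input the query becomes MK--VLAT, hit1 stays MKgkVLAT, and hit2 becomes MK----AT: the two extra columns come from hit1's insertion, and every row ends up 8 characters long.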
Thanks for your answer. I tried the reformat.pl script and it worked nicely. A related question remains: using this approach, will the result be a multiple sequence alignment or a multiple structure alignment?
The result is a multiple amino acid sequence alignment. However, the alignments were computed with Foldseek, so both 3Di (structural) and amino acid information were used (plus TM-align information in TM-align mode).
The result is also a query-centric MSA: every hit is aligned pairwise to the query only, so the columns are defined by the query positions. We are developing a different tool for MSAs of full-length aligned structures. We hope to release a preprint for that tool soon.
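A query-centric MSA can be assembled from pairwise alignments alone: residues aligned to query positions become upper case, gaps in query positions become '-', and residues that fall between query positions become lower-case insertions, so insertions from different hits are never aligned to one another. The sketch below shows this per-hit conversion; it is not Foldseek's code, `pair_to_a3m_row` is a hypothetical name, and it assumes the two aligned strings span the whole query (for a local alignment, uncovered query positions would additionally be filled with '-').

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: turn one pairwise alignment (query string, hit string,
# same length, '-' for gaps) into the hit's row of a query-centric A3M.
pair_to_a3m_row() {
  awk -v q="$1" -v t="$2" 'BEGIN {
    row = ""
    for (i = 1; i <= length(q); i++) {
      qc = substr(q, i, 1); tc = substr(t, i, 1)
      if (qc == "-") {
        # Query has a gap: the hit residue is an insertion (lower case).
        if (tc != "-") row = row tolower(tc)
      } else if (tc == "-") {
        # Hit has a gap in a query column.
        row = row "-"
      } else {
        # Aligned residue pair: match column (upper case).
        row = row toupper(tc)
      }
    }
    print row
  }'
}

# Made-up pairwise alignments against the query MKVLAT.
echo ">query"; echo "MKVLAT"
echo ">hit1";  pair_to_a3m_row "MK--VLAT" "MKGKVLAT"
echo ">hit2";  pair_to_a3m_row "MKVLAT"   "MK--AT"
```

The rows produced are MKVLAT, MKgkVLAT and MK--AT; stripping the lower-case letters leaves every row at the query length of 6.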
Got it. Thanks!
Dear Team, thanks for this amazing tool. I'm trying to generate a multiple structure alignment from a set of PDB files. I understand that Foldseek runs a pairwise structural alignment during the search, combining 3Di and amino acid sequence information. My question is how to generate a multiple structure alignment. I tried the following procedure, suggested on GitHub:
foldseek createdb example/ targetDB
foldseek createdb example/ queryDB
foldseek search queryDB targetDB aln tmpFolder -a
foldseek result2msa queryDB targetDB aln msa --msa-format-mode 6
foldseek unpackdb msa msa_output --unpack-suffix a3m --unpack-name-mode 0
The problem is that the sequences in the resulting .a3m files do not look aligned. Am I missing something? Is there a more straightforward way to generate a multiple structure alignment using Foldseek?
Lavi