soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0

[HHblits] The output first sequence length(32763) is not equal to query sequence length(34350) #278

Closed: Licko0909 closed this issue 3 years ago

Licko0909 commented 3 years ago

Current Behavior

1) [HHblits] The length of the first sequence in the output (32763) is not equal to the query sequence length (34350).
2) Many sequences in the MSA contain long runs of gaps, like '----'.

Steps to Reproduce (for bugs)

Command executed: hhblits -i uc031rqd.1.seq -cpu 16 -n 1 -d /data/hh-suite/UniRef30_2020_06_hhsuite/UniRef30_2020_06 -oa3m uc031rqd.1.a3m -maxres 35000

HH-suite Output (for bugs)

No error was reported, and I can get the output file.

Context and Environment

Sequence file: uc031rqd.1.seq (linked here)

Version: HH-suite 3.3.0

Server: Ubuntu 18.04, 504G of memory

milot-mirdita commented 3 years ago

That looks pretty much as expected. There are two things to keep in mind:

1) HHblits is a local aligner, so it is likely that only chunks of each sequence will be aligned to the query. You can lower the -mact parameter to get more global, but less precise, alignments. See https://github.com/soedinglab/hh-suite/wiki#what-does-the-maximum-accuracy-alignment-algorithm-do

2) The default output format is not aligned FASTA but A3M. See the description here: https://github.com/soedinglab/hh-suite/wiki#multiple-sequence-alignment-formats

A3M saves a lot of space by writing residues in insert states as lowercase letters and omitting the corresponding gaps in the other sequences. As a result, the rows of an A3M file generally do not have the same raw length. You can write results in standard aligned FASTA with -Ofas, or use the reformat.pl script to convert between formats.
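To make the A3M point concrete, here is a minimal Python sketch of how the implicit gaps can be restored. It is not part of HH-suite and is not how reformat.pl is implemented; the function names (parse_a3m, split_columns, a3m_to_aligned) and the toy alignment are made up for illustration. It assumes the usual A3M convention: uppercase letters and '-' are match columns, lowercase letters are inserts, and the gaps opposite inserts are left out.

```python
def parse_a3m(text):
    """Return a list of (header, sequence) pairs from A3M/FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_columns(seq):
    """Split one A3M row into match columns and the insert runs around them.

    inserts[i] is the lowercase run before match column i;
    the last entry is the run after the final match column.
    """
    matches, inserts, run = [], [], []
    for ch in seq:
        if ch == ".":          # explicit insert gaps (A2M style) carry no residue
            continue
        if ch.islower():       # insert state
            run.append(ch)
        else:                  # match state: uppercase residue or '-' deletion
            inserts.append("".join(run))
            matches.append(ch)
            run = []
    inserts.append("".join(run))
    return matches, inserts


def a3m_to_aligned(records):
    """Pad every insert slot to its widest occupant so all rows share one length."""
    split = [split_columns(seq) for _, seq in records]
    n_match = len(split[0][0])
    if any(len(matches) != n_match for matches, _ in split):
        raise ValueError("rows differ in their number of match columns")
    widths = [max(len(ins[i]) for _, ins in split) for i in range(n_match + 1)]
    aligned = []
    for (header, _), (matches, inserts) in zip(records, split):
        parts = []
        for i in range(n_match):
            # Inserts are uppercased here, as is common in aligned FASTA.
            parts.append(inserts[i].upper().ljust(widths[i], "-"))
            parts.append(matches[i])
        parts.append(inserts[n_match].upper().ljust(widths[n_match], "-"))
        aligned.append((header, "".join(parts)))
    return aligned


# Toy alignment (hypothetical data): hit1 has one inserted residue 'g'.
toy_a3m = ">query\nACDF\n>hit1\nAC-gF\n>hit2\n-CDF\n"
toy_records = parse_a3m(toy_a3m)
toy_aligned = a3m_to_aligned(toy_records)
for header, row in toy_aligned:
    print(f"{header:8s}{row}")
```

In the toy input the raw rows have lengths 4, 5, and 4, while every row in the expanded output has length 5. Comparing raw row lengths in an .a3m file is therefore not meaningful; counting only the uppercase letters and '-' characters per row is the comparable quantity.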