Closed kylemeador closed 2 years ago
reformat.pl takes the description from the first sequence in the MSA (in this case 3qv0_1). More accurately, it takes everything after the first word of the header line and uses that as the description, e.g. test in the example below:
>3qv0_1 test
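To make the rule above concrete, here is a minimal Python sketch of the same splitting logic. It is only an illustration of the behavior described in this comment, not the actual Perl code in reformat.pl, and the function name `split_header` is hypothetical.

```python
from typing import Tuple


def split_header(header: str) -> Tuple[str, str]:
    """Split a FASTA/A3M header line into (name, description).

    Hypothetical helper: the first whitespace-delimited word is the
    sequence name, everything after it is the description.
    """
    text = header.strip()
    if text.startswith(">"):
        text = text[1:].strip()
    parts = text.split(None, 1)  # split on the first run of whitespace
    name = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return name, description


print(split_header(">3qv0_1 test"))
print(split_header(">3qv0_1"))
```

With a header that contains only the name, as in the query sequence written by hhblits in this report, the description comes back as an empty string, which is the case that leaves `#=GF DE` without a value.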
Would skipping the #=GF DE output completely help if the description field is empty?
Thanks for the response, makes sense. I think that is a fine solution. Especially if hhblits is spitting out the query sequence without a description.
I think this should solve the issue.
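The fix agreed on in this exchange, omitting the `#=GF DE` line when there is no description, could look roughly like the Python sketch below. This is an assumption about the intended behavior, written in Python for illustration; it is not the change that was made to reformat.pl, and `stockholm_header` is a hypothetical name.

```python
def stockholm_header(description):
    """Return the header lines of a Stockholm file.

    Hypothetical helper: the "#=GF DE" line is written only when the
    description is non-empty, so parsers never see a feature line
    without a value.
    """
    lines = ["# STOCKHOLM 1.0"]
    text = (description or "").strip()
    if text:
        lines.append("#=GF DE " + text)
    return lines


print(stockholm_header("test"))
print(stockholm_header(""))
```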
Expected Behavior
The third line in the conversion file should read something like
#=GF DE (description of multiple sequence alignment name)
Current Behavior
The output now is only
#=GF DE
In addition, the following warning is printed on execution: Use of uninitialized value $1 in printf at /hh-suite/scripts/reformat.pl line 749. It indicates that a Perl regex match failed, so the capture variable $1, which should supply the text following #=GF DE (as in #=GF DE $1), was never set.
Steps to Reproduce (for bugs)
Run hhblits with a command of the following form:
hhblits -d /hh-suite/databases/UniRef30_2020_02 -i /sequences/3qv0_1.fasta -ohhm /profiles/3qv0_1.hmm -oa3m /profiles/3qv0_1.a3m -hide_cons -hide_pred -hide_dssp -E 1E-06 -v 1 -cpu 1
Then take the .a3m output file and run:
/hh-suite/scripts/reformat.pl /profiles/3qv0_1.a3m /profiles/3qv0_1.sto
HH-suite Output (for bugs)
Here is the head of the .a3m file:
Here is the head of the reformatted .sto file (this output was made with -num as an option on reformat.pl, but it's the same without it):
# STOCKHOLM 1.0
#=GF DE
=GC RF ET-----QRVGDILQSE---LK-I------E----K-----------E------S------L--------------------------D-----------S-----------------------F-----N------D--------F----------L----------N--K--Y-K-F-S--LV---E----T----P---GK----NE------A---EIV-------RR---T--E---S------G--E--------------------T--VH----V---F--------------------------F-----------D-----------------------V------------------------A---------------------------------------------------------------------------------Q------------I-------------------------------------------------------------------------------A--------------------------------------------------------------------------------F--------------------------------------------------------------A------------------------------------------------------N-------------------------------V------------------------------------------N-------------------------V--------V---I--------S-----K---S------------------------------E-------P----A------------V-------------------S-----F------------------------------------E---------L----------------L-M---------------------N--------------------------L-------------------------------Q----------------------------------------------E-----------G--------S------------------------F---Y------------V---DS-----------A-T--P----------Y-P-------------S---V---D--A---------A----L--------------N--Q------S----A----------E-------------A----E---------------I----------------------------T---------R--------E----------L--V--------Y--H-----------G----P--------------------P--------F---------------------------------------------------------S------------------N-------------L-----D--E--EL--------Q-E--SL-EAY--------------L-----------------E-S-RG----------V-N---E-------ELASF------I-SA-----------YSE--F----K-----------E-----------N-----N-----EY------------IS--W-----------L---E--K---------MKK--FFH
3qv0_1 ET-----QRVGDILQSE---LK-I------E----K-----------E------S------L--------------------------D-----------S-----------------------F-----N------D--------F----------L----------N--K--Y-K-F-S--LV---E----T----P---GK----NE------A---EIV-------RR---T--E---S------G--E--------------------T--VH----V---F--------------------------F-----------D-----------------------V------------------------A---------------------------------------------------------------------------------Q------------I-------------------------------------------------------------------------------A--------------------------------------------------------------------------------F--------------------------------------------------------------A------------------------------------------------------N-------------------------------V------------------------------------------N-------------------------V--------V---I--------S-----K---S------------------------------E-------P----A------------V-------------------S-----F------------------------------------E---------L----------------L-M---------------------N--------------------------L-------------------------------Q----------------------------------------------E-----------G--------S------------------------F---Y------------V---DS-----------A-T--P----------Y-P-------------S---V---D--A---------A----L--------------N--Q------S----A----------E-------------A----E---------------I----------------------------T---------R--------E----------L--V--------Y--H-----------G----P--------------------P--------F---------------------------------------------------------S------------------N-------------L-----D--E--EL--------Q-E--SL-EAY--------------L-----------------E-S-RG----------V-N---E-------ELASF------I-SA-----------YSE--F----K-----------E-----------N-----N-----EY------------IS--W-----------L---E--K---------MKK--FFH UniRef100_A0A061B9H#2 
-T-----SRLSETLKDE---LT-H------E----K-----------Q------N------Dtevp----------------------V-----------E-----------------------L-----N------S--------F----------I----------A--Q--S-G-F-E--VV---N----T----D---GQ----AL------A---KLQ-------KN---G------T------D--E--------------------V--VH----V---F--------------------------F-----------D-----------------------V------------------------N---------------------------------------------------------------------------------Q------------V-------------------------------------------------------------------------------Vnvrpaveeveveeeeefedpyen---------------------------------------------------------F--------------------------------------------------------------I------------------------------------------------------N-------------------------------L------------------------------------------N-------------------------V--------V---V--------E-----Kka-D------------------------------D-------S----A------------V-------------------A-----F------------------------------------D---------V----------------L-V---------------------G--------------------------P-------------------------------E----------------------------------------------D-----------G--------S------------------------T---Y------------I---EN-----------V-I--A----------Y-A-------------N---K---A--E---------A----L--------------T--E------T----A----------D-------------A----D---------------Q----------------------------K---------R--------E----------L--A--------Y--N-----------G----P--------------------A--------F---------------------------------------------------------S------------------N-------------L-----D--E--KL--------Q-E--NF-EQF--------------L-----------------T-S-RG----------I-N---E-------ELYQF------I-LN-----------YGI--H----K-----------E-----------N-----Q-----EY------------IA--W-----------L---E--K---------LNK--FFN UniRef100_A0A099P5X#3 
-K-----TQLHEVITNE---LK-F------E----E-----------E------D------Sfgld----------------------E-----------T-----------------------F-----K------T--------Y----------L----------E--N--N-K-I-E--IV---N----T----D---GK----VL------A---ELV-------KK---F------N------N--E--------------------T--IH----I---Y--------------------------F-----------D-----------------------V------------------------L---------------------------------------------------------------------------------R------------I-------------------------------------------------------------------------------Tqtsyqlkqmqdqveqseylddelaeia-----------------------------------------------------N--------------------------------------------------------------A------------------------------------------------------D-------------------------------I------------------------------------------N-------------------------V--------V---I--------V-----K---D------------------------------S-------V----A------------T-------------------G-----F------------------------------------D---------L----------------S-L---------------------S--------------------------L-------------------------------V----------------------------------------------D-----------Q--------S------------------------F---S------------V---QA-----------I-T--N----------F-N-------------N---V---E--T---------A----L--------------S--D------S----P----------E-------------A----S---------------A----------------------------E---------R--------D----------L--K--------Y--S-----------G----P--------------------E--------Y---------------------------------------------------------S------------------N-------------L-----A--E--EL--------Q-E--AI-NQY--------------L-----------------M-S-RG----------I-N---N-------ELAEF------I-LA-----------YSG--V----K-----------E-----------N-----N-----EY------------LD--W-----------L---E--N---------LKK--FTA UniRef100_A0A0D6EJW#4 
-P-----SALSTKLGEE---IK-F------E----T-----------E------N------Gdasaep--------------------D-----------F-----------------------L-----K------D--------F----------K----------A--D--G-V-W-K--LV---D----V----P---GS----DE------I---VLT-------RT---F------G------N--EkyvpsllppsrlsladqgdhS--IR----L---I--------------------------F-----------S-----------------------I------------------------S---------------------------------------------------------------------------------D------------L-------------------------------------------------------------------------------Daehdvepyvdeeaadagsggvgde--------------------------------------------------------S--------------------------------------------------------------V------------------------------------------------------Spseqafpve----------------------T------------------------------------------S-------------------------I--------T---I--------T-----Kp--S------------------------------G-------G----A------------L-------------------T-----I------------------------------------D---------A----------------V-A---------------------QgwsrpflalsswrspisrfvlltwltF-------------------------------L----------------------------------------------D-----------G--------L------------------------F---T------------I---NN-----------I-S--F----------Y-P-------------D---A---D--V---------A----L--------------G--M------T----S----------E-------------D----D---------------W----------------------------K---------R--------Q----------G--L--------Y--M-----------G----P--------------------A--------F---------------------------------------------------------D------------------N-------------L-----D--E--GV--------Q-S--EF-EQY--------------L-----------------E-E-RG----------I-N---S-------ALALF------I-PD-----------LAE--W----K-----------E-----------Q-----K-----EY------------VS--W-----------L---K--G---------TKE--FLE UniRef100_A0A0F8A2Q#5 
-----------MMIEED---LK-A------N------------------------------Eqqp-----------------------A-----------S-----------------------I-----K------D--------F----------K----------D--N--S-P-Y-E--IH---D----T----P---GQ----EV------V---KLV-------RT---Y------N------D--E--------------------K--IT----V---S--------------------------F-----------S-----------------------I------------------------S---------------------------------------------------------------------------------D------------I-------------------------------------------------------------------------------Tnydpfnedpaleddempedamqnanqqrgvqstggarsaqtqeqmerdmeseegeeed----------------------M--------------------------------------------------------------D------------------------------------------------------Eapapis-------------------------L------------------------------------------S-------------------------I--------V---V--------E-----Kp--Gra----------------------------K-------G----A------------L-------------------N-----V------------------------------------E---------A----------------T-A---------------------Q---------------------------------------------------------------------------------------------------------D-----------G--------H------------------------I---V------------V---DN-----------V-Y--Y----------Y-D-------------A---A---V--A---------A----H--------------G--A------S----P----------E-------------G----L---------------E----------------------------K---------R--------A----------G--A--------Y--A-----------G----P--------------------P--------F---------------------------------------------------------G------------------S-------------L-----D--E--DL--------Q-V--LL-ERF--------------L-----------------E-E-RG----------I-D---Q-------SMAVF------V-PD-----------YVD--A----K-----------E-----------Q-----A-----EY------------TR--W-----------L---S--S---------VKG--FVD UniRef100_A0A0H5BZ4#6 
-T-----SRVASTLKAE---LE-H------E----R-----------D------N------Apeaf----------------------N-----------E--------------------------------------------------------T----------S--F--A-G-F-S--VV---N----T----N---GQ----AL------G---KLE-------KD---S------S------D--E--------------------L--VH----V---F--------------------------F-----------D-----------------------V------------------------N---------------------------------------------------------------------------------Q------------I-------------------------------------------------------------------------------Vnlrsneaeeiegeeegfedpydsn--------------------------------------------------------F--------------------------------------------------------------I------------------------------------------------------N-------------------------------V------------------------------------------N-------------------------V--------V---V--------E-----Kks-D------------------------------G-------S----A------------V-------------------A-----F------------------------------------D---------V----------------L-V---------------------G--------------------------P-------------------------------E----------------------------------------------D-----------G--------S------------------------S---Y------------I---EN-----------V-T--A----------Y-A-------------D---K---T--E---------A----L--------------E--E------S----A----------E-------------A----E---------------Q----------------------------K---------R--------D----------L--R--------Y--N-----------G----P--------------------A--------F---------------------------------------------------------T------------------N-------------L-----D--E--KL--------Q-E--DF-ENY--------------L-----------------V-S-RG----------I-N---T-------DLFRF------I-VD-----------YGV--A----K-----------E-----------N-----N-----EY------------IS--W-----------L---N--K---------LNK--FFN
Context
The lack of a description breaks Stockholm file processing in other tools (Biopython's AlignIO). It appears that the reformat.pl script attempts to place a value after #=GF DE but finds nothing to put there. I suspect that either the formatting of the .a3m file produced by hhblits or the reformat.pl script needs to be modified so that a proper description is included.
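Until the script itself is changed, one possible workaround is to strip value-less `#=GF` lines from the .sto file before handing it to a downstream parser. The sketch below is a hypothetical post-processing step in plain Python. It assumes that a `#=GF` line with a feature tag but no text is the only thing the parser objects to, which has not been verified here.

```python
def drop_empty_gf_lines(lines):
    """Remove "#=GF <feature>" lines that carry no value.

    Hypothetical helper: a well-formed line looks like
    "#=GF DE some text"; a line with only two whitespace-separated
    fields ("#=GF" and the feature tag) is dropped.
    """
    cleaned = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2 and fields[0] == "#=GF":
            continue  # feature tag without a value, e.g. "#=GF DE"
        cleaned.append(line)
    return cleaned


example = ["# STOCKHOLM 1.0\n", "#=GF DE\n", "3qv0_1 ET--QRV\n", "//\n"]
print("".join(drop_empty_gf_lines(example)), end="")
```

The cleaned lines can be written back to a file and then read with, for example, `AlignIO.read(path, "stockholm")` in Biopython.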