soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0

issue about building local database #268

Open mujiezhang opened 3 years ago

mujiezhang commented 3 years ago

I built the database from MSAs. First, I placed all of them in a single folder that contains no other files and created a single FFindex database, which generated two files: 227_msa.ffdata and 227_msa.ffindex. Then I used the command `OMP_NUM_THREADS=1 mpirun -np 1 ffindex_apply_mpi 227_msa.ff{data,index} -i 227_a3m_wo_ss.ffindex -d 227_a3m_wo_ss.ffdata -- hhconsensus -M 50 -maxres 65535 -i stdin -oa3m stdout -v 0` and got this error:

'mpirun was unable to find the specified executable file, and therefore did not launch the job. This error was first reported for process rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command line parameter option (remember that mpirun interprets the first unrecognized command line token as the executable).

Node: localhost Executable: ffindex_apply_mpi'

How can I solve this problem?

ksteczk commented 3 years ago

Do you have mpirun, ffindex_apply_mpi and hhconsensus in your PATH? Check the output of these commands:

which mpirun
which ffindex_apply_mpi 
which hhconsensus
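The same check can be scripted. The sketch below is a Python illustration, not part of hh-suite. It uses `shutil.which`, which performs the same PATH lookup as `which`, and maps a tool that is not found to `None`. The tool list is simply the set of commands mentioned in this thread. Whether a particular conda build ships `ffindex_apply_mpi` at all is not something this thread confirms.

```python
import shutil

# Commands used in this thread; adjust the list to your own pipeline.
REQUIRED_TOOLS = ["mpirun", "ffindex_apply_mpi", "ffindex_apply", "hhconsensus"]


def find_tools(names):
    """Map each tool name to its absolute path, or to None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(REQUIRED_TOOLS).items():
        print(f"{name:20s} {path or 'NOT FOUND on PATH'}")
```

A `None` entry means the shell would also fail to find that command. In that case, call the binary by its full path or add its directory to PATH.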
mujiezhang commented 3 years ago

I have just installed mpirun, but I cannot find ffindex_apply_mpi. I installed hhsuite through conda. Does this problem sometimes occur when hhsuite is installed through conda? How can I get ffindex_apply_mpi?

ksteczk commented 3 years ago

You should use the full path to the ffindex_apply_mpi binary. I don't know where that might be in a conda installation...

By the way, when you use the -np 1 option of mpirun, consider skipping MPI altogether and just running `ffindex_apply 227_msa.ff{data,index} -i 227_a3m_wo_ss.ffindex -d 227_a3m_wo_ss.ffdata -- hhconsensus -M 50 -maxres 65535 -i stdin -oa3m stdout -v 0`. It does the same thing.
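It helps to model what `ffindex_apply` does in order to see why the single-process version is enough here. The Python sketch below is a simplified serial model, not hh-suite's implementation. It assumes the usual ffindex layout: each `.ffindex` line is `name<TAB>offset<TAB>length`, and each entry in `.ffdata` ends with a NUL byte that is counted in `length`. For every entry, the sketch pipes the content into a command on stdin and stores the command's stdout as a new entry. That is what `-- hhconsensus ... -i stdin -oa3m stdout` asks for. `ffindex_apply_mpi` splits the same loop across MPI ranks, so with `-np 1` there is nothing to split.

```python
import subprocess
from pathlib import Path


def read_ffindex(index_path):
    """Parse an .ffindex file into (name, offset, length) tuples, in file order."""
    entries = []
    for line in Path(index_path).read_text().splitlines():
        if not line.strip():
            continue  # tolerate a trailing blank line
        name, offset, length = line.split("\t")
        entries.append((name, int(offset), int(length)))
    return entries


def apply_to_ffindex(data_in, index_in, data_out, index_out, transform):
    """Serial model of ffindex_apply: run `transform` on every entry, store its output.

    `transform` receives an entry's bytes without the trailing NUL and returns bytes.
    Each output entry is written NUL-terminated; the stored length counts that NUL.
    """
    data = Path(data_in).read_bytes()
    new_entries = []
    offset = 0
    with open(data_out, "wb") as out:
        for name, start, length in read_ffindex(index_in):
            payload = data[start:start + length]
            if payload.endswith(b"\0"):
                payload = payload[:-1]  # drop the entry terminator
            result = transform(payload) + b"\0"
            out.write(result)
            new_entries.append((name, offset, len(result)))
            offset += len(result)
    new_entries.sort()  # keep the index sorted by name, as ffindex name lookups expect
    Path(index_out).write_text("".join(f"{n}\t{o}\t{l}\n" for n, o, l in new_entries))


def run_command(argv):
    """Turn an external stdin -> stdout command (e.g. hhconsensus) into a transform."""
    def transform(payload):
        return subprocess.run(argv, input=payload, capture_output=True, check=True).stdout
    return transform
```

For example, `apply_to_ffindex("227_msa.ffdata", "227_msa.ffindex", "227_a3m_wo_ss.ffdata", "227_a3m_wo_ss.ffindex", run_command(["hhconsensus", "-M", "50", "-maxres", "65535", "-i", "stdin", "-oa3m", "stdout", "-v", "0"]))` mirrors the command above. Because of `check=True`, this sketch stops at the first failing entry. The real tool is the one to use in practice.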

mujiezhang commented 3 years ago

Oh, thank you very much! You are so nice! The problem is solved. But I have another small question. I have several groups of proteins, and I want to find out whether one group is similar to another. So far, I have built a local hhsuite database from these protein groups, and I am running hhsearch with each protein group, one by one, against the database. Is this the right approach?

ksteczk commented 3 years ago

Suppose you have the files querydb_hhm.ffdata, querydb_hhm.ffindex, dbToBeSearched_hhm.ffdata and dbToBeSearched_hhm.ffindex.

You can run `ffindex_apply querydb_hhm.ffdata querydb_hhm.ffindex -i mappings.ffindex -d mappings.ffdata -- hhsearch -i stdin -o stdout -d dbToBeSearched`, and this will generate a third database, "mappings", with the results.
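Each entry of the resulting `mappings` database should hold the hhsearch output for one query profile. An ffindex database is an index of offsets into a single data file, so one result can be pulled out by name without scanning the whole `.ffdata`. Below is a minimal Python sketch. It assumes the same layout as above (tab-separated `name offset length` index lines, NUL-terminated entries). It also assumes that result entries keep the query entry names, which is worth checking against your own `mappings.ffindex`.

```python
from pathlib import Path


def get_entry(data_path, index_path, name):
    """Return the text of one ffindex entry, or None if `name` is not in the index.

    Assumes tab-separated "name offset length" index lines and NUL-terminated entries.
    """
    for line in Path(index_path).read_text().splitlines():
        fields = line.split("\t")
        if len(fields) == 3 and fields[0] == name:
            offset, length = int(fields[1]), int(fields[2])
            with open(data_path, "rb") as fh:
                fh.seek(offset)  # jump straight to the entry; no need to scan .ffdata
                raw = fh.read(length)
            return raw.rstrip(b"\0").decode()
    return None
```

Usage would look like `print(get_entry("mappings.ffdata", "mappings.ffindex", "<query entry name>"))`, where the name is a placeholder for one of your own index entries.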

mujiezhang commented 3 years ago

Sorry for my ignorance... I ran `ffindex_apply 227_msa.ff{data,index} -i 227_a3m_wo_ss.ffindex -d 227_a3m_wo_ss.ffdata -- hhconsensus -M 50 -maxres 65535 -i stdin -oa3m stdout -v 0` and it worked. Then I ran `ffindex_apply 227_a3m_wo_ss.ff{data,index} -i 227_a3m.ffindex -d 27_a3m.ffdata -- addss.pl -v 0 stdin stdout` and it also worked. But when I tried to generate the hhm files with `ffindex_apply 227_a3m.ff{data,index} -i 227_hhm.ffindex -d 227_hhm.ffdata -- hhmake -i stdin -o stdout -v 0`, I got lots of errors like:

97.txt_muscle.msa 224 1 286 4

98.txt_muscle.msa 225 1 256 4

ksteczk commented 3 years ago

Maybe something is wrong with the 227_a3m.ff{data,index} files? You can look into the 227_a3m.ffdata file and check whether it contains anything. Another test is to run it without the -v 0 option and see which database entry it crashes on.
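Instead of paging through `.ffdata` by eye, you can check every entry at once. The Python sketch below is an illustration that assumes NUL-terminated entries whose index length includes the NUL. It lists the entries that contain nothing except terminators or whitespace.

```python
from pathlib import Path


def summarize_ffindex(data_path, index_path):
    """Count entries and list those with no content besides NUL terminators/whitespace."""
    data = Path(data_path).read_bytes()
    total, empty = 0, []
    for line in Path(index_path).read_text().splitlines():
        if not line.strip():
            continue
        name, offset, length = line.split("\t")
        start, size = int(offset), int(length)
        total += 1
        if not data[start:start + size].strip(b"\0 \t\r\n"):
            empty.append(name)
    return {"entries": total, "ffdata_bytes": len(data), "empty": empty}
```

Under that layout, a database whose entries are all empty takes exactly one byte per entry. A 227-entry `.ffdata` that is only 227 bytes would therefore be consistent with the upstream step producing no output for any MSA. Usage: `print(summarize_ffindex("227_a3m.ffdata", "227_a3m.ffindex"))`.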

mujiezhang commented 3 years ago

I have checked the 227_a3m.ffdata file, and it looks broken: it contains `^@^@^@^@^@^@^@^@...` (null bytes). But when I ran `ffindex_apply 227_a3m_wo_ss.ff{data,index} -i 227_a3m.ffindex -d 27_a3m.ffdata -- addss.pl -v 0 stdin stdout`, there was no error message...

ksteczk commented 3 years ago

Run addss.pl without -v 0.

mujiezhang commented 3 years ago

I ran `ffindex_apply 227_a3m_wo_ss.ff{data,index} -i 227_a3m.ffindex -d 227_a3m.ffdata -- addss.pl stdin stdout` and `ffindex_apply 227_a3m_wo_ss.ff{data,index} -i 227_a3m.ffindex -d 227_a3m.ffdata -- addss.pl`, and they produced the same results as `ffindex_apply 227_a3m_wo_ss.ff{data,index} -i 227_a3m.ffindex -d 227_a3m.ffdata -- addss.pl -v 0 stdin stdout`. The a3m.ffdata file is normally larger than the msa.ffdata file, but 227_a3m.ffdata is only 227 bytes. I do not know what is wrong with it.

ksteczk commented 3 years ago

Did you set up the paths in the addss.pl script? As far as I recall, it requires the paths to PSIPRED...

mujiezhang commented 3 years ago

Maybe I can try installing hhsuite from source. Anyway, thanks a lot; you have been so patient with me. Thanks again!

ksteczk commented 3 years ago

It won't solve your problem: you have to configure PSIPRED anyway. hhsuite uses it, and it is an external tool that has to be connected to hhsuite.

mujiezhang commented 3 years ago

Oh! But I do not know how to configure PSIPRED. Should I install it through conda?

ksteczk commented 3 years ago

First of all, you can easily skip secondary structure (SS) prediction and go with the a3m files without SS. According to the hhsuite documentation, the gain in sensitivity is small unless you are going to tune the parameters more deeply.

If you want SS prediction anyway, you should install PSIPRED (or compile it from source) and edit HHPaths.pm in the hhsuite scripts subdirectory so that it points to your local PSIPRED installation.

mujiezhang commented 3 years ago

Thank you very much! Your advice is very useful! I am trying it.

mujiezhang commented 3 years ago

Another stupid question... how do I skip SS prediction?

ksteczk commented 3 years ago

To skip SS, just rename the a3m no_ss files to a3m, and that's all: build the HMM profiles from them.
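The rename has to cover both halves of the database, because the `.ffdata` and `.ffindex` files only work as a pair. Below is a small Python sketch of a hypothetical helper, equivalent to two `mv` commands. It checks both sources and both targets before moving anything.

```python
from pathlib import Path


def rename_ffindex_db(old_prefix, new_prefix):
    """Rename <old_prefix>.ffdata/.ffindex to <new_prefix>.ffdata/.ffindex as a pair.

    A missing source file or an already existing target is detected before
    anything is moved, so an existing database is never overwritten.
    """
    pairs = [(Path(old_prefix + s), Path(new_prefix + s)) for s in (".ffdata", ".ffindex")]
    for src, dst in pairs:
        if not src.exists():
            raise FileNotFoundError(src)
        if dst.exists():
            raise FileExistsError(dst)
    for src, dst in pairs:
        src.rename(dst)
```

`rename_ffindex_db("227_a3m_wo_ss", "227_a3m")` refuses to run while the earlier, broken `227_a3m.ff{data,index}` files are still present. This is deliberate: delete them first.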

mujiezhang commented 3 years ago
Thanks for your useful advice! I got the final results, and I have two more questions. The result file looks something like this:

| Field | Value |
|---|---|
| Query | lcl\|MH719189.1_prot_AYD80303.1_44 [locus_tag=Fc02_44] [protein=virion structural protein] [protein_id=AYD80303.1] [location=29314..30270] [gbkey=CDS] |
| Match_columns | 318 |
| No_of_seqs | 1 out of 4 |
| Neff | 1 |
| Searched_HMMs | 227 |
| Date | Tue May 25 09:46:24 2021 |
| Command | hhsearch -i stdin -o stdout -d 227 -cov 50 -qid 90 |

| No | Hit | Prob | E-value | P-value | Score | SS | Cols | Query HMM | Template HMM |
|---|---|---|---|---|---|---|---|---|---|
| 1 | lcl\|MH719189.1_prot_AYD80303.1 | 100.0 | 2E-196 | 9E-199 | 1295.5 | 0.0 | 318 | 1-318 | 1-318 (318) |
| 2 | lcl\|NC_020198.1_prot_YP_007392 | 13.5 | 0.47 | 0.0021 | 24.6 | 0.0 | 25 | 1-25 | 30-54 (74) |
| 3 | lcl\|JQ067085.2_prot_AII21881.1 | 10.9 | 0.64 | 0.0028 | 24.0 | 0.0 | 12 | 199-210 | 16-27 (80) |
| 4 | lcl\|JX495042.1_prot_AFR52235.1 | 7.7 | 1 | 0.0045 | 23.9 | 0.0 | 18 | 274-291 | 69-86 (102) |
| 5 | lcl\|KM233689.1_prot_AIM40317.1 | 2.9 | 3.5 | 0.015 | 20.2 | 0.0 | 39 | 6-44 | 12-61 (87) |
| 6 | lcl\|NC_005882.1_prot_YP_024699 | 2.5 | 4.2 | 0.019 | 20.5 | 0.0 | 11 | 200-210 | 72-82 (110) |
| 7 | lcl\|NC_020198.1_prot_YP_007392 | 1.7 | 6.5 | 0.029 | 17.9 | 0.0 | 13 | 7-19 | 51-63 (67) |
| 8 | lcl\|NC_000929.1_prot_NP_050615 | 1.1 | 11 | 0.047 | 19.1 | 0.0 | 16 | 37-52 | 91-106 (176) |
| 9 | lcl\|NC_028766.1_prot_YP_009196 | 1.1 | 11 | 0.049 | 18.8 | 0.0 | 13 | 125-137 | 142-154 (158) |
| 10 | lcl\|NC_005882.1_prot_YP_024720 | 1.0 | 12 | 0.051 | 20.7 | 0.0 | 20 | 195-214 | 51-70 (401) |

The database is made of 227 MSA files, and I want to know whether one MSA is similar to another. But this result only tells me which sequence is similar to which other sequence. How should I interpret this result?

ksteczk commented 3 years ago

Assuming you are running hhsearch via ffindex_apply, you get hhsearch mappings from every profile (hhm/a3m) in your query database to the profiles in the target database. You pasted a fragment of the result showing how MH719189.1_prot_AYD80303.1_44 compares to entries in the target database. As you can see, it is similar to itself and barely similar to the remaining entries in the target database. HHsearch reports at least 10 hits even if they don't meet the reliability thresholds (and more hits if it finds more similar entries in the database).

What exactly do you want to do? Compare every sequence with every other one? You can assume that the remaining 217 sequences in the target database are not similar to the query.
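To go through all 227 results rather than reading them by eye, you can parse and filter the summary hit list. The Python sketch below assumes the column layout of the hit list pasted earlier in this thread (No, Hit, Prob, E-value, P-value, Score, SS, Cols, Query HMM, Template HMM). The probability cutoff is a parameter you choose; it is not an hh-suite default. The Hit column is truncated, so hits 2 and 7 above both print as `lcl|NC_020198.1_prot_YP_007392`. For that reason the self-hit is recognised by prefix, which can misfire if two names share their leading characters.

```python
import re
from dataclasses import dataclass

# One summary line: No, Hit (may be truncated), Prob, E-value, P-value, Score, SS,
# Cols, query range, template range, (template length).
HIT_LINE = re.compile(
    r"^\s*(?P<no>\d+)\s+(?P<hit>.+?)\s+(?P<prob>\d+(?:\.\d+)?)\s+"
    r"(?P<evalue>\S+)\s+(?P<pvalue>\S+)\s+(?P<score>-?\d+(?:\.\d+)?)\s+"
    r"(?P<ss>-?\d+(?:\.\d+)?)\s+(?P<cols>\d+)\s+"
    r"(?P<qrange>\d+-\d+)\s+(?P<trange>\d+-\d+)\s*\(\s*(?P<tlen>\d+)\s*\)\s*$"
)


@dataclass
class Hit:
    no: int
    name: str
    prob: float
    evalue: float
    cols: int


def parse_hits(hhr_text):
    """Extract the summary hit list from one hhsearch result (.hhr text)."""
    hits = []
    for line in hhr_text.splitlines():
        m = HIT_LINE.match(line)
        if m:
            hits.append(Hit(int(m["no"]), m["hit"], float(m["prob"]),
                            float(m["evalue"]), int(m["cols"])))
    return hits


def significant_hits(hits, query_name, min_prob=90.0):
    """Drop the self-hit (prefix match on the truncated name) and keep Prob >= min_prob."""
    return [h for h in hits if h.prob >= min_prob and not query_name.startswith(h.name)]
```

Applied to the result above with `min_prob=90.0`, only the self-hit clears the cutoff, and it is removed. That matches the reading that this query has no close relative in the database.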

mujiezhang commented 3 years ago

I have 227 clusters of proteins. What I want to do is determine which protein clusters are similar to each other. I made an alignment of each protein cluster and used the alignments to build the hhsearch database, following your earlier advice and the online documentation. Then, to compare the 227 clusters against themselves, I ran `ffindex_apply 227_hhm.ffdata 227_hhm.ffindex -i mappings.ffindex -d mappings.ffdata -- hhsearch -i stdin -o stdout -d 227`.

I got the result file mappings.ffdata, which contains the hhsearch results. But, as you can see in mappings.ffdata, I cannot understand the results clearly. Does the query represent the cluster it belongs to? For example, if query sequence A belongs to cluster 1 and has a very good hit to sequence B, which belongs to cluster 2, can I say that cluster 1 is similar to cluster 2?

ksteczk commented 3 years ago

So your interpretation is that MH719189.1_prot_AYD80303.1_44 doesn't cluster with any other msa in the database.
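Each HHM was built from one cluster's MSA, so the `Query` line appears to name a profile that stands for the whole cluster. By default, hhmake takes the HMM name from the first sequence of the alignment; its `-name` option overrides this. A confident profile-to-profile hit is therefore evidence about two clusters, not just two single sequences. Turning the hit lists into cluster-to-cluster links only needs a lookup from each profile name to its cluster. In the Python sketch below, `first_seq_to_cluster` is a hypothetical dict you would build from your own MSA files. The sketch also assumes you have already reduced each result to `(hit_name, prob)` pairs keyed by the first token of the Query line. It matches truncated hit names by prefix and uses an assumed cutoff of Prob >= 90.

```python
from collections import defaultdict


def cluster_edges(results, first_seq_to_cluster, min_prob=90.0):
    """Collapse per-profile hhsearch hits into undirected cluster-to-cluster links.

    results: {query profile name: [(hit_name, prob), ...]}, one item per result entry.
    first_seq_to_cluster: hypothetical {profile (first-sequence) name: cluster ID}.
    Returns {(cluster_a, cluster_b): best Prob seen for that pair}.
    """
    def lookup(hit_name):
        # Hit names in the summary list may be truncated, so match by prefix and
        # give up (None) when the prefix is unknown or points to several clusters.
        matches = {c for full, c in first_seq_to_cluster.items() if full.startswith(hit_name)}
        return matches.pop() if len(matches) == 1 else None

    edges = defaultdict(float)
    for query, hits in results.items():
        q_cluster = first_seq_to_cluster.get(query)
        for hit_name, prob in hits:
            t_cluster = lookup(hit_name)
            if q_cluster is None or t_cluster is None or t_cluster == q_cluster:
                continue  # unknown name or self-hit
            if prob >= min_prob:
                pair = tuple(sorted((q_cluster, t_cluster)))
                edges[pair] = max(edges[pair], prob)
    return dict(edges)
```

Keeping the best probability per unordered pair merges the A-to-B and B-to-A searches into one link. Whether one strong hit is enough to call two clusters similar is a judgment call this sketch does not settle.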

mujiezhang commented 3 years ago

Maybe I should show you another picture. As you can see in the picture, the protein lcl|NC_019455.1_prot_YP_007002910.1_2, which belongs to protein cluster A, has two significant hits with Prob > 90: one is lcl|NC_018274.1_prot_YP_006560, which belongs to protein cluster B, and the other is lcl|NC_005882.1_prot_YP_024689, which belongs to protein cluster C. So I know for certain that lcl|NC_019455.1_prot_YP_007002910.1_2 is similar to these two hits. What I am not sure about is whether cluster A is similar to clusters B and C. Can the query sequence represent the cluster it belongs to?

发件人: Kamil 发送时间: 2021年5月26日 15:54 收件人: soedinglab/hh-suite 抄送: mujiezhang; Author 主题: Re: [soedinglab/hh-suite] issue about building local databse (#268)

So your interpretation is that MH719189.1_prot_AYD80303.1_44 doesn't cluster with any other msa in the database.

śr., 26 maj 2021 o 09:52 mujiezhang @.***> napisał(a):

I have 227 clusters of proteins. What I exactly want to do is to ensure which protein cluster are similar to another. What I have done are that I made alignment of every protein clusters and used them to make the hhsearch database as you told me before and the documents online. Then I want to compare the 227 clusters to themselves and I run the command ‘ffindex_apply 227_hhm.ffdata 227_hhm.ffindex -i mappings.ffindex -d mappings.ffdata -- hhsearch -i stdin -o stdout -d 227’

And I got the result file-mappings.ffdata which contains the hhsearch results. But as you can seen in the mappings.ffdata, I just could not understand the result clearly. Does the query represent the cluster it belongs to? For example, if the query sequence A belongs to cluster1, it has a very good hit of squences B belongs to cluster2, So can I say that the cluster1 are similar to cluster 2?

发送自 Windows 10 版邮件应用

发件人: Kamil 发送时间: 2021年5月26日 15:30 收件人: soedinglab/hh-suite 抄送: mujiezhang; Author 主题: Re: [soedinglab/hh-suite] issue about building local databse (#268)

Assuming you are running hhsearch using ffindex_apply you get hhsearch mappings between all profiles (hmm/a3m) in your query database to profiles in the target database. You pasted a fragment of the result showing how MH719189.1_prot_AYD80303.1_44 compares to entries in the target database. As you see it is similar to itself and barely similar to the remaining objects in the target database. HHsearch reports minimum 10 hits even if they don't meet reliability thresholds criteria (and more hits if it finds more similar objects in the database).

What exactly do you want to do? Compare each sequence with each? You can assume that the remaining 217 sequences in the target database are not similar to the query.

wt., 25 maj 2021 o 04:13 mujiezhang @.***> napisał(a):

Thanks for your useful advices! And I got the final results. I have two more questions. The result file is something like ‘Query lcl|MH719189.1_prot_AYD80303.1_44 [locus_tag=Fc02_44] [protein=virion structural protein] [protein_id=AYD80303.1] [location=29314..30270] [gbkey=CDS] Match_columns 318 No_of_seqs 1 out of 4 Neff 1 Searched_HMMs 227 Date Tue May 25 09:46:24 2021 Command hhsearch -i stdin -o stdout -d 227 -cov 50 -qid 90

No Hit Prob E-value P-value Score SS Cols Query HMM Template HMM 1 lcl|MH719189.1_prot_AYD80303.1 100.0 2E-196 9E-199 1295.5 0.0 318 1-318 1-318 (318) 2 lcl|NC_020198.1_prot_YP_007392 13.5 0.47 0.0021 24.6 0.0 25 1-25 30-54 (74) 3 lcl|JQ067085.2_prot_AII21881.1 10.9 0.64 0.0028 24.0 0.0 12 199-210 16-27 (80) 4 lcl|JX495042.1_prot_AFR52235.1 7.7 1 0.0045 23.9 0.0 18 274-291 69-86 (102) 5 lcl|KM233689.1_prot_AIM40317.1 2.9 3.5 0.015 20.2 0.0 39 6-44 12-61 (87) 6 lcl|NC_005882.1_prot_YP_024699 2.5 4.2 0.019 20.5 0.0 11 200-210 72-82 (110) 7 lcl|NC_020198.1_prot_YP_007392 1.7 6.5 0.029 17.9 0.0 13 7-19 51-63 (67) 8 lcl|NC_000929.1_prot_NP_050615 1.1 11 0.047 19.1 0.0 16 37-52 91-106 (176) 9 lcl|NC_028766.1_prot_YP_009196 1.1 11 0.049 18.8 0.0 13 125-137 142-154 (158) 10 lcl|NC_005882.1_prot_YP_024720 1.0 12 0.051 20.7 0.0 20 195-214 51-70 (401)’ The database is made of 227 msa files, and I want to know whether one msa is similar to another. But this result only tell me which squence is similar to another squence. How should I understand this result?

发送自 Windows 10 版邮件应用

发件人: Kamil 发送时间: 2021年5月24日 22:11 收件人: soedinglab/hh-suite 抄送: mujiezhang; Author 主题: Re: [soedinglab/hh-suite] issue about building local databse (#268)

To skip ss just rename a3m no_ss files into a3m ant that's all - build hmm profiles on them. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/soedinglab/hh-suite/issues/268#issuecomment-847477997>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AD2CMI4JGRRGUOAYJQPBKHDTPMBU5ANCNFSM45NBQOKQ>.


ksteczk commented 3 years ago

I didn't get any picture. Anyway, we can switch to regular e-mail for this discussion, since the hhsuite problem is solved. Feel free to reach me at kamil dot steczkiewicz at gmail.com.

On Wed, 26 May 2021 at 10:06, mujiezhang <@.***> wrote:

Maybe I should show you another picture. As you can see in it, the protein lcl|NC_019455.1_prot_YP_007002910.1_2, which belongs to protein cluster A, has two significant hits with Prob > 90: one is lcl|NC_018274.1_prot_YP_006560, which belongs to protein cluster B, and the other is lcl|NC_005882.1_prot_YP_024689, which belongs to protein cluster C. So I am certain that lcl|NC_019455.1_prot_YP_007002910.1_2 is similar to these two hits. What I am not sure about is whether cluster A is similar to clusters B and C. Can the query sequence represent the cluster it belongs to?

From: Kamil Sent: 26 May 2021 15:54 To: soedinglab/hh-suite Cc: mujiezhang; Author Subject: Re: [soedinglab/hh-suite] issue about building local databse (#268)

So your interpretation is that MH719189.1_prot_AYD80303.1_44 doesn't cluster with any other msa in the database.

On Wed, 26 May 2021 at 09:52, mujiezhang <@.***> wrote:

I have 227 protein clusters. What I want to do is determine which protein clusters are similar to each other. So far, I made an alignment of every protein cluster and used them to build the hhsearch database, following your earlier advice and the online documentation. Then, to compare the 227 clusters against each other, I ran the command 'ffindex_apply 227_hhm.ffdata 227_hhm.ffindex -i mappings.ffindex -d mappings.ffdata -- hhsearch -i stdin -o stdout -d 227'.

I got the result file mappings.ffdata, which contains the hhsearch results. But as you can see in mappings.ffdata, I cannot understand the results clearly. Does the query represent the cluster it belongs to? For example, if query sequence A belongs to cluster 1 and has a very good hit to sequence B in cluster 2, can I say that cluster 1 is similar to cluster 2?
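mappings.ffdata holds one hhsearch result per query profile, concatenated, and mappings.ffindex records where each one lives. A minimal sketch for pulling the results apart, assuming the usual ffindex layout (one tab-separated name, offset and length per index line, each entry terminated by a NUL byte); the function names are hypothetical helpers, not hh-suite tools.

```python
from pathlib import Path
from typing import Dict, Iterator, Tuple


def read_ffindex(index_path: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (name, offset, length) for every entry listed in an .ffindex file."""
    with open(index_path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                name, offset, length = line.rstrip("\n").split("\t")
                yield name, int(offset), int(length)


def read_entries(data_path: str, index_path: str) -> Dict[str, str]:
    """Map each entry name (one query profile) to its text in the .ffdata file."""
    data = Path(data_path).read_bytes()
    entries = {}
    for name, offset, length in read_ffindex(index_path):
        chunk = data[offset:offset + length]
        # ffindex ends every entry with a NUL byte; strip it before decoding.
        entries[name] = chunk.rstrip(b"\0").decode("utf-8", errors="replace")
    return entries
```

For example, read_entries("mappings.ffdata", "mappings.ffindex") returns a dictionary keyed by query name whose values are the individual hhsearch result texts, which can then be inspected one by one.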


From: Kamil Sent: 26 May 2021 15:30 To: soedinglab/hh-suite Cc: mujiezhang; Author Subject: Re: [soedinglab/hh-suite] issue about building local databse (#268)

Assuming you are running hhsearch through ffindex_apply, you get hhsearch mappings from every profile (hmm/a3m) in your query database to the profiles in the target database. You pasted a fragment of the result showing how MH719189.1_prot_AYD80303.1_44 compares to the entries in the target database. As you can see, it is similar to itself and barely similar to the remaining entries. HHsearch reports at least 10 hits even if they don't meet the reliability thresholds (and more hits if it finds more similar entries in the database).

What exactly do you want to do? Compare every sequence with every other one? You can assume that the remaining 217 sequences in the target database are not similar to the query.
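To turn these per-query results into cluster-level links, one option is to parse the summary hit list of each result and record a pair of clusters whenever a hit passes a probability cutoff. A minimal sketch, assuming each database entry is a profile built from one cluster's MSA and named, as is hhmake's default, after the first sequence in that MSA. The cluster_of dictionary (profile name to cluster ID) and the 90% default cutoff are assumptions; the cutoff mirrors the Prob > 90 criterion used in this thread. Hit names in the summary table are truncated (rows 2 and 7 of the output above both show lcl|NC_020198.1_prot_YP_007392), so the lookup falls back to a unique prefix match and skips ambiguous names.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Optional, Set, Tuple


class Hit(NamedTuple):
    rank: int
    name: str     # hhsearch truncates long names in this column
    prob: float   # Prob, in percent
    evalue: float
    cols: int


HIT_ROW = re.compile(
    r"^\s*(\d+)\s+(.*?)\s+(\d+\.\d+)\s+(\S+)\s+(\S+)"     # No, Hit, Prob, E-value, P-value
    r"\s+(-?\d+\.?\d*)\s+(-?\d+\.?\d*)\s+(\d+)"           # Score, SS, Cols
    r"\s+(\d+-\d+)\s+(\d+-\d+)\s*\(\s*(\d+)\s*\)\s*$"      # Query HMM, Template HMM (length)
)


def parse_hhr(text: str) -> Tuple[str, List[Hit]]:
    """Return the query name and the summary hit list of one hhsearch result."""
    query, hits, in_table = "", [], False
    for line in text.splitlines():
        if not query and line.startswith("Query") and len(line.split()) > 1:
            query = line.split()[1]
        elif line.lstrip().startswith("No Hit"):
            in_table = True
        elif in_table:
            if not line.strip():
                break  # a blank line ends the summary table
            m = HIT_ROW.match(line)
            if m:
                hits.append(Hit(int(m[1]), m[2], float(m[3]), float(m[4]), int(m[8])))
    return query, hits


def cluster_for(name: str, cluster_of: Dict[str, str]) -> Optional[str]:
    """Cluster of a profile name, using a prefix match for truncated hit names."""
    if name in cluster_of:
        return cluster_of[name]
    found = {c for full, c in cluster_of.items() if full.startswith(name)}
    return found.pop() if len(found) == 1 else None  # None: unknown or ambiguous


def similar_clusters(results: Iterable[str], cluster_of: Dict[str, str],
                     min_prob: float = 90.0) -> Set[Tuple[str, str]]:
    """Unordered pairs of different clusters linked by a hit with Prob >= min_prob."""
    pairs: Set[Tuple[str, str]] = set()
    for text in results:
        query, hits = parse_hhr(text)
        q = cluster_for(query, cluster_of)
        for hit in hits:
            t = cluster_for(hit.name, cluster_of)
            if hit.prob >= min_prob and q is not None and t is not None and q != t:
                pairs.add((min(q, t), max(q, t)))
    return pairs
```

Here results is any iterable of hhsearch result texts, one per query (for instance the entries of mappings.ffdata). With the default cutoff, the MH719189.1_prot_AYD80303.1_44 result above yields no pair, because only its self-hit exceeds Prob 90, matching the interpretation given in this thread.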


ksteczk commented 2 years ago

It seems the file is missing. Is it in the directory from which you're running the script? Are you running it locally on the same machine? Why is there an error from mpirun? How exactly did you run this?

On Wed, 6 Apr 2022, 11:22, user chao <@.***> wrote:

When I run the following command: 'ffindex_apply cluster1091_a3m_wo_ss.ff{data,index} -i cluster1091_a3m.ffindex -d cluster1091_a3m.ffdata -- addss.pl stdin stdout', I get this error: '/big/martin/hh-suite/lib/ffindex/src/ffindex_apply_mpi.c:341 ffindex_apply: cluster1091_a3m_wo_ss.ffdata: No such file or directory'. How should I solve it? Thank you.

— Reply to this email directly, view it on GitHub https://github.com/soedinglab/hh-suite/issues/268#issuecomment-1090049770, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD2CMIY6YOMDQY3C6LCGY6LVDVJVHANCNFSM45NBQOKQ . You are receiving this because you commented.
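ffindex_apply opens the input pair at the path it is given, so relative names are resolved against the current working directory. A quick way to answer the first two questions is to check the pair from that same directory before running the command. A minimal sketch, assuming the usual ffindex index layout (one tab-separated name, offset and length per line); check_ffindex_pair is a hypothetical helper, not part of hh-suite.

```python
import os
from typing import List


def check_ffindex_pair(prefix: str) -> List[str]:
    """List problems with <prefix>.ffdata / <prefix>.ffindex; empty means none found."""
    data, index = f"{prefix}.ffdata", f"{prefix}.ffindex"
    # Report absolute paths so it is obvious which directory was searched.
    problems = [f"missing {os.path.abspath(p)}" for p in (data, index) if not os.path.isfile(p)]
    if problems:
        return problems
    size = os.path.getsize(data)
    with open(index, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, 1):
            if not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 3 or not (fields[1].isdigit() and fields[2].isdigit()):
                problems.append(f"{index}:{lineno}: expected name<TAB>offset<TAB>length")
            elif int(fields[1]) + int(fields[2]) > size:
                problems.append(f"{index}:{lineno}: entry {fields[0]!r} runs past the end of {data}")
    return problems
```

For the command above, check_ffindex_pair("cluster1091_a3m_wo_ss") would print the absolute path where cluster1091_a3m_wo_ss.ffdata was expected, which usually shows whether the earlier hhconsensus step wrote its output somewhere else.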

lonestarling commented 2 years ago


Hi ksteczk, when I run hhsearch I hit a new issue: "could not open file 'msa/HHM/Allterl_hmm_cs219.ffdata', In /big/martin/hh-suite/src/ffindexdatabase.cpp:11: FFindexDatabase:". First, I built the database from all the HMM files with ffindex_build and got the files Allterl_hmm.ffdata and Allterl_hmm.ffindex. Then I searched a single HMM file against Allterl_hmm.ffindex with hhsearch, and I got this error. Can anyone figure it out? I'd appreciate it! Yours
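The error message suggests that hhsearch treats the -d value as a database prefix and appends _cs219.ffdata to it, so a prefix of msa/HHM/Allterl_hmm (presumably what was passed) leads it to look for msa/HHM/Allterl_hmm_cs219.ffdata. The hh-suite database-building instructions name the profile pairs <prefix>_hhm and <prefix>_a3m and produce the _cs219 pair with cstranslate. A minimal sketch that lists which of these files exist for a prefix, assuming that naming convention; both helper names are hypothetical.

```python
import os
from typing import Dict, List

# Assumed hh-suite naming convention for a database prefix.
SUFFIXES = ("_cs219", "_hhm", "_a3m")


def database_files(prefix: str) -> Dict[str, bool]:
    """Map every expected <prefix><suffix>.ff{data,index} path to whether it exists."""
    paths = [f"{prefix}{s}.{ext}" for s in SUFFIXES for ext in ("ffdata", "ffindex")]
    return {p: os.path.isfile(p) for p in paths}


def missing_files(prefix: str) -> List[str]:
    """Expected database files that are not present, in sorted order."""
    return sorted(p for p, ok in database_files(prefix).items() if not ok)
```

For example, missing_files("msa/HHM/Allterl") would list every pair still to be created or renamed under that prefix; with the files described above, nothing matches, because they are named Allterl_hmm rather than Allterl_hhm.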

shikingstar commented 1 year ago


I also encountered this problem. Did you solve it?