qiyunlab / HGTector

HGTector2: Genome-wide prediction of horizontal gene transfer based on distribution of sequence homology patterns.
BSD 3-Clause "New" or "Revised" License

stuck at Step 1 #9

Closed: ceya closed this issue 3 years ago

ceya commented 8 years ago

Hello,

I'm trying to use HGTector on a terminal and wasn't able to run the sample folder. Here's the error message I got:

Step 1: Searcher - batch protein sequence homology search.
This Perl not built to support threads
Compilation failed in require at /home/brooke/HGTector-0.2.1/scripts/searcher.pl line 5.
BEGIN failed--compilation aborted at /home/brooke/HGTector-0.2.1/scripts/searcher.pl line 5.
Error: Execution of searcher.pl failed. HGTector exists.

Is there any way I can change my config files to make it work? I tried other thread options, but they didn't help. Thanks!
Brooke

qiyunzhu commented 8 years ago

Hello Brooke,

It seems that your Perl does not support threads. I think there may be a way to bypass this issue. If you open /scripts/searcher.pl in a text editor, you will find lines 5 and 6, which read:

use threads;
use threads::shared;

You may change them into:

eval{ require threads; threads->import() };
eval{ require threads::shared; threads::shared->import() };

Save the changes and re-run the program. You will no longer be able to run HTTP BLAST, but that is not recommended anyway. Instead, you will need to download the database and run BLAST locally.

The above fix is only a guess. Please let me know whether it works. Thanks!
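Before patching searcher.pl, it may help to confirm the diagnosis. Perl reports its build configuration through `perl -V:name`; a build without thread support prints `useithreads='undef';`. The minimal Python 3.7+ sketch below wraps that check. The function names are hypothetical helpers, not part of HGTector.

```python
import subprocess


def parse_useithreads(output: str) -> bool:
    """Return True if `perl -V:useithreads` output reports thread support.

    Perl prints useithreads='define'; when built with ithreads,
    and useithreads='undef'; otherwise.
    """
    line = output.strip()
    return line.startswith("useithreads=") and "'define'" in line


def perl_has_threads(perl: str = "perl") -> bool:
    """Ask the given perl binary whether it was compiled with ithreads."""
    result = subprocess.run(
        [perl, "-V:useithreads"], capture_output=True, text=True, check=True
    )
    return parse_useithreads(result.stdout)
```

If `perl_has_threads()` returns False, the `use threads;` lines in searcher.pl cannot load, which matches the error above. Running `perl -V:useithreads` directly in a terminal gives the same information.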

Best, Qiyun

ceya commented 8 years ago

Hi Qiyun,

Here's the error after changing searcher.pl:

Bareword "threads::running" not allowed while "strict subs" in use at /home/brooke/HGTector-0.2.1/scripts/searcher.pl line 779.
Bareword "threads::joinable" not allowed while "strict subs" in use at /home/brooke/HGTector-0.2.1/scripts/searcher.pl line 812.
Execution of /home/brooke/HGTector-0.2.1/scripts/searcher.pl aborted due to compilation errors.
Error: Execution of searcher.pl failed. HGTector exists.

Any suggestions? Thanks a lot! Brooke

qiyunzhu commented 8 years ago

Hi Brooke,

This new error occurs when the program attempts to allocate multiple threads to run HTTP BLAST. Since your Perl lacks thread support, HTTP BLAST is no longer an option. That is, you need to change httpBlast=1 to httpBlast=0 in config.txt, or simply remove this line.

If you are running the example provided with the program package, note that it calls HTTP BLAST by default. If that is the case, you will need to read GUI.html and create your own configuration.
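The config.txt change described above can also be scripted. This is a minimal sketch, assuming config.txt uses one `key=value` setting per line with optional trailing `#` comments (as in the config posted later in this thread); `disable_http_blast` is a hypothetical helper, not an HGTector function.

```python
import re

# Matches an httpBlast setting at the start of a line and captures the
# "httpBlast=" part so only the value after it is replaced.
HTTP_BLAST = re.compile(r"^([ \t]*httpBlast[ \t]*=[ \t]*)\S+", re.MULTILINE)


def disable_http_blast(config_text: str) -> str:
    """Return config text with every httpBlast=... value changed to 0.

    Any trailing comment on the same line is kept unchanged.
    """
    return HTTP_BLAST.sub(r"\g<1>0", config_text)
```

For example, reading config.txt, passing its contents through `disable_http_blast`, and writing the result back turns `httpBlast=1` into `httpBlast=0`. Deleting the line, as suggested above, would work just as well.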

Best, Qiyun

ceya commented 8 years ago

Hi Qiyun,

I tried using databaser.py to build the database but ran into the following error:

Reading the NCBI taxonomy database... done. 1519526 taxa read.
Downloading the NCBI representative genome list... done.
Reading the NCBI representative genome list... done. 4422 genomes read.
Reading RefSeq genome list... done. 67048 genomes read.
Subsampling genomes... done. 10945 genomes retained.
Reading RefSeq genomic data...
Traceback (most recent call last):
  File "scripts/databaser.py", line 262, in
    try: files = ftp.nlst()
  File "/usr/lib64/python2.6/ftplib.py", line 506, in nlst
    self.retrlines(cmd, files.append)
  File "/usr/lib64/python2.6/ftplib.py", line 442, in retrlines
    return self.voidresp()
  File "/usr/lib64/python2.6/ftplib.py", line 228, in voidresp
    resp = self.getresp()
  File "/usr/lib64/python2.6/ftplib.py", line 214, in getresp
    resp = self.getmultiline()
  File "/usr/lib64/python2.6/ftplib.py", line 200, in getmultiline
    line = self.getline()
  File "/usr/lib64/python2.6/ftplib.py", line 190, in getline
    if not line: raise EOFError
EOFError

Does this have to do with my Python version? Thanks!
Brooke

qiyunzhu commented 8 years ago

Hi Brooke, guess what: I just tried my program and got the same error. I found that a particular directory on the NCBI FTP server is not accessible: http://ftp.ncbi.nlm.nih.gov/genomes/all/. I don't know whether this is temporary. Hopefully it will be back at some point...
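One way to tell a server-side outage from a local problem is to list the directory directly, retrying a few times. The traceback shows databaser.py reading this directory with ftplib's `nlst()`, so the minimal Python 3 sketch below does the same. `with_retries` and `list_ncbi_dir` are hypothetical helpers, and retrying only helps with transient failures, not with a directory that stays unavailable.

```python
import ftplib
import time


def with_retries(func, retries=3, delay=5.0):
    """Call func(); on an FTP-related error, wait and try again.

    ftplib.all_errors covers EOFError (the error in the traceback above)
    as well as OSError and ftplib's own exception classes.
    """
    for attempt in range(1, retries + 1):
        try:
            return func()
        except ftplib.all_errors:
            if attempt == retries:
                raise  # give up and show the last error
            time.sleep(delay)


def list_ncbi_dir(path="/genomes/all/", host="ftp.ncbi.nlm.nih.gov"):
    """Return the names listed in one directory on the NCBI FTP server."""
    with ftplib.FTP(host) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(path)
        return ftp.nlst()
```

If `with_retries(list_ncbi_dir)` keeps raising `EOFError` while other directories on the same server list fine, the problem is most likely on NCBI's side rather than in the local Python installation.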

ceya commented 8 years ago

Hi Qiyun,

The directory you mentioned is accessible now. Would you mind checking whether databaser.py works now? I'm still getting the same error as before.

Thank you very much! Brooke

qiyunzhu commented 8 years ago

Hello Brooke, I still couldn't access this directory...

ceya commented 8 years ago

Hi Qiyun,

I managed to find an NCBI nr database on a group server, but HGTector doesn't seem to be able to find the TaxIDs automatically:

BLAST Database error: No alias or index file found for nucleotide database [/usr/local/blastdb] in search path [/home/brooke/HGTector-0.2.1::]

Does this mean I need to download the dictionary as well?

Thanks! Brooke

qiyunzhu commented 8 years ago

Hi Brooke,

I think that means the database path is not set correctly: the error shows it pointing to the directory /usr/local/blastdb rather than to a database. It should probably be /usr/local/blastdb/nr.

Best, Qiyun

ceya commented 8 years ago

Hi Qiyun,

Should the path be the directory containing all the nr files? On our server, all the databases are in the same directory, /local/blastdb. In this case, can I write the path to specific files instead?

Thanks!

qiyunzhu commented 8 years ago

The path should be the directory containing the database plus the stem file name of the database. For example, if you have files like nr.00.pin under /local/blastdb, the path should be /local/blastdb/nr. - Qiyun
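To find the stem name in a shared directory that holds several databases, one can look for the files that mark a BLAST protein database: an index file (`.pin`, e.g. nr.00.pin) or, for multi-volume databases, an alias file (`.pal`, e.g. nr.pal). This is a minimal sketch assuming the stem is everything before the first dot; the helper names are hypothetical.

```python
import posixpath

# File extensions that identify a BLAST protein database: the volume
# index (.pin) and the multi-volume alias file (.pal).
PROTEIN_MARKERS = (".pin", ".pal")


def protein_db_stems(filenames):
    """Return sorted protein database stems found in a directory listing.

    Assumes the stem is the part before the first dot, so nr.00.pin
    and nr.pal both yield "nr". Nucleotide files (.nin, .nal) are ignored.
    """
    stems = set()
    for name in filenames:
        if name.endswith(PROTEIN_MARKERS):
            stems.add(name.split(".", 1)[0])
    return sorted(stems)


def db_path(directory, stem):
    """Join directory and stem into the value BLAST expects, e.g. /local/blastdb/nr."""
    return posixpath.join(directory, stem)
```

Applied to a listing of /local/blastdb (for instance from `os.listdir`), `protein_db_stems` shows which names are valid, and `db_path("/local/blastdb", "nr")` gives the path to put in the config.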

ceya commented 8 years ago

Hi Qiyun,

I'm at the analyzer step, but the program doesn't recognize the TaxID I set for the close group. Could you take a look at the following message and let me know what might have happened? Thanks!

-> Analyzer: Identify putative HGT-derived genes based on search results. <-
Reading taxonomic information... done.
Analyzing taxonomic information... done.
All input genomes belong to species Lactobacillus johnsonii (TaxID: 33959).
Analysis will work on the following taxonomic ranks:
Self: (user-defined self) Lactobacillus johnsonii (TaxID: 33959) (1 members),
Close: (user-defined close) unknown (TaxID: 1598) (0 members),

qiyunzhu commented 8 years ago

Hi Brooke,

That's probably because the "close" group is not configured correctly in config.txt. Can you post this file's content?

Best, Qiyun

ceya commented 8 years ago

Hi Qiyun,

Here's my config setting:

inSets=103a
selfTax=103a:33959
protdb=/home/brooke/HGTector-0.2.1/JRfaa
taxdump=/home/brooke/HGTector-0.2.1/taxdump
prot2taxid=/home/brooke/HGTector-0.2.1/JR_taxID.txt

Search tool

searchTool=BLAST # Use NCBI-BLAST+ for search
blastp=blastp
blastdbcmd=blastdbcmd

BLAST parameters

threads=0 # Use all CPU cores for BLAST
queries=0 # Query all sequences per BLAST run
nHits=1000 # return 1000 hits per query sequence
maxHits=100 # retain up to 100 non-redundant hits
evalue=1e-20 # E-value cutoff
identity=30 # percent identity cutoff
coverage=50 # percent coverage cutoff
getAln=1 # retrieve aligned part for hits

Grouping scenario

selfGroup=33959 # johnsonii
distalGroup=1598 # reuteri

If I'm using my own database and the protein entries look like this:

strain1|0001 name1
strain1|0002 name2

should my proteinID-to-TaxID file look like this?

strain1|0001 name1[tab]33959
strain1|0002 name2[tab]33959

And if I'm looking for HGT between two species in the same genus, L. johnsonii (33959) and L. reuteri (1598), should the group setting be: selfGroup:33959 distalGroup:1598?

Thank you very much! Brooke

qiyunzhu commented 8 years ago

Hello Brooke (@ceya), I am truly sorry for not responding for such a looooong time. I was caught up in many things. Here I have revisited your question. Maybe it's already too late, in which case I apologize...

The problem is that you didn't specify the "close" group. Instead, you specified the "distal" group, which you don't have to, because the "distal" is by default everything other than "self" and "close".
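The three-way split can be illustrated with a simplified model: walk a hit's lineage up the NCBI taxonomy (child-to-parent links as in taxdump's nodes.dmp, where the root is its own parent) and check it against the self and close TaxIDs. This is a hypothetical sketch of the idea, not HGTector's actual code. It also shows why an empty close group leaves every non-self hit as "distal".

```python
def lineage_of(taxid, parent):
    """Walk a child->parent TaxID map up to the root (a node that is its own parent)."""
    path = [taxid]
    while parent.get(taxid, taxid) != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def classify_hit(lineage, self_group, close_groups):
    """Assign a hit to 'self', 'close' or 'distal' from its lineage.

    'self'   : the lineage contains the self TaxID
    'close'  : the lineage contains any close TaxID
    'distal' : everything else (no separate setting needed)
    """
    taxa = set(lineage)
    if self_group in taxa:
        return "self"
    if taxa & set(close_groups):
        return "close"
    return "distal"
```

With `close_groups` left empty, as in the config above, `classify_hit` can only return "self" or "distal", so nothing ever counts as a close relative.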

In your case, I think you can do:

Step 1: Within the genus Lactobacillus, which species are the close relatives of L. johnsonii (33959)? I did a brief search. According to Claesson et al. (2008), they are L. gasseri (1596), L. helveticus (1587), L. acidophilus (1579) and L. bulgaricus (1585), while your anticipated gene donor, L. reuteri (1598), is in another lineage. Therefore, when you BLAST a vertically inherited L. johnsonii gene, you would expect the best hits to come from these four species rather than from L. reuteri or others.

Step 2: Thus, you can define selfGroup = 33959 and closeGroup = 1596,1587,1579,1585, with no distalGroup (it isn't a parameter). There may be other Lactobacillus genomes in your database that are close to L. johnsonii; if so, you will need to add them to closeGroup too.

Step 3: Your proteinID-to-TaxID file could look like: strain1|0001[tab]33959. Note that everything after the first whitespace in the protein entry is chopped off, so you don't need name1.
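A file in that format can be generated from the protein FASTA headers. This is a minimal sketch, assuming every protein in the input belongs to one genome with a single TaxID (e.g. 33959) and that headers start with `>`; `prot2taxid_lines` is a hypothetical helper.

```python
def prot2taxid_lines(fasta_lines, taxid):
    """Yield 'proteinID<TAB>TaxID' for each FASTA header line.

    The protein ID is the header text after '>' up to the first
    whitespace, so '>strain1|0001 name1' becomes 'strain1|0001'.
    Sequence lines are skipped.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                yield f"{header.split()[0]}\t{taxid}"
```

For example, iterating over an open protein FASTA file and writing each yielded line (plus a newline) to JR_taxID.txt produces the mapping. For a database mixing several species, the single `taxid` argument would need to become a per-genome lookup.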