qiyunlab / HGTector

HGTector2: Genome-wide prediction of horizontal gene transfer based on distribution of sequence homology patterns.
BSD 3-Clause "New" or "Revised" License

taxid #137

Open duxi190914 opened 3 weeks ago

duxi190914 commented 3 weeks ago

```
Homology search started at 2024-08-21 08:40:06.163220.
Settings:
  Search method: diamond.
  Self-alignment method: native.
  Remote fetch enabled: no.
Reading input proteins...
  Sulfurospirillum_multivorans: 3224 proteins.
Done. Read 3224 proteins from 1 samples.
Dropping sequences shorter than 30 aa... done.
Reading local taxonomy database... done. Read 2597948 taxa.
Batch homology search of Sulfurospirillum_multivorans started at 2024-08-21 08:40:16.227200.
Number of queries: 3223.
WARNING: Cannot obtain taxIds for 470074 sequences. These hits will be dropped.
```

TaxIds could not be obtained for a very large number of sequences. Will this affect the results?
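For context: HGTector drops any hit whose accession cannot be mapped to a taxId, so a quick way to gauge the damage before a long run is to check the hit accessions against the mapping yourself. A minimal sketch, where a tiny in-memory dict stands in for NCBI's multi-gigabyte prot.accession2taxid file and all names are illustrative:

```python
# Hedged sketch: partition DIAMOND hit accessions by whether a taxId
# mapping exists for them. In a real run, acc2taxid would be loaded from
# NCBI's prot.accession2taxid file (accession.version -> taxId).

def split_by_taxid(accessions, acc2taxid):
    """Return (mapped, unmapped) lists of accessions by taxId lookup."""
    mapped, unmapped = [], []
    for acc in accessions:
        (mapped if acc in acc2taxid else unmapped).append(acc)
    return mapped, unmapped

# toy stand-in for prot.accession2taxid
acc2taxid = {'WP_000001.1': 666, 'WP_000002.1': 1310}
hits = ['WP_000001.1', 'WP_000002.1', 'XYZ_999.1']
mapped, unmapped = split_by_taxid(hits, acc2taxid)
print(f'{len(unmapped)} of {len(hits)} hits would be dropped')
```

If a large fraction of hits lands in the unmapped list, the database's taxon mapping (rather than HGTector itself) is the likely culprit.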

qiyunzhu commented 3 weeks ago

It will. Could there be some issues with the database? If it was automatically generated from NCBI, all sequences should have taxIDs.

duxi190914 commented 2 weeks ago

Dear professor, I previously downloaded the database from NCBI, but taxIDs could not be obtained for many sequences. So I re-downloaded the database you provided, but there are still some new errors; the error output is in the attached Word document. I have another question: the database downloaded from NCBI is 350 GB, but the one you provided is only 50 GB. Can results generated with your database be used as final results?

Wish you all the best,
duxi

qiyunzhu commented 2 weeks ago

Hi @duxi190914 It could be that NCBI has updated its database structure such that the existing HGTector pulls more than expected. It is totally fine to use the 50G database.

duxi190914 commented 2 weeks ago

Dear professor, I used the database you provided to run the protein file, but I still got an error. I don't know how to solve the problem; the command and the error output are below.

```
(base) @.*** hgtdb_20230102]$ hgtector search -i 1612.all.faa -o 8.281
Homology search started at 2024-08-28 15:43:39.969565.
Settings:
  Search method: remote.
  Self-alignment method: native.
  Remote fetch enabled: yes.
Reading input proteins...
  1612.all: 2357 proteins.
Done. Read 2357 proteins from 1 samples.
Dropping sequences shorter than 30 aa... done.
Initiating custom taxonomy database... done.
Batch homology search of 1612.all started at 2024-08-28 15:43:40.175473.
Number of queries: 2357.
Submitting 20 queries for search. RID: CXNFJTSU016..................... Success. Results retrieved.
Fetching taxIds of 8485 sequences from remote server... Fetched information of 100 entries. .... Done.
Obtained taxIds of 8485 sequences.
Fetching 4558 taxIds and ancestors from remote server... Fetched information of 100 entries. ...... Done.
Obtained taxonomy of 4558 taxIds.
Submitting 9 queries for self-alignment. RID: CXUU11WP114. Results retrieved.
Submitting 11 queries for self-alignment. RID: CXUX49BY114. Results retrieved.
20 queries completed.
Submitting 17 queries for search. RID: CXV22SXB013............... Success. Results retrieved.
Fetching taxIds of 7707 sequences from remote server... Fetched information of 100 entries. .....
Traceback (most recent call last):
  File "/home/wangzx/miniconda3/bin/hgtector", line 4, in <module>
    __import__('pkg_resources').run_script('hgtector==2.0b3', 'hgtector')
  File "/home/wangzx/miniconda3/lib/python3.9/site-packages/pkg_resources/__init__.py", line 720, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/wangzx/miniconda3/lib/python3.9/site-packages/pkg_resources/__init__.py", line 1559, in run_script
    exec(code, namespace, namespace)
  File "/home/wangzx/miniconda3/lib/python3.9/site-packages/hgtector-2.0b3-py3.9.egg/EGG-INFO/scripts/hgtector", line 96, in <module>
    main()
  File "/home/wangzx/miniconda3/lib/python3.9/site-packages/hgtector-2.0b3-py3.9.egg/EGG-INFO/scripts/hgtector", line 35, in main
    module(args)
  File "/home/wangzx/miniconda3/lib/python3.9/site-packages/hgtector-2.0b3-py3.9.egg/hgtector/search.py", line 182, in __call__
    self.taxid_wf(res)
  File "/home/wangzx/miniconda3/lib/python3.9/site-packages/hgtector-2.0b3-py3.9.egg/hgtector/search.py", line 745, in taxid_wf
    newmap = {x[0]: x[1] for x in self.remote_seqinfo(ids2q)}
  File "/home/wangzx/miniconda3/lib/python3.9/site-packages/hgtector-2.0b3-py3.9.egg/hgtector/search.py", line 1257, in remote_seqinfo
    return self.parse_fasta_xml(self.remote_fetches(
  File "/home/wangzx/miniconda3/lib/python3.9/site-packages/hgtector-2.0b3-py3.9.egg/hgtector/search.py", line 1292, in remote_fetches
    res += self.remote_fetch(urlapi.format(','.join(batch)))
  File "/home/wangzx/miniconda3/lib/python3.9/site-packages/hgtector-2.0b3-py3.9.egg/hgtector/search.py", line 1337, in remote_fetch
    return response.read().decode('utf-8')
  File "/home/wangzx/miniconda3/lib/python3.9/http/client.py", line 470, in read
    return self._readall_chunked()
  File "/home/wangzx/miniconda3/lib/python3.9/http/client.py", line 577, in _readall_chunked
    chunk_left = self._get_chunk_left()
  File "/home/wangzx/miniconda3/lib/python3.9/http/client.py", line 560, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/home/wangzx/miniconda3/lib/python3.9/http/client.py", line 520, in _read_next_chunk_size
    line = self.fp.readline(_MAXLINE + 1)
  File "/home/wangzx/miniconda3/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/home/wangzx/miniconda3/lib/python3.9/ssl.py", line 1242, in recv_into
    return self.read(nbytes, buffer)
  File "/home/wangzx/miniconda3/lib/python3.9/ssl.py", line 1100, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
```
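The failure here is a transient network timeout while streaming an NCBI E-utilities response, not a database problem. Independent of HGTector's own code, the usual remedy is to retry the fetch after a short delay; a minimal sketch of that pattern (function names are illustrative, not HGTector's API):

```python
import socket
import time
import urllib.request

def retry(func, retries=3, delay=5, exceptions=(socket.timeout, TimeoutError)):
    """Call func(); on a listed exception, wait `delay` seconds and try again."""
    for attempt in range(1, retries + 1):
        try:
            return func()
        except exceptions:
            if attempt == retries:
                raise  # out of attempts: let the timeout propagate
            time.sleep(delay)

def fetch(url, timeout=60):
    """Read one URL as UTF-8 text (the step at which the run above timed out)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode('utf-8')

# usage sketch (URL elided):
# retry(lambda: fetch('https://eutils.ncbi.nlm.nih.gov/...'), retries=3)
```

HGTector batches and retries remote fetches itself; if your version exposes retry, delay, or timeout settings in its configuration, raising them serves the same purpose as this wrapper.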

Wish you all the best,
duxi


duxi190914 commented 1 week ago
Dear professor, I noticed that when I run hgtector, one setting differs between runs: sometimes it is "Remote fetch enabled: yes" and sometimes "Remote fetch enabled: no". When I use the database you provided, it is "yes". Does this influence the results? Wish you all the best, duxi


qiyunzhu commented 6 days ago

Hello @duxi190914 If you disable remote fetch, it may resolve the problem. Remote fetch can introduce instability into the taxonomy structure.
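If the installed version supports it, remote fetch can be switched off in the `fetch` section of HGTector's configuration file. The fragment below is a sketch only; the key names are assumptions and should be verified against the config.yml shipped with your installation:

```yaml
# Assumed config.yml fragment: disable remote fetch so that taxonomy
# information comes only from the local database.
fetch:
  enable: no
```

With remote fetch disabled, hits whose taxIds are missing from the local taxonomy database are dropped instead of being looked up over the network, which avoids the timeout but makes a complete local taxon mapping more important.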