Open duxi190914 opened 3 weeks ago
It will. Could there be some issues with the database? If it was automatically generated from NCBI, all sequences should have taxIDs.
Dear professor, I previously downloaded the database from NCBI, but many sequences could not be assigned taxIDs. I then re-downloaded the database you provided, but I still get some new errors; the error message is in the Word file. I have another question. The database downloaded from NCBI is 350G, but the one you provided is only 50G. Can the database you generated be used to produce final results?
Wish you all the best, duxi
Hi @duxi190914 It could be that NCBI has updated its database structure such that the existing HGTector pulls more than expected. It is totally fine to use the 50G database.
Dear professor,
I used the database you provided to run the protein file, but I still got an error. I don't know how to solve the problem. The output up to the error is below.
(base) @.*** hgtdb_20230102]$ hgtector search -i 1612.all.faa -o 8.281
Homology search started at 2024-08-28 15:43:39.969565.
Settings:
Search method: remote.
Self-alignment method: native.
Remote fetch enabled: yes.
Reading input proteins...
1612.all: 2357 proteins.
Done. Read 2357 proteins from 1 samples.
Dropping sequences shorter than 30 aa... done.
Initiating custom taxonomy database... done.
Batch homology search of 1612.all started at 2024-08-28 15:43:40.175473.
Number of queries: 2357.
Submitting 20 queries for search. RID: CXNFJTSU016..................... Success. Results retrieved.
Fetching taxIds of 8485 sequences from remote server...
Fetched information of 100 entries.
....
Done. Obtained taxIds of 8485 sequences.
Fetching 4558 taxIds and ancestors from remote server...
Fetched information of 100 entries.
......
Done. Obtained taxonomy of 4558 taxIds.
Submitting 9 queries for self-alignment. RID: CXUU11WP114. Results retrieved.
Submitting 11 queries for self-alignment. RID: CXUX49BY114. Results retrieved.
20 queries completed.
Submitting 17 queries for search. RID: CXV22SXB013............... Success. Results retrieved.
Fetching taxIds of 7707 sequences from remote server...
Fetched information of 100 entries.
.....
Traceback (most recent call last):
File "/home/wangzx/miniconda3/bin/hgtector", line 4, in
Wish you all the best, duxi
Dear professor, I noticed that when I run hgtector, one setting differs between runs. Sometimes the log says "Remote fetch enabled: yes" and sometimes "Remote fetch enabled: no". When I use the database you provided, it says "yes". Does this setting affect the results or not? Wish you all the best, duxi
Hello @duxi190914 If you disable remote fetch, it may resolve the problem. Remote fetch could introduce instability in the taxonomy structure.
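For readers following along, here is a minimal sketch of how such a run might be assembled with a local DIAMOND search and remote fetch turned off. The `-i` and `-o` options appear in the command earlier in this thread. The `-m`, `-d`, `-t` and `--fetch-enable` options are written from memory of the HGTector documentation and should be confirmed with `hgtector search --help`. The database paths and the `build_search_command` helper are hypothetical.

```python
import shlex


def build_search_command(input_faa, output_dir, diamond_db, taxdump_dir):
    """Assemble an `hgtector search` call that uses a local DIAMOND
    database and asks for remote fetch to be disabled.

    The option names other than -i and -o are assumptions; check them
    against `hgtector search --help` before running.
    """
    return [
        "hgtector", "search",
        "-i", input_faa,          # input protein FASTA (as used in this thread)
        "-o", output_dir,         # output directory (as used in this thread)
        "-m", "diamond",          # assumed: local DIAMOND search instead of remote
        "-d", diamond_db,         # assumed: path to the DIAMOND database
        "-t", taxdump_dir,        # assumed: path to the taxonomy dump directory
        "--fetch-enable", "no",   # assumed: turn remote fetch off
    ]


# Hypothetical paths; adjust them to where the downloaded database lives.
cmd = build_search_command(
    "1612.all.faa",
    "8.281",
    "hgtdb_20230102/diamond/db",
    "hgtdb_20230102/taxdump",
)
print(shlex.join(cmd))
```

The sketch only prints the command so it can be inspected before it is run in a shell.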
Homology search started at 2024-08-21 08:40:06.163220.
Settings:
Search method: diamond.
Self-alignment method: native.
Remote fetch enabled: no.
Reading input proteins...
Sulfurospirillum_multivorans: 3224 proteins.
Done. Read 3224 proteins from 1 samples.
Dropping sequences shorter than 30 aa... done.
Reading local taxonomy database... done.
Read 2597948 taxa.
Batch homology search of Sulfurospirillum_multivorans started at 2024-08-21 08:40:16.227200.
Number of queries: 3223.
WARNING: Cannot obtain taxIds for 470074 sequences. These hits will be dropped.
So many taxIds could not be obtained. Will this affect the results?
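The thread does not say what share of all hits those 470074 sequences represent. One rough way to gauge it is to compare the subject IDs in the hit table against a sequence-to-taxId map. The sketch below assumes a tab-separated map file with the sequence ID in the first column and the taxId in the second, optionally gzipped. That file layout and the helper names are assumptions, not something confirmed in this thread.

```python
import gzip


def load_taxon_map(path):
    """Read a tab-separated 'sequence ID<TAB>taxId' file into a dict.

    The two-column layout is an assumption; files ending in .gz are
    opened with gzip, anything else as plain text.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    mapping = {}
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                mapping[fields[0]] = fields[1]
    return mapping


def count_unmapped(hit_ids, taxon_map):
    """Return (number of unique hit IDs without a taxId, number of unique hit IDs)."""
    unique_ids = set(hit_ids)
    missing = {seq_id for seq_id in unique_ids if seq_id not in taxon_map}
    return len(missing), len(unique_ids)


# Tiny in-memory illustration with made-up IDs.
example_map = {"seqA": "562", "seqB": "1423"}
example_hits = ["seqA", "seqB", "seqC", "seqC"]
n_missing, n_total = count_unmapped(example_hits, example_map)
print(f"{n_missing} of {n_total} unique hits have no taxId")
```

If only a small fraction of unique hits lacks a taxId, the dropped hits are less likely to matter; a large fraction suggests the map and the sequence database are out of sync.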