Hello,
Can you reply with the output of "head -n 10" for each of the input files?
This will help me track down the error. It is failing to parse an input file.
Pete
From: Grendel26 [notifications@github.com] Sent: 12 May 2018 20:11 To: peterthorpe5/public_scripts Cc: Subscribed Subject: [peterthorpe5/public_scripts] ValueError: not enough values to unpack (expected 4, got 2) (#5)
Hi, I'm currently using your program, but I have run into an issue like this one:
INFO: Starting testing: Sat May 12 20:50:59 2018
Traceback (most recent call last):
File "/pandata/me/LEPIWASP/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 1034, in
Here is my script:
source /panhome/me//miniconda3/bin/activate
export PYTHONPATH=$PYTHONPATH:/panhome/me/miniconda3/lib/python3.6/site-packages
diamond_tab_output=/pandata/me/LEPIWASP/blast_database/matches.m8
Diamond_blast_to_taxid=/pandata/me/LEPIWASP/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py
taxid=/pandata/me/LEPIWASP/blast_database/gi_taxid_prot.dmp
categories=/pandata/me/LEPIWASP/blast_database/categories.dmp
names=/pandata/me/LEPIWASP/blast_database/names.dmp
description=/pandata/me/LEPIWASP/blast_database/acc_to_des.tab
$Diamond_blast_to_taxid -i $diamond_tab_output -t $taxid -c $categories -n $names -d $description -o outfile_sp1.tab
Do you know where the issue could be?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/peterthorpe5/public_scripts/issues/5, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AJhCqEo5PdoVkT58C0NpE_ne_x5XJF7Aks5txzPOgaJpZM4T8h1B.
The James Hutton Institute is a Scottish charitable company limited by guarantee. Registered in Scotland No. SC374831 Registered Office: The James Hutton Institute, Invergowrie Dundee DD2 5DA. Charity No. SC041796
For sure:
/blast_database$ head -n 10 categories.dmp
B 7 7
B 9 9
B 11 11
B 14 14
B 17 17
B 19 19
B 21 21
B 23 23
B 24 24
B 25 25
/blast_database$ head -n 10 gi_taxid_prot.dmp
6 9913
8 9913
10 9913
12 9913
14 9913
32 9913
35 9913
42 9913
44 9913
46 9913
/blast_database$ head -n 10 names.dmp
1 | all | | synonym |
1 | root | | scientific name |
2 | Bacteria | Bacteria <prokaryotes> | scientific name |
2 | Monera | Monera <Bacteria> | in-part |
2 | Procaryotae | Procaryotae <Bacteria> | in-part |
2 | Prokaryota | Prokaryota <Bacteria> | in-part |
2 | Prokaryotae | Prokaryotae <Bacteria> | in-part |
2 | bacteria | bacteria <blast2> | blast name |
2 | eubacteria | | genbank common name |
2 | not Bacteria Haeckel 1894 | | synonym |
/blast_database$ head -n 10 acc_to_des.tab
WP_003131952.1 Full=30S ribosomal protein S18Q02VU1.1 RecName
XP_642131.1 Full=Calfumirin-1; Short=CAF-1BAA06266.1 calfumirin-1 [Dictyostelium discoideum AX2]EAL68086.1 hypothetical protein DDB_G0277827 [Dictyostelium discoideum AX4]
XP_642837.1 hypothetical protein DDB_G0276911 [Dictyostelium discoideum AX4]EAL68957.1 hypothetical protein DDB_G0276911 [Dictyostelium discoideum AX4]
WP_000184067.1 MbtH family protein [Bacillus]NP_844755.1 hypothetical protein BA_2373 [Bacillus anthracis str. Ames]YP_028470.1 hypothetical protein BAS2209 [Bacillus anthracis str. Sterne]YP_036475.1 balhimycin biosynthetic protein MbtH [[Bacillus thuringiensis] serovar konkukian str. 97-27]AAP26241.1 mbtH-like protein [Bacillus anthracis str. Ames]AAT31492.1 mbtH-like protein [Bacillus anthracis str. 'Ames Ancestor']AAT54521.1 mbtH-like protein [Bacillus anthracis str. Sterne]AAT62162.1 MbtH protein [[Bacillus thuringiensis] serovar konkukian str. 97-27]ABK85418.1 mbtH-like protein [Bacillus thuringiensis str. Al Hakam]EDR19165.1 mbtH-like protein [Bacillus anthracis str. A0488]EDR87721.1 mbtH-like protein [Bacillus anthracis str. A0193]EDR94244.1 mbtH-like protein [Bacillus anthracis str. A0442]EDS97287.1 mbtH-like protein [Bacillus anthracis str. A0389]EDT19705.1 mbtH-like protein [Bacillus anthracis str. A0465]EDT69654.1 mbtH-like protein [Bacillus anthracis str. A0174]EDV17672.1 mbtH-like protein [Bacillus anthracis str. Tsiankovskii-I]EDX57451.1 mbtH-like protein [Bacillus cereus W]EDX64509.1 mbtH-like protein [Bacillus cereus 03BB108]EDX67797.1 MbtH-like protein [Bacillus cereus NVH0597-99]ACK90518.1 mbtH-like protein [Bacillus cereus AH820]ACP13435.1 MbtH-like protein [Bacillus anthracis str. CDC 684]ACQ49836.1 mbtH-like protein [Bacillus anthracis str. A0248]ADK04965.1 MbtH-like protein [Bacillus cereus biovar anthracis str. CI]AEW55483.1 Polymyxin synthetase PmxB [Bacillus cereus F837/76]AFH83638.1 MbtH-like protein [Bacillus anthracis str. H9401]EJQ94779.1 hypothetical protein IGW_02499 [Bacillus cereus ISP3191]EJT19014.1 MbtH-like protein [Bacillus anthracis str. UR-1]EJY93945.1 MbtH-like protein [Bacillus anthracis str. BF1]AHE83774.1 antibiotic transporter [Bacillus anthracis str. A16R]AHE89667.1 antibiotic transporter [Bacillus anthracis str. 
A16]GAE97797.1 polymyxin synthetase PmxB [Bacillus anthracis CZC5]EVT90821.1 antibiotic transporter [Bacillus anthracis 8903-G]EVT99096.1 antibiotic transporter [Bacillus anthracis 9080-G]EVU05530.1 antibiotic transporter [Bacillus anthracis 52-G]AHK38425.1 MbtH-like protein [Bacillus anthracis str. SVA11]EXJ20374.1 antibiotic transporter [Bacillus anthracis str. 95014]AIF56587.1 antibiotic transporter [Bacillus anthracis]KEY96341.1 antibiotic transporter [Bacillus anthracis str. Carbosap]KFJ82161.1 protein mbtH [Bacillus anthracis]AIK32172.1 protein mbtH [Bacillus anthracis]AIK56953.1 protein mbtH [Bacillus anthracis]AIK65383.1 protein mbtH [Bacillus anthracis str. Vollum]AIK51564.1 protein mbtH [Bacillus anthracis]KFL64197.1 protein mbtH [Bacillus anthracis]KFL68944.1 protein mbtH [Bacillus anthracis]AIM06200.1 mbtH-like protein [Bacillus anthracis]AIM11627.1 mbtH-like protein [Bacillus anthracis]KGZ46499.1 antibiotic transporter [Bacillus anthracis]KGZ52568.1 antibiotic transporter [Bacillus anthracis]KGZ53472.1 antibiotic transporter [Bacillus anthracis]KGZ66622.1 antibiotic transporter [Bacillus anthracis]KGZ68471.1 antibiotic transporter [Bacillus anthracis]KGZ71447.1 antibiotic transporter [Bacillus anthracis]KGZ79538.1 antibiotic transporter [Bacillus anthracis]KGZ85467.1 antibiotic transporter [Bacillus anthracis]KGZ87774.1 antibiotic transporter [Bacillus anthracis]KGZ93918.1 antibiotic transporter [Bacillus anthracis]KGZ95083.1 antibiotic transporter [Bacillus anthracis]KGZ97965.1 antibiotic transporter [Bacillus anthracis]KHA13634.1 antibiotic transporter [Bacillus anthracis]KHA14167.1 antibiotic transporter [Bacillus anthracis]KHA14756.1 antibiotic transporter [Bacillus anthracis]KHA22885.1 antibiotic transporter [Bacillus anthracis]KHA24054.1 antibiotic transporter [Bacillus anthracis]KHA41046.1 antibiotic transporter [Bacillus anthracis]KHA42550.1 antibiotic transporter [Bacillus anthracis]KHG44919.1 antibiotic transporter [Bacillus 
anthracis]KHG51266.1 antibiotic transporter [Bacillus anthracis]KHG61191.1 antibiotic transporter [Bacillus anthracis]AJA86607.1 antibiotic transporter [Bacillus anthracis]AJF89965.1 antibiotic transporter [Bacillus anthracis]AJG29085.1 antibiotic transporter [Bacillus anthracis]AJG50646.1 protein mbtH [Bacillus anthracis str. Turkey32]AJG59430.1 protein mbtH [Bacillus cereus D17]AJG64878.1 protein mbtH [Bacillus anthracis]AJG68575.1 protein mbtH [Bacillus anthracis]AJG75784.1 protein mbtH [Bacillus thuringiensis]AJG83850.1 protein mbtH [Bacillus anthracis]AJG87422.1 protein mbtH [Bacillus anthracis]AJH27492.1 mbtH-like family protein [Bacillus anthracis]AJH36379.1 mbtH-like family protein [Bacillus anthracis]AJH38448.1 mbtH-like family protein [Bacillus anthracis]AJH46106.1 mbtH-like family protein [Bacillus anthracis str. Sterne]AJH49939.1 mbtH-like family protein [Bacillus anthracis]AJH57940.1 mbtH-like family protein [Bacillus anthracis]AJH63291.1 mbtH-like family protein [Bacillus cereus]AJH98472.1 mbtH-like family protein [Bacillus anthracis str. 
V770-NP-1R]AJH68770.1 mbtH-like family protein [Bacillus thuringiensis]AJI10795.1 mbtH-like family protein [Bacillus cereus 03BB108]AJH81185.1 mbtH-like family protein [Bacillus thuringiensis]AJH88257.1 mbtH-like family protein [Bacillus anthracis]AJH94559.1 mbtH-like family protein [Bacillus anthracis]AJI32648.1 mbtH-like family protein [Bacillus thuringiensis]AJI37662.1 mbtH-like family protein [Bacillus anthracis]AJK33170.1 mbtH-like family protein [Bacillus cereus]AJM80875.1 antibiotic transporter [Bacillus anthracis]KKM29552.1 antibiotic transporter [Bacillus anthracis]KKM31290.1 antibiotic transporter [Bacillus anthracis]BAR76907.1 protein mbtH [Bacillus anthracis]GAO65037.1 MbtH protein [Bacillus anthracis]GAO59299.1 MbtH protein [Bacillus anthracis]KLA13430.1 hypothetical protein B4087_2330 [Bacillus cereus]KLV16174.1 antibiotic transporter [Bacillus anthracis]KMP73446.1 antibiotic transporter [Bacillus cereus]COF36624.1 Uncharacterized protein conserved in bacteria [Streptococcus pneumoniae]KOM58780.1 antibiotic transporter [Bacillus anthracis]KOM66344.1 antibiotic transporter [Bacillus anthracis]KOM74316.1 antibiotic transporter [Bacillus anthracis]KOM79871.1 antibiotic transporter [Bacillus anthracis]KOM85685.1 antibiotic transporter [Bacillus anthracis]KOM93139.1 antibiotic transporter [Bacillus anthracis]KON02851.1 antibiotic transporter [Bacillus anthracis]KON19765.1 antibiotic transporter [Bacillus anthracis]KON23345.1 antibiotic transporter [Bacillus anthracis]KOR56487.1 antibiotic transporter [Bacillus anthracis]KOR64746.1 antibiotic transporter [Bacillus anthracis]CUB40593.1 MbtH-like protein [Bacillus cereus]CUB50981.1 MbtH-like protein [Bacillus subtilis]ALC34486.1 antibiotic transporter [Bacillus anthracis]KWU56222.1 antibiotic transporter [Bacillus cereus]AMC04350.1 antibiotic transporter [Bacillus anthracis]KXX86462.1 antibiotic transporter [Bacillus cereus]KXY63330.1 antibiotic transporter [Bacillus cereus]KXY85405.1 antibiotic transporter 
[Bacillus cereus]KYZ64779.1 antibiotic transporter [Bacillus sp. GZT]ANH86541.1 antibiotic transporter [Bacillus anthracis]OBV06872.1 antibiotic transporter [Bacillus anthracis]OBV08044.1 antibiotic transporter [Bacillus anthracis]ANR04840.1 antibiotic transporter [Bacillus anthracis]ANR10137.1 antibiotic transporter [Bacillus anthracis]ANR15436.1 antibiotic transporter [Bacillus anthracis]ANR20737.1 antibiotic transporter [Bacillus anthracis]ANR26037.1 antibiotic transporter [Bacillus anthracis]ANR31338.1 antibiotic transporter [Bacillus anthracis]ANR36642.1 antibiotic transporter [Bacillus anthracis]ANR41938.1 antibiotic transporter [Bacillus anthracis]ANR47229.1 antibiotic transporter [Bacillus anthracis]ANR52527.1 antibiotic transporter [Bacillus anthracis]ANR57822.1 antibiotic transporter [Bacillus anthracis]OHO06076.1 antibiotic transporter [Bacillus anthracis]OJD92396.1 antibiotic transporter [Bacillus anthracis]OKA49635.1 antibiotic transporter [Bacillus anthracis]APT25844.1 antibiotic transporter [Bacillus anthracis]AQM46304.1 antibiotic transporter [Bacillus anthracis]OON46530.1 antibiotic transporter [Bacillus anthracis]OOX82618.1 antibiotic transporter [Bacillus anthracis]OOZ95984.1 antibiotic transporter [Bacillus cereus]OPA04001.1 antibiotic transporter [Bacillus cereus]OPD55927.1 antibiotic transporter [Bacillus anthracis]OPE63914.1 antibiotic transporter [Bacillus anthracis]OPE65888.1 antibiotic transporter [Bacillus anthracis]OPE77433.1 antibiotic transporter [Bacillus anthracis]OPE84282.1 antibiotic transporter [Bacillus anthracis]OPE86441.1 antibiotic transporter [Bacillus anthracis]OPE91785.1 antibiotic transporter [Bacillus anthracis]OPE97551.1 antibiotic transporter [Bacillus anthracis]OPF04166.1 antibiotic transporter [Bacillus anthracis]OPF12941.1 antibiotic transporter [Bacillus anthracis]OTW54644.1 antibiotic transporter [Bacillus thuringiensis serovar mexicanensis]OTW95868.1 antibiotic transporter [Bacillus thuringiensis serovar 
monterrey]OTX36822.1 antibiotic transporter [Bacillus thuringiensis serovar brasilensis]OTX46645.1 antibiotic transporter [Bacillus thuringiensis serovar pondicheriensis]OTY78694.1 antibiotic transporter [Bacillus thuringiensis serovar vazensis]OUA97881.1 antibiotic transporter [Bacillus thuringiensis serovar oswaldocruzi]SME21851.1 MbtH-like protein [Bacillus cereus]ARZ62528.1 antibiotic transporter [Bacillus thuringiensis]ASE32524.1 MbtH family protein [Bacillus anthracis]OXM02662.1 antibiotic transporter [Bacillus anthracis]PDP00971.1 MbtH family protein [Bacillus anthracis]PDP05901.1 MbtH family protein [Bacillus anthracis]PDP09673.1 MbtH family protein [Bacillus anthracis]PDP14615.1 MbtH family protein [Bacillus anthracis]PDP21847.1 MbtH family protein [Bacillus anthracis]PDP26476.1 MbtH family protein [Bacillus anthracis]PDP33868.1 MbtH family protein [Bacillus anthracis]PGB53585.1 MbtH family protein [Bacillus anthracis]AUD25801.1 MbtH family protein [Bacillus sp. HBCD-sjtu]PMU02308.1 MbtH family protein [Bacillus sp. UAEU-H3K6M1]PNS47938.1 MbtH family protein [Bacillus anthracis]PNS54332.1 MbtH family protein [Bacillus anthracis]PNS59832.1 MbtH family protein [Bacillus anthracis]PNS65202.1 MbtH family protein [Bacillus anthracis]PNS73917.1 MbtH family protein [Bacillus anthracis]PNS77544.1 MbtH family protein [Bacillus anthracis]PNS83210.1 MbtH family protein [Bacillus anthracis]PRD00625.1 MbtH family protein [Bacillus cereus]PRD06309.1 MbtH family protein [Bacillus cereus]PRD59520.1 MbtH family protein [Bacillus anthracis]PTR53886.1 MbtH family protein [Bacillus anthracis]PTR59175.1 MbtH family protein [Bacillus anthracis]PTR66756.1 MbtH family protein [Bacillus anthracis]PTR74157.1 MbtH family protein [Bacillus anthracis]PTR75459.1 MbtH family protein [Bacillus anthracis]PTR80521.1 MbtH family protein [Bacillus anthracis]PTR85276.1 MbtH family protein [Bacillus anthracis]PTR91761.1 MbtH family protein [Bacillus anthracis]
WP_007051162.1 argininosuccinate lyase [Bifidobacterium]NP_696229.1 argininosuccinate lyase [Bifidobacterium longum NCC2705]Q8G5F3.1 RecName
WP_000135199.1 30S ribosomal protein S18 [Bacteria]NP_313205.1 30S ribosomal protein S18 [Escherichia coli O157:H7 str. Sakai]NP_418623.1 30S ribosomal subunit protein S18 [Escherichia coli str. K-12 substr. MG1655]NP_458827.1 30s ribosomal subunit protein S18 [Salmonella enterica subsp. enterica serovar Typhi str. CT18]NP_710065.1 30S ribosomal protein S18 [Shigella flexneri 2a str. 301]YP_405749.1 30S ribosomal protein S18 [Shigella dysenteriae Sd197]YP_002410527.1 30S ribosomal protein S18 [Escherichia coli IAI39]YP_002415332.1 30S ribosomal subunit protein S18 [Escherichia coli UMN026]YP_003611111.1 30S ribosomal protein S18 [Enterobacter cloacae subsp. cloacae ATCC 13047]YP_004592087.1 30S ribosomal protein S18 [Klebsiella aerogenes KCTC 2190]YP_005224742.1 30S ribosomal protein S18 [Klebsiella pneumoniae subsp. pneumoniae HS11286]YP_006122613.1 30S ribosomal protein S18 [Escherichia coli O83:H1 str. NRG 857C]YP_006781181.1 30S ribosomal protein S18 [Escherichia coli O104:H4 str. 2011C-3493]NP_463254.2 30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Typhimurium str. LT2]P0A7T7.2 RecName
WP_003251213.1 leucyl/phenylalanyl-tRNA--protein transferase [Pseudomonas]NP_746135.1 leucyl/phenylalanyl-tRNA--protein transferase [Pseudomonas putida KT2440]Q88FS7.1 RecName
WP_003409891.1 SecB-like chaperone [Mycobacterium]NP_216473.1 SecB-like chaperone [Mycobacterium tuberculosis H37Rv]YP_009359329.1 HYPOTHETICAL PROTEIN BQ2027_MB1992 [Mycobacterium bovis AF2122/97]P95257.1 RecName
WP_000379821.1 O-acetyltransferase OatA [Staphylococcus]YP_501338.1 hypothetical protein SAOUHSC_02885 [Staphylococcus aureus subsp. aureus NCTC 8325]Q5HCY3.1 RecName
WP_000332037.1 ribonucleoside-diphosphate reductase 1 subunit beta [Proteobacteria]NP_311145.1 ribonucleotide-diphosphate reductase subunit beta [Escherichia coli O157:H7 str. Sakai]NP_416738.1 ribonucleoside-diphosphate reductase 1, beta subunit, ferritin-like protein [Escherichia coli str. K-12 substr. MG1655]YP_403993.1 ribonucleotide-diphosphate reductase subunit beta [Shigella dysenteriae Sd197]YP_002413284.1 ribonucleoside-diphosphate reductase 1 subunit beta [Escherichia coli UMN026]P69924.2 RecName
/blast_database$ head -n 10 matches.m8
g13600.t1_0042_0042 AAQ57129.1 74.2 341 87 1 1 341 549 888 5.0e-145 523.5
g13600.t1_0042_0042 XP_013161242.1 75.0 300 74 1 1 300 561 859 2.6e-125 458.0
g13600.t1_0042_0042 CAX36787.1 74.4 301 76 1 10 310 1 300 9.8e-125 456.1
g13600.t1_0042_0042 KLV34197.1 53.4 341 156 2 1 340 296 634 1.6e-98 369.0
g13600.t1_0042_0042 XP_014358783.1 53.5 340 154 2 1 340 554 889 2.5e-96 361.7
g13600.t1_0042_0042 XP_013173125.1 76.9 212 48 1 1 212 485 695 3.3e-88 334.7
g13600.t1_0042_0042 XP_013168065.1 72.5 222 60 1 1 222 611 831 3.4e-85 324.7
g13600.t1_0042_0042 XP_014357403.1 70.0 220 64 2 38 257 865 1082 1.4e-78 302.8
g13600.t1_0042_0042 XP_014356712.1 69.5 197 59 1 134 330 1 196 3.1e-70 275.0
g13600.t1_0042_0042 XP_013163722.1 78.9 152 31 1 1 152 1 151 2.2e-63 252.3
Do you have any idea where the issue is?
Yes. It fails in this function: assign_taxon_to_dic(acc_taxid_prot). This needs the prot.accession2taxid file downloaded from NCBI. The file is formatted like this (exactly as it is downloaded, but decompressed):
acc acc_version tax_id GI
XP_642131 XP_642131.1 352472 66816243
You gave it "gi_taxid_prot.dmp", but it needs prot.accession2taxid. If you really want to use that file, I think the old script in the legacy folder of this tool works that way. Otherwise, download the right file with the wget commands below, check it, then gunzip it:
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz.md5
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
md5sum -c prot.accession2taxid.gz.md5
gunzip prot.accession2taxid.gz
Then change your -t option to point to this file, and it should work.
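The "expected 4, got 2" message is consistent with each line of the -t file yielding fewer fields than the four columns listed above. A minimal sketch of a check you could run before starting the full job, assuming whitespace-separated columns; find_malformed_line is a hypothetical helper, not part of Diamond_blast_to_taxid.py:

```python
def find_malformed_line(lines, expected_columns=4):
    """Return (line_number, fields) for the first non-empty line that does not
    have the expected number of columns, or None if every line looks fine."""
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if fields and len(fields) != expected_columns:
            return number, fields
    return None


# Two-column lines as in gi_taxid_prot.dmp versus four-column lines as in
# prot.accession2taxid (values copied from this thread).
old_style = ["6\t9913\n", "8\t9913\n"]
new_style = ["accession\taccession.version\ttaxid\tgi\n",
             "P26567\tP26567.2\t4577\t1168978\n"]

print(find_malformed_line(old_style))
print(find_malformed_line(new_style))
```

In practice you would pass an open file handle instead of the small lists used here.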
cheers,
Pete
p.s. this newer version is quite RAM hungry and may need more like 60GB.
OK, it seems to be running, thank you :). How long do you think the process will take with 8 ppn?
It takes a while to load the prot to acc into RAM, then the whole thing should run in under 3 hours. At a guess.
I'm glad it is running for you.
Cheers,
Pete
Finally, I got another issue. Here is the error message:
INFO: Starting testing: Mon May 14 10:49:42 2018
INFO: loaded gi to description database
INFO: Annotating tax id info to tab file
Traceback (most recent call last):
File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 1034, in <module>
logger)
File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 483, in parse_diamond_tab
if not parse_blast_line(line, logger):
File "/pandata/me/LEPIWASP/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 312, in parse_blast_line
accession, line = get_accession_number(line, logger)
File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 339, in get_accession_number
acc = acces_column.split("|")[1]
IndexError: list index out of range
Do you know what the issue is?
Here is the head of the prot.accession2taxid file:
accession accession.version taxid gi
P26567 P26567.2 4577 1168978
P12208 P12208.1 3197 116525
P12210 P12210.1 4097 116527
P24064 P24064.2 4565 17374148
P22260 P22260.2 190485 21903391
P17697 P17697.1 9913 116530
P25473 P25473.1 9615 116531
P14018 P14018.2 93934 1705937
P10909 P10909.1 9606 116533
yes, OK. I think your BLAST output is in a different format to mine. Give me a few mins. I will alter the script on github.
I made it with diamond by running: $diamond blastp -d $nr -q $candidates_aa_0035 -o matches_0035.m8
There's nothing wrong with your Diamond run. Our BLAST NR database lags behind the current version, so I think the format I was writing this against was an old one.
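For illustration, a parser that tolerates both identifier styles could look like the sketch below. It assumes old-style subject IDs are pipe-delimited in the form gi|<number>|<db>|<accession>| (an assumption; that layout is not shown in this thread), while newer ones are a bare accession.version such as XP_013161242.1. extract_accession is a hypothetical name, not the function used in the script.

```python
def extract_accession(subject_id):
    """Return the accession without its version suffix.

    Assumes (not confirmed in this thread) that pipe-delimited IDs look like
    'gi|<number>|<db>|<accession>|'; anything without pipes is treated as a
    bare accession.version.
    """
    fields = [field for field in subject_id.strip().split("|") if field]
    if not fields:
        raise ValueError("no accession found in %r" % subject_id)
    # Use the fourth field when present, otherwise fall back to the last one.
    token = fields[3] if len(fields) >= 4 else fields[-1]
    # 'XP_013161242.1' -> 'XP_013161242'
    return token.split(".")[0]


print(extract_accession("XP_013161242.1"))
print(extract_accession("gi|66816243|ref|XP_642131.1|"))
```

Indexing a fixed position after split("|") without checking the number of fields is what raises IndexError when the column contains no pipe at all.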
5 more mins.
ok try this: Diamond_blast_to_taxid.py.
OK, it is running. I will let you know if it works, thank you.
OK, now I get this issue:
INFO: sys.version_info(major=3, minor=6, micro=5, releaselevel='final', serial=0)
INFO: Command-line: /pandata/me/LEPIWASP/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py -i /pandata/me/blast_database/matches.m8 -t /pandata/me/LEPIWASP/blast_database/prot.accession2taxid -c /pandata/me/blast_database/categories.dmp -n /pandata/me/blast_database/names.dmp -d /pandata/me/blast_database/acc_to_des.tab -o outfile_sp1.tab
INFO: Starting testing: Mon May 14 12:16:43 2018
INFO: loaded gi to description database
INFO: Annotating tax id info to tab file
Traceback (most recent call last):
File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py", line 1004, in <module>
logger)
File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py", line 485, in parse_diamond_tab
if not parse_blast_line(line, logger):
File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py", line 311, in parse_blast_line
accession, line = get_accession_number(line, logger)
ValueError: too many values to unpack (expected 2)
this is now updated.
Ok, it runs :) btw I should get the output tab file in my working directory right?
You will get an output tab file, like the one you put in but with more columns (descriptions, kingdom, tax_id etc.).
If you have matplotlib installed you will get a graph of the percentage identity of your top blast hits vs the blast database.
You will get the kingdom and Genus breakdown of your top blast hits. You will also get a file with your top blast hits, as a tab file.
Here is the next issue I got:
INFO: Command-line: /pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py -i /pandata/me/blast_database/matches.m8 -t /pandata/blast_database/prot.accession2taxid -c /pandata/me/blast_database/categories.dmp -n /pandata/me/blast_database/names.dmp -d /pandata/me/blast_database/acc_to_des.tab -o outfile_sp1.tab
INFO: Starting testing: Mon May 14 13:04:36 2018
INFO: loaded gi to description database
INFO: Annotating tax id info to tab file
Traceback (most recent call last):
File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py", line 497, in parse_diamond_tab
tax_id = acc_to_tax_id[accession]
KeyError: 'PTY26659'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py", line 1006, in <module>
logger)
File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py", line 500, in parse_diamond_tab
if acc_to_tax_id.has_key(accession.rstrip()):
AttributeError: 'dict' object has no attribute 'has_key'
PTY26659 is not an accession. How has this got into your tab file? Running grep "PTY26659" prot.accession2taxid yields nothing for me.
$ grep "PTY" prot.accession2taxid
Q9PTY0 Q9PTY0.1 7962 47605558
Q9PTY5 Q9PTY5.1 8355 82117647
A0PTY6 A0PTY6.1 362242 166223954
A0PTY0 A0PTY0.1 362242 166991492
Maybe try
$cat "yourBlastOutput" | grep -v "PTY26659 " > newBlastOutput
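The second traceback is a separate problem: dict.has_key() was removed in Python 3, and the log in this thread shows Python 3.6. A membership test with "in" is the usual replacement. A minimal sketch, where lookup_tax_id is a hypothetical helper and the mapping holds a single row taken from the prot.accession2taxid head posted earlier:

```python
import logging

logger = logging.getLogger("taxid_lookup")


def lookup_tax_id(acc_to_tax_id, accession):
    """Return the tax_id for an accession, or None (with a warning) if absent."""
    accession = accession.rstrip()
    if accession in acc_to_tax_id:  # Python 3 replacement for has_key()
        return acc_to_tax_id[accession]
    logger.warning("tax_id for %s is not found in database", accession)
    return None


acc_to_tax_id = {"P26567": "4577"}
print(lookup_tax_id(acc_to_tax_id, "P26567"))
print(lookup_tax_id(acc_to_tax_id, "PTY26659"))
```

Returning None instead of raising lets the caller decide whether to skip the hit or stop.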
It's weird: when I do grep "PTY26659" prot.accession2taxid, I get nothing either. I did cat "matches.m8" | grep -v "PTY26659 " > matches2.m8
I'll let you know if it works :)
OK I have another issue with the same type but with: KeyError: 'PTY10271'
Do you think it is because I used diamond?
I use Diamond all the time. What did you BLAST your sequences against?
Hi, I did the whole process one more time:
First: I made the diamond database from the nr.faa downloaded from the NCBI website, which gave me a nr.dmnd file.
Second: I ran diamond blastp with my protein FASTA file against the nr.dmnd database, which gave me a matches_0042.m8 file.
Finally: I'm running your script with the .m8 file, but I still get this key error, and when I do:
grep "PTY26659" matches_0042.m8
I do indeed have this accession in my .m8 file:
g11636.t1_0042_0042 PTY26659.1 40.4 285 160 4 413 693 157 435 1.4e-45 194.1
Do you know where the issue is?
Weird! If you look for that on NCBI, you can't find it. I can code the script to skip it if you want? As this doesn't exist in the accession file, which you have seen yourself...
Maybe I did something wrong. I did not run blastdbcmd -entry 'all' -db nr > nr.faa
but downloaded the nr file directly from here: ftp://ftp.ncbi.nlm.nih.gov/blast/db/
The error comes from the nr database, right?
It can't come from the following command, can it?
python prepare_accession_to_description_db.py -i nr.faa (default) -o acc_to_des.tab (default)
If I understood correctly, if we skip it, I'll lose the sequences with these accession numbers?
Do: grep "PTY" BlastOutPut > weirdThings.txt
I want to see how many of these weird things there are.
If we skip them, they will not have this specific Blast hit, but may very well have others!
Hi, I actually have 4 of them:
g11636.t1_0042_0042 PTY26659.1 40.4 285 160 4 413 693 157 435 1.4e-45 194.1
g11636.t1_0042_0042 PTY10271.1 40.2 286 160 5 413 693 157 436 7.9e-44 188.3
g11636.t1_0042_0042 PTY26663.1 43.1 246 128 5 452 693 17 254 3.3e-42 183.0
g11636.t1_0042_0042 PTY27709.1 39.2 288 165 4 413 696 37 318 9.7e-42 181.4
All from the same gene.
Run grep "g11636.t1_0042_0042" on your BLAST output and see if you have other decent hits which could represent this gene. If so:
cat BlastOutPut | grep -v "PTY" > NewBlastOutput
Try the script with this (NewBlastOutput) instead.
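The same filtering can be sketched in Python, which also reports any query that would be left with no hits at all. This assumes the tab layout shown in this thread (query in column 1, subject accession.version in column 2). split_known_hits is a hypothetical helper, and the small set of known accessions is made up for the example rather than read from prot.accession2taxid.

```python
def split_known_hits(blast_lines, known_accessions):
    """Keep hits whose accession is known; list queries left with no hits."""
    kept = []
    queries_seen = {}  # dict keeps first-seen order of query names
    queries_kept = set()
    for line in blast_lines:
        if not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        query, subject = columns[0], columns[1]
        queries_seen[query] = None
        # Compare without the version suffix, e.g. PTY26659.1 -> PTY26659.
        if subject.split(".")[0] in known_accessions:
            kept.append(line)
            queries_kept.add(query)
    orphaned = [q for q in queries_seen if q not in queries_kept]
    return kept, orphaned


hits = [
    "g11636.t1_0042_0042\tPTY26659.1\t40.4\t285\t160\t4\t413\t693\t157\t435\t1.4e-45\t194.1\n",
    "g13600.t1_0042_0042\tAAQ57129.1\t74.2\t341\t87\t1\t1\t341\t549\t888\t5.0e-145\t523.5\n",
]
known = {"AAQ57129"}  # illustrative only

kept, orphaned = split_known_hits(hits, known)
print(len(kept), orphaned)
```

Any query listed in the second return value has lost every hit, which is the case worth checking by hand before dropping the lines.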
OK, I removed them. It's running; I will let you know if it works, thank you.
It finally worked ahah, thank you very much :) BTW do you know if we can add order information to the tab file as well?
I'm glad it worked, but what do you mean by order? It should already be in best hit order. If you want to sort by gene and bit score:
https://unix.stackexchange.com/questions/52762/trying-to-sort-on-two-fields-second-then-first
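If a Python version is handier than sort, a sketch along these lines orders hits by query name and then by descending bit score. It assumes the 12-column tab layout shown earlier in this thread, with the bit score in column 12; sort_hits is a hypothetical helper, not part of the script.

```python
def sort_hits(blast_lines):
    """Sort tab-separated hits by query (ascending), then bit score (descending)."""
    def sort_key(line):
        columns = line.rstrip("\n").split("\t")
        return columns[0], -float(columns[11])

    return sorted((line for line in blast_lines if line.strip()), key=sort_key)


hits = [
    "g13600.t1_0042_0042\tAAQ57129.1\t74.2\t341\t87\t1\t1\t341\t549\t888\t5.0e-145\t523.5\n",
    "g11636.t1_0042_0042\tPTY10271.1\t40.2\t286\t160\t5\t413\t693\t157\t436\t7.9e-44\t188.3\n",
    "g11636.t1_0042_0042\tPTY26659.1\t40.4\t285\t160\t4\t413\t693\t157\t435\t1.4e-45\t194.1\n",
]

for line in sort_hits(hits):
    print(line.rstrip("\n"))
```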
Oh sorry, I was talking about the taxonomic order level, for example Lepidoptera or Hymenoptera :)
It is possible, but another function would have to be coded, and it would need to know which taxa belong to which order, which is not as easy as you would think. You could filter your BLAST output using this: https://github.com/peterthorpe5/public_scripts/blob/master/blast_output/top_BLAST_hit_filter_out_tax_id.py
If you pass it a taxid, you can use it to remove anything that is not in a specific order.
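To give a feel for what such a function would involve: deciding whether a hit belongs to an order means walking up the taxonomy tree from the hit's tax_id until the order's tax_id (or the root) is reached. The sketch below assumes a child-to-parent mapping has already been loaded (for example from NCBI's nodes.dmp, which is not one of the files used in this thread). The tax_ids in the toy tree are made up, and is_descendant_of is a hypothetical helper.

```python
def is_descendant_of(tax_id, ancestor_id, parent_of):
    """Return True if ancestor_id is tax_id itself or one of its ancestors."""
    seen = set()
    current = tax_id
    while current not in seen:
        if current == ancestor_id:
            return True
        seen.add(current)
        parent = parent_of.get(current)
        if parent is None:
            return False
        current = parent
    # Reached a node already visited (the root points to itself): not found.
    return False


# Toy tree with made-up IDs: 1 is the root, 10 and 20 stand in for two orders.
parent_of = {1: 1, 10: 1, 20: 1, 101: 10, 102: 101, 201: 20}

print(is_descendant_of(102, 10, parent_of))
print(is_descendant_of(201, 10, parent_of))
```

The visited set guards against looping forever at the root or on a malformed mapping.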
OK, thank you for your help and your responsiveness 👍
I'm running a test on the files you sent. It will take a while.
Open this up again as a new issue ... I closed it too early yesterday. Sorry about that.
I have rerun your data and have results for the gene that you said failed for you. Basically, the NR database you BLAST against HAS to match the version of the tax databases you download. In reality, this is very difficult to achieve on a shared server setup. So the script complains about anything it can't find and writes it to the log file like so:
WARNING: try updating your tax info tax_id database file
WARNING: tax_id for XP_023943099 is not found in database
If you email me, I will reply with the BLAST output from my run: firstname (dot) lastname (AT) hutton (dot) ac (dot) uk
Pete
P.s. I hope you still get these emails after the issue is closed??