peterthorpe5 / public_scripts

collection of bioinformatic scripts

ValueError: not enough values to unpack (expected 4, got 2) #5

Closed StromTroopers closed 6 years ago

StromTroopers commented 6 years ago

Hi, I'm currently using your program, but I ran into an issue like this one:

INFO: Starting testing: Sat May 12 20:50:59 2018
Traceback (most recent call last):
  File "/pandata/me/LEPIWASP/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 1034, in <module>
    logger)
  File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 465, in parse_diamond_tab
    acc_to_tax_id = assign_taxon_to_dic(acc_taxid_prot)
  File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 285, in assign_taxon_to_dic
    acc, acc_version, tax_id, GI = line.rstrip("\n").split()
ValueError: not enough values to unpack (expected 4, got 2)

Here is my script:

source /panhome/me//miniconda3/bin/activate
export PYTHONPATH=$PYTHONPATH:/panhome/me/miniconda3/lib/python3.6/site-packages
diamond_tab_output=/pandata/me/blast_database/matches.m8
Diamond_blast_to_taxid=/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py

taxid=/pandata/me/LEPIWASP/blast_database/gi_taxid_prot.dmp

categories=/pandata/me/blast_database/categories.dmp

names=/pandata/me/blast_database/names.dmp

description=/pandata/me/blast_database/acc_to_des.tab

$Diamond_blast_to_taxid -i $diamond_tab_output -t $taxid -c $categories -n $names -d $description -o outfile_sp1.tab

Do you know where the issue could be?

peterthorpe5 commented 6 years ago

Hello,

Can you reply with the output of "head -n 10" for each of the input files?

This will help me track down the error. It is failing to parse an input file.

Pete


The James Hutton Institute is a Scottish charitable company limited by guarantee. Registered in Scotland No. SC374831 Registered Office: The James Hutton Institute, Invergowrie Dundee DD2 5DA. Charity No. SC041796

StromTroopers commented 6 years ago

For sure:

/blast_database$ head -n 10 categories.dmp
B   7   7
B   9   9
B   11  11
B   14  14
B   17  17
B   19  19
B   21  21
B   23  23
B   24  24
B   25  25
/blast_database$ head -n 10 gi_taxid_prot.dmp
6   9913
8   9913
10  9913
12  9913
14  9913
32  9913
35  9913
42  9913
44  9913
46  9913
/blast_database$ head -n 10 names.dmp        
1   |   all |       |   synonym |
1   |   root    |       |   scientific name |
2   |   Bacteria    |   Bacteria <prokaryotes>  |   scientific name |
2   |   Monera  |   Monera <Bacteria>   |   in-part |
2   |   Procaryotae |   Procaryotae <Bacteria>  |   in-part |
2   |   Prokaryota  |   Prokaryota <Bacteria>   |   in-part |
2   |   Prokaryotae |   Prokaryotae <Bacteria>  |   in-part |
2   |   bacteria    |   bacteria <blast2>   |   blast name  |
2   |   eubacteria  |       |   genbank common name |
2   |   not Bacteria Haeckel 1894   |       |   synonym |
/blast_database$ head -n 10 acc_to_des.tab
WP_003131952.1  Full=30S ribosomal protein S18Q02VU1.1 RecName
XP_642131.1 Full=Calfumirin-1; Short=CAF-1BAA06266.1 calfumirin-1 [Dictyostelium discoideum AX2]EAL68086.1 hypothetical protein DDB_G0277827 [Dictyostelium discoideum AX4]
XP_642837.1 hypothetical protein DDB_G0276911 [Dictyostelium discoideum AX4]EAL68957.1 hypothetical protein DDB_G0276911 [Dictyostelium discoideum AX4]
WP_000184067.1  MbtH family protein [Bacillus]NP_844755.1 hypothetical protein BA_2373 [Bacillus anthracis str. Ames]YP_028470.1 hypothetical protein BAS2209 [Bacillus anthracis str. Sterne]YP_036475.1 balhimycin biosynthetic protein MbtH [[Bacillus thuringiensis] serovar konkukian str. 97-27]AAP26241.1 mbtH-like protein [Bacillus anthracis str. Ames]AAT31492.1 mbtH-like protein [Bacillus anthracis str. 'Ames Ancestor']AAT54521.1 mbtH-like protein [Bacillus anthracis str. Sterne]AAT62162.1 MbtH protein [[Bacillus thuringiensis] serovar konkukian str. 97-27]ABK85418.1 mbtH-like protein [Bacillus thuringiensis str. Al Hakam]EDR19165.1 mbtH-like protein [Bacillus anthracis str. A0488]EDR87721.1 mbtH-like protein [Bacillus anthracis str. A0193]EDR94244.1 mbtH-like protein [Bacillus anthracis str. A0442]EDS97287.1 mbtH-like protein [Bacillus anthracis str. A0389]EDT19705.1 mbtH-like protein [Bacillus anthracis str. A0465]EDT69654.1 mbtH-like protein [Bacillus anthracis str. A0174]EDV17672.1 mbtH-like protein [Bacillus anthracis str. Tsiankovskii-I]EDX57451.1 mbtH-like protein [Bacillus cereus W]EDX64509.1 mbtH-like protein [Bacillus cereus 03BB108]EDX67797.1 MbtH-like protein [Bacillus cereus NVH0597-99]ACK90518.1 mbtH-like protein [Bacillus cereus AH820]ACP13435.1 MbtH-like protein [Bacillus anthracis str. CDC 684]ACQ49836.1 mbtH-like protein [Bacillus anthracis str. A0248]ADK04965.1 MbtH-like protein [Bacillus cereus biovar anthracis str. CI]AEW55483.1 Polymyxin synthetase PmxB [Bacillus cereus F837/76]AFH83638.1 MbtH-like protein [Bacillus anthracis str. H9401]EJQ94779.1 hypothetical protein IGW_02499 [Bacillus cereus ISP3191]EJT19014.1 MbtH-like protein [Bacillus anthracis str. UR-1]EJY93945.1 MbtH-like protein [Bacillus anthracis str. BF1]AHE83774.1 antibiotic transporter [Bacillus anthracis str. A16R]AHE89667.1 antibiotic transporter [Bacillus anthracis str. 
A16]GAE97797.1 polymyxin synthetase PmxB [Bacillus anthracis CZC5]EVT90821.1 antibiotic transporter [Bacillus anthracis 8903-G]EVT99096.1 antibiotic transporter [Bacillus anthracis 9080-G]EVU05530.1 antibiotic transporter [Bacillus anthracis 52-G]AHK38425.1 MbtH-like protein [Bacillus anthracis str. SVA11]EXJ20374.1 antibiotic transporter [Bacillus anthracis str. 95014]AIF56587.1 antibiotic transporter [Bacillus anthracis]KEY96341.1 antibiotic transporter [Bacillus anthracis str. Carbosap]KFJ82161.1 protein mbtH [Bacillus anthracis]AIK32172.1 protein mbtH [Bacillus anthracis]AIK56953.1 protein mbtH [Bacillus anthracis]AIK65383.1 protein mbtH [Bacillus anthracis str. Vollum]AIK51564.1 protein mbtH [Bacillus anthracis]KFL64197.1 protein mbtH [Bacillus anthracis]KFL68944.1 protein mbtH [Bacillus anthracis]AIM06200.1 mbtH-like protein [Bacillus anthracis]AIM11627.1 mbtH-like protein [Bacillus anthracis]KGZ46499.1 antibiotic transporter [Bacillus anthracis]KGZ52568.1 antibiotic transporter [Bacillus anthracis]KGZ53472.1 antibiotic transporter [Bacillus anthracis]KGZ66622.1 antibiotic transporter [Bacillus anthracis]KGZ68471.1 antibiotic transporter [Bacillus anthracis]KGZ71447.1 antibiotic transporter [Bacillus anthracis]KGZ79538.1 antibiotic transporter [Bacillus anthracis]KGZ85467.1 antibiotic transporter [Bacillus anthracis]KGZ87774.1 antibiotic transporter [Bacillus anthracis]KGZ93918.1 antibiotic transporter [Bacillus anthracis]KGZ95083.1 antibiotic transporter [Bacillus anthracis]KGZ97965.1 antibiotic transporter [Bacillus anthracis]KHA13634.1 antibiotic transporter [Bacillus anthracis]KHA14167.1 antibiotic transporter [Bacillus anthracis]KHA14756.1 antibiotic transporter [Bacillus anthracis]KHA22885.1 antibiotic transporter [Bacillus anthracis]KHA24054.1 antibiotic transporter [Bacillus anthracis]KHA41046.1 antibiotic transporter [Bacillus anthracis]KHA42550.1 antibiotic transporter [Bacillus anthracis]KHG44919.1 antibiotic transporter [Bacillus 
anthracis]KHG51266.1 antibiotic transporter [Bacillus anthracis]KHG61191.1 antibiotic transporter [Bacillus anthracis]AJA86607.1 antibiotic transporter [Bacillus anthracis]AJF89965.1 antibiotic transporter [Bacillus anthracis]AJG29085.1 antibiotic transporter [Bacillus anthracis]AJG50646.1 protein mbtH [Bacillus anthracis str. Turkey32]AJG59430.1 protein mbtH [Bacillus cereus D17]AJG64878.1 protein mbtH [Bacillus anthracis]AJG68575.1 protein mbtH [Bacillus anthracis]AJG75784.1 protein mbtH [Bacillus thuringiensis]AJG83850.1 protein mbtH [Bacillus anthracis]AJG87422.1 protein mbtH [Bacillus anthracis]AJH27492.1 mbtH-like family protein [Bacillus anthracis]AJH36379.1 mbtH-like family protein [Bacillus anthracis]AJH38448.1 mbtH-like family protein [Bacillus anthracis]AJH46106.1 mbtH-like family protein [Bacillus anthracis str. Sterne]AJH49939.1 mbtH-like family protein [Bacillus anthracis]AJH57940.1 mbtH-like family protein [Bacillus anthracis]AJH63291.1 mbtH-like family protein [Bacillus cereus]AJH98472.1 mbtH-like family protein [Bacillus anthracis str. 
V770-NP-1R]AJH68770.1 mbtH-like family protein [Bacillus thuringiensis]AJI10795.1 mbtH-like family protein [Bacillus cereus 03BB108]AJH81185.1 mbtH-like family protein [Bacillus thuringiensis]AJH88257.1 mbtH-like family protein [Bacillus anthracis]AJH94559.1 mbtH-like family protein [Bacillus anthracis]AJI32648.1 mbtH-like family protein [Bacillus thuringiensis]AJI37662.1 mbtH-like family protein [Bacillus anthracis]AJK33170.1 mbtH-like family protein [Bacillus cereus]AJM80875.1 antibiotic transporter [Bacillus anthracis]KKM29552.1 antibiotic transporter [Bacillus anthracis]KKM31290.1 antibiotic transporter [Bacillus anthracis]BAR76907.1 protein mbtH [Bacillus anthracis]GAO65037.1 MbtH protein [Bacillus anthracis]GAO59299.1 MbtH protein [Bacillus anthracis]KLA13430.1 hypothetical protein B4087_2330 [Bacillus cereus]KLV16174.1 antibiotic transporter [Bacillus anthracis]KMP73446.1 antibiotic transporter [Bacillus cereus]COF36624.1 Uncharacterized protein conserved in bacteria [Streptococcus pneumoniae]KOM58780.1 antibiotic transporter [Bacillus anthracis]KOM66344.1 antibiotic transporter [Bacillus anthracis]KOM74316.1 antibiotic transporter [Bacillus anthracis]KOM79871.1 antibiotic transporter [Bacillus anthracis]KOM85685.1 antibiotic transporter [Bacillus anthracis]KOM93139.1 antibiotic transporter [Bacillus anthracis]KON02851.1 antibiotic transporter [Bacillus anthracis]KON19765.1 antibiotic transporter [Bacillus anthracis]KON23345.1 antibiotic transporter [Bacillus anthracis]KOR56487.1 antibiotic transporter [Bacillus anthracis]KOR64746.1 antibiotic transporter [Bacillus anthracis]CUB40593.1 MbtH-like protein [Bacillus cereus]CUB50981.1 MbtH-like protein [Bacillus subtilis]ALC34486.1 antibiotic transporter [Bacillus anthracis]KWU56222.1 antibiotic transporter [Bacillus cereus]AMC04350.1 antibiotic transporter [Bacillus anthracis]KXX86462.1 antibiotic transporter [Bacillus cereus]KXY63330.1 antibiotic transporter [Bacillus cereus]KXY85405.1 antibiotic transporter 
[Bacillus cereus]KYZ64779.1 antibiotic transporter [Bacillus sp. GZT]ANH86541.1 antibiotic transporter [Bacillus anthracis]OBV06872.1 antibiotic transporter [Bacillus anthracis]OBV08044.1 antibiotic transporter [Bacillus anthracis]ANR04840.1 antibiotic transporter [Bacillus anthracis]ANR10137.1 antibiotic transporter [Bacillus anthracis]ANR15436.1 antibiotic transporter [Bacillus anthracis]ANR20737.1 antibiotic transporter [Bacillus anthracis]ANR26037.1 antibiotic transporter [Bacillus anthracis]ANR31338.1 antibiotic transporter [Bacillus anthracis]ANR36642.1 antibiotic transporter [Bacillus anthracis]ANR41938.1 antibiotic transporter [Bacillus anthracis]ANR47229.1 antibiotic transporter [Bacillus anthracis]ANR52527.1 antibiotic transporter [Bacillus anthracis]ANR57822.1 antibiotic transporter [Bacillus anthracis]OHO06076.1 antibiotic transporter [Bacillus anthracis]OJD92396.1 antibiotic transporter [Bacillus anthracis]OKA49635.1 antibiotic transporter [Bacillus anthracis]APT25844.1 antibiotic transporter [Bacillus anthracis]AQM46304.1 antibiotic transporter [Bacillus anthracis]OON46530.1 antibiotic transporter [Bacillus anthracis]OOX82618.1 antibiotic transporter [Bacillus anthracis]OOZ95984.1 antibiotic transporter [Bacillus cereus]OPA04001.1 antibiotic transporter [Bacillus cereus]OPD55927.1 antibiotic transporter [Bacillus anthracis]OPE63914.1 antibiotic transporter [Bacillus anthracis]OPE65888.1 antibiotic transporter [Bacillus anthracis]OPE77433.1 antibiotic transporter [Bacillus anthracis]OPE84282.1 antibiotic transporter [Bacillus anthracis]OPE86441.1 antibiotic transporter [Bacillus anthracis]OPE91785.1 antibiotic transporter [Bacillus anthracis]OPE97551.1 antibiotic transporter [Bacillus anthracis]OPF04166.1 antibiotic transporter [Bacillus anthracis]OPF12941.1 antibiotic transporter [Bacillus anthracis]OTW54644.1 antibiotic transporter [Bacillus thuringiensis serovar mexicanensis]OTW95868.1 antibiotic transporter [Bacillus thuringiensis serovar 
monterrey]OTX36822.1 antibiotic transporter [Bacillus thuringiensis serovar brasilensis]OTX46645.1 antibiotic transporter [Bacillus thuringiensis serovar pondicheriensis]OTY78694.1 antibiotic transporter [Bacillus thuringiensis serovar vazensis]OUA97881.1 antibiotic transporter [Bacillus thuringiensis serovar oswaldocruzi]SME21851.1 MbtH-like protein [Bacillus cereus]ARZ62528.1 antibiotic transporter [Bacillus thuringiensis]ASE32524.1 MbtH family protein [Bacillus anthracis]OXM02662.1 antibiotic transporter [Bacillus anthracis]PDP00971.1 MbtH family protein [Bacillus anthracis]PDP05901.1 MbtH family protein [Bacillus anthracis]PDP09673.1 MbtH family protein [Bacillus anthracis]PDP14615.1 MbtH family protein [Bacillus anthracis]PDP21847.1 MbtH family protein [Bacillus anthracis]PDP26476.1 MbtH family protein [Bacillus anthracis]PDP33868.1 MbtH family protein [Bacillus anthracis]PGB53585.1 MbtH family protein [Bacillus anthracis]AUD25801.1 MbtH family protein [Bacillus sp. HBCD-sjtu]PMU02308.1 MbtH family protein [Bacillus sp. UAEU-H3K6M1]PNS47938.1 MbtH family protein [Bacillus anthracis]PNS54332.1 MbtH family protein [Bacillus anthracis]PNS59832.1 MbtH family protein [Bacillus anthracis]PNS65202.1 MbtH family protein [Bacillus anthracis]PNS73917.1 MbtH family protein [Bacillus anthracis]PNS77544.1 MbtH family protein [Bacillus anthracis]PNS83210.1 MbtH family protein [Bacillus anthracis]PRD00625.1 MbtH family protein [Bacillus cereus]PRD06309.1 MbtH family protein [Bacillus cereus]PRD59520.1 MbtH family protein [Bacillus anthracis]PTR53886.1 MbtH family protein [Bacillus anthracis]PTR59175.1 MbtH family protein [Bacillus anthracis]PTR66756.1 MbtH family protein [Bacillus anthracis]PTR74157.1 MbtH family protein [Bacillus anthracis]PTR75459.1 MbtH family protein [Bacillus anthracis]PTR80521.1 MbtH family protein [Bacillus anthracis]PTR85276.1 MbtH family protein [Bacillus anthracis]PTR91761.1 MbtH family protein [Bacillus anthracis]
WP_007051162.1  argininosuccinate lyase [Bifidobacterium]NP_696229.1 argininosuccinate lyase [Bifidobacterium longum NCC2705]Q8G5F3.1 RecName
WP_000135199.1  30S ribosomal protein S18 [Bacteria]NP_313205.1 30S ribosomal protein S18 [Escherichia coli O157:H7 str. Sakai]NP_418623.1 30S ribosomal subunit protein S18 [Escherichia coli str. K-12 substr. MG1655]NP_458827.1 30s ribosomal subunit protein S18 [Salmonella enterica subsp. enterica serovar Typhi str. CT18]NP_710065.1 30S ribosomal protein S18 [Shigella flexneri 2a str. 301]YP_405749.1 30S ribosomal protein S18 [Shigella dysenteriae Sd197]YP_002410527.1 30S ribosomal protein S18 [Escherichia coli IAI39]YP_002415332.1 30S ribosomal subunit protein S18 [Escherichia coli UMN026]YP_003611111.1 30S ribosomal protein S18 [Enterobacter cloacae subsp. cloacae ATCC 13047]YP_004592087.1 30S ribosomal protein S18 [Klebsiella aerogenes KCTC 2190]YP_005224742.1 30S ribosomal protein S18 [Klebsiella pneumoniae subsp. pneumoniae HS11286]YP_006122613.1 30S ribosomal protein S18 [Escherichia coli O83:H1 str. NRG 857C]YP_006781181.1 30S ribosomal protein S18 [Escherichia coli O104:H4 str. 2011C-3493]NP_463254.2 30S ribosomal protein S18 [Salmonella enterica subsp. enterica serovar Typhimurium str. LT2]P0A7T7.2 RecName
WP_003251213.1  leucyl/phenylalanyl-tRNA--protein transferase [Pseudomonas]NP_746135.1 leucyl/phenylalanyl-tRNA--protein transferase [Pseudomonas putida KT2440]Q88FS7.1 RecName
WP_003409891.1  SecB-like chaperone [Mycobacterium]NP_216473.1 SecB-like chaperone [Mycobacterium tuberculosis H37Rv]YP_009359329.1 HYPOTHETICAL PROTEIN BQ2027_MB1992 [Mycobacterium bovis AF2122/97]P95257.1 RecName
WP_000379821.1  O-acetyltransferase OatA [Staphylococcus]YP_501338.1 hypothetical protein SAOUHSC_02885 [Staphylococcus aureus subsp. aureus NCTC 8325]Q5HCY3.1 RecName
WP_000332037.1  ribonucleoside-diphosphate reductase 1 subunit beta [Proteobacteria]NP_311145.1 ribonucleotide-diphosphate reductase subunit beta [Escherichia coli O157:H7 str. Sakai]NP_416738.1 ribonucleoside-diphosphate reductase 1, beta subunit, ferritin-like protein [Escherichia coli str. K-12 substr. MG1655]YP_403993.1 ribonucleotide-diphosphate reductase subunit beta [Shigella dysenteriae Sd197]YP_002413284.1 ribonucleoside-diphosphate reductase 1 subunit beta [Escherichia coli UMN026]P69924.2 RecName
/blast_database$ head -n 10 matches.m8    
g13600.t1_0042_0042 AAQ57129.1  74.2    341 87  1   1   341 549 888 5.0e-145    523.5
g13600.t1_0042_0042 XP_013161242.1  75.0    300 74  1   1   300 561 859 2.6e-125    458.0
g13600.t1_0042_0042 CAX36787.1  74.4    301 76  1   10  310 1   300 9.8e-125    456.1
g13600.t1_0042_0042 KLV34197.1  53.4    341 156 2   1   340 296 634 1.6e-98 369.0
g13600.t1_0042_0042 XP_014358783.1  53.5    340 154 2   1   340 554 889 2.5e-96 361.7
g13600.t1_0042_0042 XP_013173125.1  76.9    212 48  1   1   212 485 695 3.3e-88 334.7
g13600.t1_0042_0042 XP_013168065.1  72.5    222 60  1   1   222 611 831 3.4e-85 324.7
g13600.t1_0042_0042 XP_014357403.1  70.0    220 64  2   38  257 865 1082    1.4e-78 302.8
g13600.t1_0042_0042 XP_014356712.1  69.5    197 59  1   134 330 1   196 3.1e-70 275.0
g13600.t1_0042_0042 XP_013163722.1  78.9    152 31  1   1   152 1   151 2.2e-63 252.3
StromTroopers commented 6 years ago

Do you have any idea where the issue is?

peterthorpe5 commented 6 years ago

Yes. It fails in this function: assign_taxon_to_dic(acc_taxid_prot). This needs the prot.accession2taxid file downloaded from NCBI. The file is formatted like this (exactly as it is downloaded, but decompressed):

acc acc_version tax_id GI
XP_642131 XP_642131.1 352472 66816243

You gave it "gi_taxid_prot.dmp", but it needs prot.accession2taxid. If you really want to use that file, I think the old script in the legacy folder of this tool works that way. You can download the file using wget as below, then gunzip it.

wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz.md5
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
md5sum -c prot.accession2taxid.gz.md5

gunzip prot.accession2taxid.gz

Then change your -t option to point at this file, and it should work.

cheers,

Pete

p.s. this newer version is quite RAM hungry and may need more like 60GB.
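
For readers hitting the same ValueError: the sketch below shows one way to load the four-column format described above while failing with a clearer message when the wrong file is supplied. It is not the code in Diamond_blast_to_taxid.py; the function name `load_accession_to_taxid` is hypothetical, and keying the dictionary on the accession without its version is an assumption based on the KeyError messages later in this thread.

```python
def load_accession_to_taxid(path):
    """Map accession (without version) to tax_id.

    Assumes the four-column prot.accession2taxid layout:
    accession, accession.version, taxid, gi.
    """
    acc_to_tax_id = {}
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.rstrip("\n").split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 4:
                # A two-column file such as gi_taxid_prot.dmp ends up here.
                raise ValueError(
                    "%s line %d: expected 4 columns, got %d; "
                    "is this really prot.accession2taxid?"
                    % (path, line_number, len(fields))
                )
            acc, acc_version, tax_id, gi = fields
            if acc == "accession":
                continue  # header row of the downloaded file
            acc_to_tax_id[acc] = int(tax_id)
    return acc_to_tax_id
```

Holding every accession in a dictionary like this is also why the memory requirement is so high for the full NCBI file.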

StromTroopers commented 6 years ago

OK, it seems to be running, thank you :). How long do you think the process will take with 8 ppn?

peterthorpe5 commented 6 years ago

It takes a while to load the prot.accession2taxid data into RAM; after that the whole thing should run in under 3 hours, at a guess.

I'm glad it is running for you.

Cheers,

Pete

StromTroopers commented 6 years ago

In the end I got another issue. Here is the error message:


INFO: Starting testing: Mon May 14 10:49:42 2018
INFO: loaded gi to description database
INFO: Annotating tax id info to tab file
Traceback (most recent call last):
  File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 1034, in <module>
    logger)
  File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 483, in parse_diamond_tab
    if not parse_blast_line(line, logger):
  File "/pandata/me/LEPIWASP/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 312, in parse_blast_line
    accession, line = get_accession_number(line, logger)
  File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid.py", line 339, in get_accession_number
    acc = acces_column.split("|")[1]
IndexError: list index out of range

Do you know what the issue is?

Here is the head of the prot.accession2taxid file: 
accession   accession.version   taxid   gi
P26567  P26567.2    4577    1168978
P12208  P12208.1    3197    116525
P12210  P12210.1    4097    116527
P24064  P24064.2    4565    17374148
P22260  P22260.2    190485  21903391
P17697  P17697.1    9913    116530
P25473  P25473.1    9615    116531
P14018  P14018.2    93934   1705937
P10909  P10909.1    9606    116533
peterthorpe5 commented 6 years ago

yes, OK. I think your BLAST output is in a different format to mine. Give me a few mins. I will alter the script on github.

StromTroopers commented 6 years ago

I made it with diamond by running: $diamond blastp -d $nr -q $candidates_aa_0035 -o matches_0035.m8

peterthorpe5 commented 6 years ago

There's nothing wrong with your Diamond run. Our BLAST NR database lags behind the current version, so I think the script was written against an older ID format.

5 more mins.
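
To illustrate the IndexError above: `split("|")[1]` fails when the subject column is a plain accession with no pipe characters, as in the matches.m8 shown earlier. The sketch below is not the fix that was committed to the repository; `extract_accession` is a hypothetical helper, and it assumes that in pipe-delimited IDs the accession is the last non-empty field (for example `gi|66816243|ref|XP_642131.1|`). IDs that carry a trailing entry name after the accession would need extra handling.

```python
def extract_accession(subject_id):
    """Return the accession, without version, from a subject ID.

    Handles both plain IDs ("XP_013161242.1") and pipe-delimited IDs
    where the accession is assumed to be the last non-empty field.
    """
    fields = [field for field in subject_id.strip().split("|") if field]
    if not fields:
        raise ValueError("empty subject ID")
    accession_version = fields[-1]
    # Drop the ".1" style version suffix, if present.
    return accession_version.split(".")[0]
```

With this, `extract_accession("XP_013161242.1")` and `extract_accession("gi|66816243|ref|XP_642131.1|")` both return an accession without its version, so the same lookup works for either format.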

peterthorpe5 commented 6 years ago

ok try this: Diamond_blast_to_taxid.py.

StromTroopers commented 6 years ago

Ok it is running, I will let you know if it works thank you.

StromTroopers commented 6 years ago

OK, now I get this issue:

INFO: sys.version_info(major=3, minor=6, micro=5, releaselevel='final', serial=0)
INFO: Command-line: /pandata/me/LEPIWASP/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py -i /pandata/me/blast_database/matches.m8 -t /pandata/me/LEPIWASP/blast_database/prot.accession2taxid -c /pandata/me/blast_database/categories.dmp -n /pandata/me/blast_database/names.dmp -d /pandata/me/blast_database/acc_to_des.tab -o outfile_sp1.tab
INFO: Starting testing: Mon May 14 12:16:43 2018
INFO: loaded gi to description database
INFO: Annotating tax id info to tab file
Traceback (most recent call last):
  File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py", line 1004, in <module>
    logger)
  File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py", line 485, in parse_diamond_tab
    if not parse_blast_line(line, logger):
  File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py", line 311, in parse_blast_line
    accession, line = get_accession_number(line, logger)
ValueError: too many values to unpack (expected 2)
peterthorpe5 commented 6 years ago

this is now updated.

StromTroopers commented 6 years ago

Ok, it runs :) btw I should get the output tab file in my working directory right?

peterthorpe5 commented 6 years ago

You will get an output tab file, like the one you put in but with more columns (descriptions, kingdom, tax_id etc …).

If you have matplotlib installed you will get a graph of the percentage identity of your top blast hits vs the blast database.

You will get the kingdom and Genus breakdown of your top blast hits. You will also get a file with your top blast hits, as a tab file.

StromTroopers commented 6 years ago

Here is the next issue I got:


INFO: Command-line: /pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py -i /pandata/me/blast_database/matches.m8 -t /pandata/blast_database/prot.accession2taxid -c /pandata/me/blast_database/categories.dmp -n /pandata/me/blast_database/names.dmp -d /pandata/me/blast_database/acc_to_des.tab -o outfile_sp1.tab
INFO: Starting testing: Mon May 14 13:04:36 2018
INFO: loaded gi to description database
INFO: Annotating tax id info to tab file
Traceback (most recent call last):
  File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py", line 497, in parse_diamond_tab
    tax_id = acc_to_tax_id[accession]
KeyError: 'PTY26659'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py", line 1006, in <module>
    logger)
  File "/pandata/me/blast_database/public_scripts-master/Diamond_BLAST_add_taxonomic_info/Diamond_blast_to_taxid2.py", line 500, in parse_diamond_tab
    if acc_to_tax_id.has_key(accession.rstrip()):
AttributeError: 'dict' object has no attribute 'has_key'
peterthorpe5 commented 6 years ago

PTY26659 is not an accession. How did this get into your tab file? grep "PTY26659" prot.accession2taxid yields nothing for me.

$ grep "PTY" prot.accession2taxid
Q9PTY0 Q9PTY0.1 7962 47605558
Q9PTY5 Q9PTY5.1 8355 82117647
A0PTY6 A0PTY6.1 362242 166223954
A0PTY0 A0PTY0.1 362242 166991492

Maybe try

$cat "yourBlastOutput" | grep -v "PTY26659 " > newBlastOutput
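
A note on the second traceback above: `dict.has_key()` was removed in Python 3, so the fallback branch itself crashes under Python 3.6. The usual replacement is the `in` operator. The sketch below shows a lookup that records unknown accessions instead of raising; `lookup_tax_ids` is a hypothetical helper for illustration, not a function from the script.

```python
def lookup_tax_ids(accessions, acc_to_tax_id):
    """Split accessions into those with a known tax_id and those without."""
    found = {}
    missing = []
    for accession in accessions:
        accession = accession.rstrip()
        if accession in acc_to_tax_id:  # Python 3 replacement for has_key()
            found[accession] = acc_to_tax_id[accession]
        else:
            missing.append(accession)
    return found, missing
```

Returning the missing accessions as a list makes it easy to log them and decide afterwards whether skipping those hits is acceptable.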

StromTroopers commented 6 years ago

It's weird: when I do grep "PTY26659" prot.accession2taxid, I get nothing either. I ran cat "matches.m8" | grep -v "PTY26659 " > matches2.m8 and I'll let you know if it works :)

StromTroopers commented 6 years ago

OK, I have another issue of the same type, but with: KeyError: 'PTY10271'. Do you think it is because I used Diamond?

peterthorpe5 commented 6 years ago

I use Diamond all the time. What did you BLAST your sequences against?

StromTroopers commented 6 years ago

Hi, I ran the whole process one more time:

First: I made the diamond database using the nr.faa downloaded from the NCBI website, which gave me an nr.dmnd file.
Second: I ran a diamond blastp of my protein fasta file against the nr.dmnd file, which gave me a matches_0042.m8 file.
Finally: I'm running your script with the .m8 file, but I still get this KeyError, and when I do grep "PTY26659" matches_0042.m8 I do indeed have this accession in my .m8 file:

g11636.t1_0042_0042 PTY26659.1 40.4 285 160 4 413 693 157 435 1.4e-45 194.1

Do you know where the issue is?

peterthorpe5 commented 6 years ago

Weird! If you look for that on NCBI, you can't find it. I can code the script to skip it if you want? As this doesn't exist in the accession file, which you have seen yourself...


StromTroopers commented 6 years ago

Maybe I did something wrong. I did not run blastdbcmd -entry 'all' -db nr > nr.faa but downloaded the nr file directly from ftp://ftp.ncbi.nlm.nih.gov/blast/db/

The error comes from the nr database, right?

The issue cannot come from the following command, can it? python prepare_accession_to_description_db.py -i nr.faa (default) -o acc_to_des.tab (default)

If I understood correctly, if we skip it, I'll lose the sequences with these accession numbers?

peterthorpe5 commented 6 years ago

Do: grep "PTY" BlastOutPut > weirdThings.txt

I want to see how many of these weird things there are.

If we skip them, the gene will lose these specific BLAST hits, but may very well have others!

StromTroopers commented 6 years ago

Hi actually have 4 of them:

g11636.t1_0042_0042 PTY26659.1  40.4    285 160 4   413 693 157 435 1.4e-45 194.1
g11636.t1_0042_0042 PTY10271.1  40.2    286 160 5   413 693 157 436 7.9e-44 188.3
g11636.t1_0042_0042 PTY26663.1  43.1    246 128 5   452 693 17  254 3.3e-42 183.0
g11636.t1_0042_0042 PTY27709.1  39.2    288 165 4   413 696 37  318 9.7e-42 181.4

peterthorpe5 commented 6 years ago

All from the same gene.

Run grep "g11636.t1_0042_0042" on the BLAST output and see if you have other decent hits which could represent this gene. If so:

grep -v "PTY" BlastOutPut > NewBlastOutput

Try the script with this (NewBlastOutput) instead.
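The grep step above can also be sketched in Python, which makes it easy to check whether a gene still has other hits after the unresolvable accessions are dropped. This is only an illustration, not part of Diamond_blast_to_taxid.py: the function names are hypothetical, the input is assumed to be tab-separated with the subject accession in column 2, and the `HYPOTHETICAL_1.1` row is made up. Unlike `grep -v "PTY"`, which matches anywhere on the line, this sketch only tests the start of the subject accession.

```python
from collections import defaultdict


def drop_hits_by_prefix(lines, prefix):
    """Return (kept lines, {query: number of dropped hits})."""
    kept = []
    dropped = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        query, subject = fields[0], fields[1]
        if subject.startswith(prefix):
            dropped[query] += 1
        else:
            kept.append(line)
    return kept, dict(dropped)


def queries_without_hits(kept, dropped):
    """List queries that lost hits and have no remaining hit at all."""
    remaining = {line.split("\t")[0] for line in kept}
    return sorted(q for q in dropped if q not in remaining)


# Two rows from the thread plus one made-up row (HYPOTHETICAL_1.1).
example = [
    "g11636.t1_0042_0042\tPTY26659.1\t40.4\t285\t160\t4\t413\t693\t157\t435\t1.4e-45\t194.1",
    "g11636.t1_0042_0042\tPTY10271.1\t40.2\t286\t160\t5\t413\t693\t157\t436\t7.9e-44\t188.3",
    "g11636.t1_0042_0042\tHYPOTHETICAL_1.1\t38.0\t280\t170\t4\t413\t690\t150\t430\t2.0e-40\t176.0",
]
kept, dropped = drop_hits_by_prefix(example, "PTY")
print(len(kept), dropped, queries_without_hits(kept, dropped))
```

If `queries_without_hits` returns a gene, that gene would end up with no annotation at all after filtering, so it is worth checking before rerunning the script.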

StromTroopers commented 6 years ago

OK, I removed them. It's running, and I will let you know if it works. Thank you.

StromTroopers commented 6 years ago

It finally worked ahah, thank you very much :) BTW do you know if we can add order information to the tab file as well?

peterthorpe5 commented 6 years ago

I'm glad it worked, but what do you mean by order? It should already be in best-hit order. If you want to sort by gene and bit score:

https://unix.stackexchange.com/questions/52762/trying-to-sort-on-two-fields-second-then-first
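As an alternative to the `sort` approach in that link, here is a minimal Python sketch of sorting by gene and then by bit score. It assumes the default 12-column tabular layout seen in the rows earlier in this thread (query in column 1, bit score in column 12) and tab-separated fields; `sort_hits` is a hypothetical helper name, and the `geneA` row is made up.

```python
def sort_hits(lines):
    """Sort tabular hits by query name, then by bit score (highest first)."""
    def key(line):
        fields = line.rstrip("\n").split("\t")
        # Column 1 is the query, column 12 the bit score (assumed layout).
        return (fields[0], -float(fields[11]))

    return sorted((line for line in lines if line.strip()), key=key)


unsorted_hits = [
    "g11636.t1_0042_0042\tPTY26663.1\t43.1\t246\t128\t5\t452\t693\t17\t254\t3.3e-42\t183.0",
    "g11636.t1_0042_0042\tPTY26659.1\t40.4\t285\t160\t4\t413\t693\t157\t435\t1.4e-45\t194.1",
    # Made-up row for a second, hypothetical gene.
    "geneA\tHYPOTHETICAL_1.1\t50.0\t100\t50\t0\t1\t100\t1\t100\t1.0e-20\t90.0",
]
for hit in sort_hits(unsorted_hits):
    print(hit)
```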

StromTroopers commented 6 years ago

Oh sorry, I was talking about the order level, for example Lepidoptera or Hymenoptera :)

peterthorpe5 commented 6 years ago

It is possible, but another function would have to be coded that knows which taxa belong to which order, which is not as easy as you would think. You could filter your BLAST output using this: https://github.com/peterthorpe5/public_scripts/blob/master/blast_output/top_BLAST_hit_filter_out_tax_id.py

If you pass it a taxid, it will remove anything that is not in that specific order.
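To illustrate what such an extra function might involve, here is a rough sketch that walks up a taxonomy tree until it reaches a node ranked "order". It is not code from this repository. It assumes input lines in the style of NCBI's nodes.dmp (tax_id, parent tax_id, and rank as the first three `|`-separated fields); the function names and the toy tax_ids below are invented for the example.

```python
def parse_nodes(lines):
    """Map tax_id -> (parent_tax_id, rank) from nodes.dmp-style lines."""
    nodes = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def find_rank(tax_id, nodes, wanted="order"):
    """Walk up parent links; return the ancestor tax_id with the wanted rank."""
    seen = set()  # guards against the root pointing at itself
    while tax_id in nodes and tax_id not in seen:
        parent, rank = nodes[tax_id]
        if rank == wanted:
            return tax_id
        seen.add(tax_id)
        tax_id = parent
    return None  # no ancestor with that rank, or unknown tax_id


# Toy tree with invented tax_ids, not real NCBI identifiers.
toy_nodes = parse_nodes([
    "1\t|\t1\t|\tno rank\t|",
    "10\t|\t1\t|\tclass\t|",
    "20\t|\t10\t|\torder\t|",
    "30\t|\t20\t|\tfamily\t|",
    "40\t|\t30\t|\tspecies\t|",
    "50\t|\t10\t|\tspecies\t|",
])
print(find_rank(40, toy_nodes), find_rank(50, toy_nodes))
```

The second lookup shows part of why this is harder than it looks: not every lineage has a node ranked "order", so the function has to be able to return nothing.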

StromTroopers commented 6 years ago

OK, thank you for your help and your quick responses 👍

peterthorpe5 commented 6 years ago

I'm running a test on the files you sent. It will take a while.

peterthorpe5 commented 6 years ago

Open this up again as a new issue ... I closed it too early yesterday. Sorry about that.

peterthorpe5 commented 6 years ago

I have rerun your data and have results for the gene which you said failed for you. Basically, the NR database you BLAST against HAS to match the version of the tax databases you download. In reality, this is very difficult to achieve on a shared server setup. So the script complains about anything it can't find and writes it to the log file like so:

WARNING: try updating your tax info tax_id database file
WARNING: tax_id for XP_023943099 is not found in database
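The "warn and carry on" behaviour described above could look roughly like the sketch below. This is not the actual code in Diamond_blast_to_taxid.py. It assumes a plain dict keyed by accession without the version suffix; `lookup_tax_ids` and the logger name are hypothetical, and the accession `ABC00001` with tax_id 12345 is made up.

```python
import logging

logging.basicConfig(format="%(levelname)s: %(message)s")
logger = logging.getLogger("taxid_lookup_sketch")


def lookup_tax_ids(accessions, acc_to_tax_id):
    """Return ({accession: tax_id}, [missing accessions]) without raising."""
    found = {}
    missing = []
    for acc in accessions:
        base = acc.split(".")[0]  # drop the version suffix, e.g. ".1"
        tax_id = acc_to_tax_id.get(base)
        if tax_id is None:
            # Log and continue instead of failing with a KeyError.
            logger.warning("tax_id for %s is not found in database", base)
            missing.append(base)
        else:
            found[base] = tax_id
    return found, missing


# Made-up mapping; PTY26659 is deliberately absent.
toy_mapping = {"ABC00001": 12345}
found, missing = lookup_tax_ids(["PTY26659.1", "ABC00001.1"], toy_mapping)
print(found, missing)
```

Collecting the missing accessions in a list makes it straightforward to see afterwards how far the BLAST database and the taxonomy files have drifted apart.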

If you email me, I will reply with the BLAST output from my run: firstname (dot) lastname (AT) hutton (dot) ac (dot) uk

Pete

P.s. I hope you still get these emails after the issue is closed??
