nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

multi-region analysis: sidle/reconstructed/reconstructed_merged.tsv OCCASIONALLY mis-formatted #736

Closed (d4straub closed 1 month ago)

d4straub commented 2 months ago

Description of the bug

sidle/reconstructed/reconstructed_merged.tsv was wrongly formatted. Note that the first data row contains the abundances in the last 3 columns, while they are missing from the second and third rows:

"ID"    "Taxon" "sa"    "sb"    "sc"
"AB361591.1.1439|AY486367.1.1434|DQ140184.1.1403|DQ302158.1.1386|EF509324.1.1494|EF509367.1.1462|EF509378.1.1458|EF509460.1.1463|EF509596.1.1469|EF509605.1.1477|EF510026.1.1464|EF510037.1.1401|EF510941.1.1400|EF511012.1.1465|EU139850.1.1411|EU661692.1.1480|EU874609.1.1395|FJ940905.1.1467|GU122959.1.1401|GU181421.1.1378|HM480353.1.1479|HQ232955.1.1433|HQ455027.1.1444|HQ455028.1.1424|HQ880674.1.1424|JF723552.1.1398|JN846903.1.1315|JQ900535.1.1456|JQ900537.1.1448|KC862289.1.1429|KR811027.1.1488|KU196753.1.1447|KU352734.1.1377|LLQC01000080.229.1641|LN558607.1.1223" "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas;D_6__Pseudomonas aeruginosa|D_6__Pseudomonas sp. 38(2011)|D_6__Pseudomonas sp. B7|D_6__Pseudomonas sp. BS-161R|D_6__Pseudomonas sp. DBTC4|D_6__Pseudomonas sp. DBTSML|D_6__Pseudomonas sp. LJLP1-15|D_6__Pseudomonas sp. PYD-4|D_6__Pseudomonas sp. VITDM1" 0   0   616
"AB508839.1.1527|AB813716.1.1291|AY030329.1.1502|EF422864.1.1474|EU363702.1.1320|EU366382.1.1482|EU586319.1.1450|EU679368.1.1464|EU780733.1.1447|FJ157236.1.1449|FJ769135.1.1435|FJ789808.1.1250|FJ863109.1.1429|FN556453.1.1454|GQ199587.1.1262|GQ280077.1.1432|GQ280079.1.1417|GQ280082.1.1432|GQ301542.1.1528|GU121487.1.1396|GU121494.1.1359|GU122948.1.1451|GU366049.2.1213|HM150646.1.1434|HM588147.1.1452|HQ021420.1.1577|HQ436036.1.1463|HQ731028.1.1462|HQ731029.1.1457|HQ834863.1.1471|HQ844504.1.1456|JF708240.1.1478|JQ424889.1.1463|JX156418.1.1500|KC310835.1.1447|KC405250.1.1449|KC683890.1.1387|KC855545.1.1458|KC855547.1.1459|KF254579.1.1446|KF482851.1.1452|KF574386.1.1433|KF844068.1.1524|KF860141.1.1455|KF917163.1.1326|KF917168.1.1524|KF928702.1.1454|KF928703.1.1455|KJ162241.1.1402|KJ743290.1.1518|KJ752760.1.1525|KP743130.1.1409|KP851955.1.1453|KP877505.1.1455|KR131622.1.1367|KT149667.1.1347|KT200495.1.1261|KT247502.1.1441|KT250765.1.1412|KT266579.1.1506|KT722838.1.1501|KU157226.1.1478|KU230011.1.1202|KX082973.1.1412|KX262911.1.1501|KX262912.1.1566"   "D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Bacillales;D_4__Bacillaceae;D_5__Bacillus;Ambiguous_taxa|D_6__Bacillus amyloliquefaciens|D_6__Bacillus mojavensis|D_6__Bacillus sp. 1.143|D_6__Bacillus sp. 12-82|D_6__Bacillus sp. BAB-3438|D_6__Bacillus sp. BAB-4129|D_6__Bacillus sp. BJC2.1|D_6__Bacillus sp. CM4(2015)|D_6__Bacillus sp. CZB26|D_6__Bacillus sp. HYC-1-3|D_6__Bacillus sp. LX-119|D_6__Bacillus sp. LX-120|D_6__Bacillus sp. RPT0001|D_6__Bacillus sp. TT1|D_6__Bacillus sp. Ti28|D_6__Bacillus sp. YBN13|D_6__Bacillus sp. YM2|D_6__Bacillus sp. sadinb1|D_6__Bacillus subtilis|D_6__Bacillus subtilis subsp. subtilis|D_6__Bacillus tequilensis|D_6__Bacillus vallismortis|D_6__Bacillus velezensis|D_6__Geobacillus sp. RSNPB7|D_6__Paenibacillus sp. BAB-3433|D_6__bacterium ARb05|D_6__bacterium B1-6-2|D_6__bacterium enrichment culture clone 16(2011)|D_6__bacterium enrichment culture clone 79(2011)
AB523727.1.1479 D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Enterobacteriales;D_4__Enterobacteriaceae;D_5__Enterobacter;D_6__Enterobacteriaceae bacterium NES11
AB548850.1.1254|FJ901047.1.1308 D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas;Ambiguous_taxa

This seems to stem from a mis-formatted reconstructed_taxonomy.tsv (line 3 starts with " but doesn't end with one, and from line 4 onward there isn't any " anymore):

"ID"    "Taxon"
"AB361591.1.1439|AY486367.1.1434|DQ140184.1.1403|DQ302158.1.1386|EF509324.1.1494|EF509367.1.1462|EF509378.1.1458|EF509460.1.1463|EF509596.1.1469|EF509605.1.1477|EF510026.1.1464|EF510037.1.1401|EF510941.1.1400|EF511012.1.1465|EU139850.1.1411|EU661692.1.1480|EU874609.1.1395|FJ940905.1.1467|GU122959.1.1401|GU181421.1.1378|HM480353.1.1479|HQ232955.1.1433|HQ455027.1.1444|HQ455028.1.1424|HQ880674.1.1424|JF723552.1.1398|JN846903.1.1315|JQ900535.1.1456|JQ900537.1.1448|KC862289.1.1429|KR811027.1.1488|KU196753.1.1447|KU352734.1.1377|LLQC01000080.229.1641|LN558607.1.1223" "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas;D_6__Pseudomonas aeruginosa|D_6__Pseudomonas sp. 38(2011)|D_6__Pseudomonas sp. B7|D_6__Pseudomonas sp. BS-161R|D_6__Pseudomonas sp. DBTC4|D_6__Pseudomonas sp. DBTSML|D_6__Pseudomonas sp. LJLP1-15|D_6__Pseudomonas sp. PYD-4|D_6__Pseudomonas sp. VITDM1"
"AB508839.1.1527|AB813716.1.1291|AY030329.1.1502|EF422864.1.1474|EU363702.1.1320|EU366382.1.1482|EU586319.1.1450|EU679368.1.1464|EU780733.1.1447|FJ157236.1.1449|FJ769135.1.1435|FJ789808.1.1250|FJ863109.1.1429|FN556453.1.1454|GQ199587.1.1262|GQ280077.1.1432|GQ280079.1.1417|GQ280082.1.1432|GQ301542.1.1528|GU121487.1.1396|GU121494.1.1359|GU122948.1.1451|GU366049.2.1213|HM150646.1.1434|HM588147.1.1452|HQ021420.1.1577|HQ436036.1.1463|HQ731028.1.1462|HQ731029.1.1457|HQ834863.1.1471|HQ844504.1.1456|JF708240.1.1478|JQ424889.1.1463|JX156418.1.1500|KC310835.1.1447|KC405250.1.1449|KC683890.1.1387|KC855545.1.1458|KC855547.1.1459|KF254579.1.1446|KF482851.1.1452|KF574386.1.1433|KF844068.1.1524|KF860141.1.1455|KF917163.1.1326|KF917168.1.1524|KF928702.1.1454|KF928703.1.1455|KJ162241.1.1402|KJ743290.1.1518|KJ752760.1.1525|KP743130.1.1409|KP851955.1.1453|KP877505.1.1455|KR131622.1.1367|KT149667.1.1347|KT200495.1.1261|KT247502.1.1441|KT250765.1.1412|KT266579.1.1506|KT722838.1.1501|KU157226.1.1478|KU230011.1.1202|KX082973.1.1412|KX262911.1.1501|KX262912.1.1566"   "D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Bacillales;D_4__Bacillaceae;D_5__Bacillus;Ambiguous_taxa|D_6__Bacillus amyloliquefaciens|D_6__Bacillus mojavensis|D_6__Bacillus sp. 1.143|D_6__Bacillus sp. 12-82|D_6__Bacillus sp. BAB-3438|D_6__Bacillus sp. BAB-4129|D_6__Bacillus sp. BJC2.1|D_6__Bacillus sp. CM4(2015)|D_6__Bacillus sp. CZB26|D_6__Bacillus sp. HYC-1-3|D_6__Bacillus sp. LX-119|D_6__Bacillus sp. LX-120|D_6__Bacillus sp. RPT0001|D_6__Bacillus sp. TT1|D_6__Bacillus sp. Ti28|D_6__Bacillus sp. YBN13|D_6__Bacillus sp. YM2|D_6__Bacillus sp. sadinb1|D_6__Bacillus subtilis|D_6__Bacillus subtilis subsp. subtilis|D_6__Bacillus tequilensis|D_6__Bacillus vallismortis|D_6__Bacillus velezensis|D_6__Geobacillus sp. RSNPB7|D_6__Paenibacillus sp. BAB-3433|D_6__bacterium ARb05|D_6__bacterium B1-6-2|D_6__bacterium enrichment culture clone 16(2011)|D_6__bacterium enrichment culture clone 79(2011)
AB523727.1.1479 D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Enterobacteriales;D_4__Enterobacteriaceae;D_5__Enterobacter;D_6__Enterobacteriaceae bacterium NES11
AB548850.1.1254|FJ901047.1.1308 D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae;D_5__Pseudomonas;Ambiguous_taxa

The QIIME2 barplot files looked fine, though, so the problem does not seem to propagate downstream.
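
A quick way to spot the symptom described above is to compare the number of tab-separated fields on each line with the header. The following is only a minimal sketch: `find_ragged_rows` is a hypothetical helper, not part of the pipeline, and the sample rows are shortened stand-ins for the real file.

```python
import io


def find_ragged_rows(handle, delimiter="\t"):
    """Return the header's field count and a list of (line_number, n_fields)
    for every non-empty line whose field count differs from the header."""
    header = handle.readline().rstrip("\n").split(delimiter)
    expected = len(header)
    ragged = []
    for line_number, line in enumerate(handle, start=2):
        line = line.rstrip("\n")
        if not line:
            continue
        n_fields = len(line.split(delimiter))
        if n_fields != expected:
            ragged.append((line_number, n_fields))
    return expected, ragged


# Shortened, made-up rows that mimic the shape of the mis-formatted file:
# the header has 5 columns, but lines 3 and 4 lack the abundance columns.
sample = io.StringIO(
    '"ID"\t"Taxon"\t"sa"\t"sb"\t"sc"\n'
    '"id1|id2"\t"D_0__Bacteria;D_5__Pseudomonas"\t0\t0\t616\n'
    '"id3|id4"\t"D_0__Bacteria;D_5__Bacillus\n'
    'id5\tD_0__Bacteria;D_5__Enterobacter\n'
)

expected, ragged = find_ragged_rows(sample)
print(f"header has {expected} fields; ragged lines: {ragged}")
```

For a real file, the `io.StringIO` object would be replaced by `open("sidle/reconstructed/reconstructed_merged.tsv")`.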

Command used and terminal output

# this command produced the mis-formatted file:
NXF_VER=23.10.1 nextflow run nf-core/ampliseq -r 2.9.0 -profile cfc --input illumina_multiregion_V1V2-V3V4-V6V8_samplesheet.tsv --multiregion illumina_multiregion_V1V2-V3V4-V6V8_multiregion.tsv --metadata illumina_multiregion_metadata.tsv --sidle_ref_taxonomy "silva=128" --skip_dada_taxonomy --skip_ancom --outdir ampliseq_illumina_multiregion_V1V2-V3V4-V6V8 -resume

# with this command I could not find any problems:
NXF_VER=23.10.1 nextflow run nf-core/ampliseq -r 2.9.0 -profile cfc --input illumina_multiregion_V1V3-V4V5-V7V9_samplesheet.tsv --multiregion illumina_multiregion_V1V3-V4V5-V7V9_multiregion.tsv --metadata illumina_multiregion_metadata.tsv --sidle_ref_taxonomy "silva=128" --skip_dada_taxonomy --skip_ancom --outdir ampliseq_illumina_multiregion_V1V3-V4V5-V7V9 -resume

Relevant files

No response

System information

No response

d4straub commented 1 month ago

It seems like this is caused by uncommon characters in the taxonomies, so it's rather a file-reading issue. The pipeline itself seems fine with it, though.
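
To illustrate how a single character can lead to this kind of file-reading problem, here is a minimal sketch using Python's standard `csv` module. It rests on an assumption: the issue does not say which character or which reader was involved, so the unbalanced `"` and the sample rows below are made up for illustration only.

```python
import csv
import io

# Made-up rows: the taxon field on the second line opens a quote
# that is never closed.
raw = (
    'ID\tTaxon\n'
    'id1\t"D_5__Bacillus;D_6__Bacillus sp. X\n'
    'id2\tD_5__Enterobacter\n'
    'id3\tD_5__Pseudomonas\n'
)

# Quote-aware parsing (the csv default): the unclosed quote makes the
# reader treat everything up to the end of the input as one field.
quoted_rows = list(csv.reader(io.StringIO(raw), delimiter="\t"))

# Parsing with quoting disabled: every physical line stays one record,
# and the quote character is kept as ordinary text.
plain_rows = list(
    csv.reader(io.StringIO(raw), delimiter="\t", quoting=csv.QUOTE_NONE)
)

print(len(quoted_rows), "records with quote handling")
print(len(plain_rows), "records with quoting disabled")
```

Reading with quoting disabled keeps the rows intact here, but it also leaves the surrounding `"` characters in every field, so whether it is appropriate depends on how the file was written.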