qiyunzhu / woltka

Woltka: a versatile meta'omic data classifier
BSD 3-Clause "New" or "Revised" License

TypeError: '<' not supported between instances of 'NoneType' and 'str' #34

Open adswafford opened 4 years ago

adswafford commented 4 years ago

New error when running taxonomy

Command:

cd $tmp

in_dir=/projects/cmi_proj/blood_microbiome/niaid/combined
out_root=$in_dir/shogun
align_dir=$out_root/wol_alignments
taxonomy=/projects/cmi_proj/blood_microbiome/three_studies/taxonomy
function=/projects/wol/20170307/release/annotation

# do gotus

echo 'starting woltk'
conda activate woltk
echo 'woltk function for niaid'

# make a directory for the output
out_dir=$out_root/woltka
mkdir -p $out_dir

map_dir=$out_dir/mapdir
func_dir=$out_dir/taxfunc

if [ ! -f $out_dir/niaid.woltk.fin ]
then
    $(which time) woltka classify \
        -i $align_dir \
        --map $taxonomy/g2tid.txt \
        --nodes $taxonomy/nodes.dmp \
        --names $taxonomy/names.dmp \
        --rank phylum,genus,species,free,none \
        --name-as-id \
        --outmap $map_dir \
        -o $out_dir/taxonomy/ > $out_dir/output_taxonomy.log

Output log:

(woltk) [adswafford@barnacle.ucsd.edu /projects/cmi_proj/blood_microbiome/niaid 08:07 AM]$ cat combined/shogun/woltka/output_taxonomy.log
Input directory: /projects/cmi_proj/blood_microbiome/niaid/combined/shogun/wol_alignments.
Number of alignment files to read: 20.
Number of alignment files to read: 20.
Demultiplexing: off.
Constructing classification system...
Parsing taxonomy names file: /projects/cmi_proj/blood_microbiome/three_studies/taxonomy/names.dmp... Done.
Parsing taxonomy nodes file: /projects/cmi_proj/blood_microbiome/three_studies/taxonomy/nodes.dmp... Done.
Parsing simple map file: /projects/cmi_proj/blood_microbiome/three_studies/taxonomy/g2tid.txt... Done.
Classification system constructed.
Total number of classification units: 1669744.
Classification will operate on these ranks: phylum, genus, species, free, none.
Read-to-feature maps will be saved to: /projects/cmi_proj/blood_microbiome/niaid/combined/shogun/woltka/mapdir.
Parsing alignment file CART001_Day_14-DNA_bowtie2_wol_alignment.sam

Error log:

(woltk) [adswafford@barnacle.ucsd.edu /projects/cmi_proj/blood_microbiome/niaid 08:05 AM]$ cat ~/sam_test.e1316922
Traceback (most recent call last):
  File "/home/adswafford/miniconda3/envs/woltk/bin/woltka", line 8, in <module>
    sys.exit(cli())
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/woltka/cli.py", line 181, in classify
    workflow(**kwargs)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/woltka/workflow.py", line 109, in workflow
    data = classify(
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/woltka/workflow.py", line 243, in classify
    assign_readmap(map_, data, rank, sample, **kwargs)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/woltka/workflow.py", line 688, in assign_readmap
    write_readmap(fh, asgmt, namedic)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/woltka/file.py", line 333, in write_readmap
    for taxon, count in sorted(taxa.items(), key=sortkey):
TypeError: '<' not supported between instances of 'NoneType' and 'str'
13.34user 2.32system 0:28.97elapsed 54%CPU (0avgtext+0avgdata 1175712maxresident)k
188560inputs+136outputs (270major+353025minor)pagefaults 0swaps

File structure:

(woltk) [adswafford@barnacle.ucsd.edu /projects/cmi_proj/blood_microbiome/niaid 08:09 AM]$ tree combined/shogun/woltka/
combined/shogun/woltka/
├── mapdir
│   ├── free
│   ├── genus
│   ├── none
│   ├── phylum
│   │   └── CART001_Day_14-DNA_bowtie2_wol_alignment.txt.gz
│   └── species
└── output_taxonomy.log

Upstream files (generated by bowtie2 via SHOGUN):

(woltk) [adswafford@barnacle.ucsd.edu /projects/cmi_proj/blood_microbiome/niaid 08:11 AM]$ ls -halS combined/shogun/wol_alignments/
total 1.1G
-rw-r--r-- 1 adswafford knightlab 778M Apr 8 10:05 Control01-DNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 705M Apr 8 10:42 Control05-RNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 703M Apr 8 11:01 Control05-DNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 554M Apr 8 10:40 Control04-DNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 455M Apr 8 10:08 CART001_Day_14-DNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 440M Apr 8 10:20 Control03-RNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 431M Apr 8 10:17 Control02-RNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 426M Apr 8 10:45 Control04-RNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 418M Apr 8 11:09 CART001_Day_60-DNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 371M Apr 8 10:26 Control02-DNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 354M Apr 8 10:15 CART001_Day_14-RNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 282M Apr 8 10:44 CART001_Day_30-RNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 241M Apr 8 09:52 CART001_Day_90-DNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 191M Apr 8 10:14 Control10-DNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 189M Apr 8 10:18 Control13-RNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 183M Apr 8 11:11 CART001_Day_7-RNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 180M Apr 8 11:13 CART001_Day_7-DNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 164M Apr 8 10:27 CART001_Day_30-DNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 132M Apr 8 10:08 CART001_Day_90-RNA_bowtie2_wol_alignment.sam
-rw-r--r-- 1 adswafford knightlab 81M Apr 8 11:12 CART001_Day_60-RNA_bowtie2_wol_alignment.sam
drwxr-xr-x 2 adswafford knightlab 22 Apr 8 11:12 .
drwxr-xr-x 6 adswafford knightlab 7 Apr 8 11:12 ..

qiyunzhu commented 4 years ago

Hi @adswafford Thank you for reporting this bug and providing such detailed information for debugging! I ran some tests but could not replicate the error. From reading my code, though, I can roughly guess the cause: for a given query sequence, some subject genomes can be assigned to the requested rank while others cannot, which leaves None values in the result. To address this, I made a patch in a new branch, austin, although I cannot yet validate that it works. You can try it by updating the program with pip install -U git+https://github.com/qiyunzhu/woltka.git@austin. Or you can wait a bit and let me try it on the input files you provided.
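The failure mode guessed at above can be reproduced in isolation. The following is a minimal sketch, not Woltka's code: the `taxa` counter and the `sort_key` helper are hypothetical, and the actual `sortkey` used in `write_readmap` is not shown in this thread. It shows how a single None key breaks a plain sort, and two possible ways around it.

```python
from collections import Counter


def sort_key(item):
    """Order by descending count, then by name, with None (unassigned) last.

    Hypothetical helper: it never compares None with a str directly.
    """
    taxon, count = item
    return (-count, taxon is None, taxon or "")


# Hypothetical assignments for one read: one subject has no taxon at this rank.
taxa = Counter({"Firmicutes": 2, None: 1, "Actinobacteria": 1})

# A plain sort compares None with str and raises the reported TypeError.
try:
    sorted(taxa.items())
    naive_error = None
except TypeError as exc:
    naive_error = str(exc)

# Option 1: drop unassigned subjects before sorting.
assigned = sorted((t, n) for t, n in taxa.items() if t is not None)

# Option 2: keep them, using a key that is safe for mixed None/str values.
ordered = sorted(taxa.items(), key=sort_key)

print(naive_error)
print(assigned)
print(ordered)
```

Whether unassigned subjects should be dropped or kept as an explicit "unassigned" entry is a design decision for the classifier; the sketch only shows that either choice avoids the comparison error.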

adswafford commented 4 years ago

Thanks! I just tried out the patch and got a different error:

(woltk) [adswafford@barnacle.ucsd.edu /projects/cmi_proj/blood_microbiome/niaid 01:26 PM]$ cat ~/sam_test.e1316955
Traceback (most recent call last):
  File "/home/adswafford/miniconda3/envs/woltk/bin/woltka", line 8, in <module>
    sys.exit(cli())
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/woltka/cli.py", line 181, in classify
    workflow(**kwargs)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/woltka/workflow.py", line 109, in workflow
    data = classify(
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/woltka/workflow.py", line 243, in classify
    assign_readmap(map_, data, rank, sample, **kwargs)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/woltka/workflow.py", line 680, in assign_readmap
    res = assigner(subjects, args)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/woltka/classify.py", line 108, in assign_rank
    return count_list(filter(taxa, None))
TypeError: 'NoneType' object is not iterable
12.86user 2.64system 0:24.09elapsed 64%CPU (0avgtext+0avgdata 1163960maxresident)k
1452376inputs+16outputs (270major+654705minor)pagefaults 0swaps

Let me know if you want me to move the alignment files to a directory where you have access?
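A note on the last frame of that traceback: Python's built-in `filter` takes the function first and the iterable second, so a call shaped like `filter(taxa, None)` places None in the iterable slot, which would produce exactly this TypeError. The sketch below reproduces that in isolation; `count_list` here is a hypothetical stand-in, not Woltka's implementation, and the sample `taxa` list is made up.

```python
from collections import Counter


def count_list(items):
    """Hypothetical stand-in for a counting helper; not Woltka's code."""
    return dict(Counter(items))


# Made-up assignments for one read; None marks a subject without a taxon.
taxa = ["Firmicutes", None, "Firmicutes", "Actinobacteria"]

# Swapped arguments: None is treated as the iterable and cannot be iterated.
try:
    count_list(filter(taxa, None))
    swapped_error = None
except TypeError as exc:
    swapped_error = str(exc)

# Function first, iterable second. filter(None, ...) drops every falsy item,
# which removes None but would also remove empty strings.
counts = count_list(filter(None, taxa))

print(swapped_error)
print(counts)
```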

adswafford commented 4 years ago

Progress after the second patch, but a new error:

(woltk) [adswafford@barnacle.ucsd.edu /projects/cmi_proj/blood_microbiome/niaid 01:37 PM]$ cat ~/sam_test.e1316957
Traceback (most recent call last):
  File "/home/adswafford/miniconda3/envs/woltk/bin/woltka", line 8, in <module>
    sys.exit(cli())
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/woltka/cli.py", line 181, in classify
    workflow(**kwargs)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/woltka/workflow.py", line 115, in workflow
    write_profiles(
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/woltka/workflow.py", line 770, in write_profiles
    write_biom(profile_to_biom(
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/woltka/biom.py", line 83, in profile_to_biom
    return biom.Table(np.array(data), observations, samples, metadata or None,
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/biom/table.py", line 508, in __init__
    errcheck(self)
  File "/home/adswafford/miniconda3/envs/woltk/lib/python3.8/site-packages/biom/err.py", line 474, in errcheck
    raise ret
biom.exception.TableException: Duplicate observation IDs
402.70user 6.80system 7:13.42elapsed 94%CPU (0avgtext+0avgdata 1240452maxresident)k
14291528inputs+324096outputs (270major+1631679minor)pagefaults 0swaps

(woltk) [adswafford@barnacle.ucsd.edu /projects/cmi_proj/blood_microbiome/niaid 01:45 PM]$ tree combined/shogun/woltka/
combined/shogun/woltka/
├── mapdir
│   ├── free
│   │   ├── CART001_Day_14-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_14-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_30-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_30-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_60-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_60-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_7-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_7-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_90-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_90-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control01-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control02-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control02-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control03-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control04-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control04-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control05-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control05-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control10-DNA_bowtie2_wol_alignment.txt.gz
│   │   └── Control13-RNA_bowtie2_wol_alignment.txt.gz
│   ├── genus
│   │   ├── CART001_Day_14-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_14-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_30-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_30-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_60-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_60-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_7-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_7-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_90-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_90-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control01-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control02-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control02-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control03-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control04-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control04-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control05-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control05-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control10-DNA_bowtie2_wol_alignment.txt.gz
│   │   └── Control13-RNA_bowtie2_wol_alignment.txt.gz
│   ├── none
│   │   ├── CART001_Day_14-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_14-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_30-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_30-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_60-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_60-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_7-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_7-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_90-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_90-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control01-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control02-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control02-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control03-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control04-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control04-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control05-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control05-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control10-DNA_bowtie2_wol_alignment.txt.gz
│   │   └── Control13-RNA_bowtie2_wol_alignment.txt.gz
│   ├── phylum
│   │   ├── CART001_Day_14-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_14-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_30-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_30-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_60-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_60-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_7-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_7-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_90-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── CART001_Day_90-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control01-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control02-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control02-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control03-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control04-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control04-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control05-DNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control05-RNA_bowtie2_wol_alignment.txt.gz
│   │   ├── Control10-DNA_bowtie2_wol_alignment.txt.gz
│   │   └── Control13-RNA_bowtie2_wol_alignment.txt.gz
│   └── species
│       ├── CART001_Day_14-DNA_bowtie2_wol_alignment.txt.gz
│       ├── CART001_Day_14-RNA_bowtie2_wol_alignment.txt.gz
│       ├── CART001_Day_30-DNA_bowtie2_wol_alignment.txt.gz
│       ├── CART001_Day_30-RNA_bowtie2_wol_alignment.txt.gz
│       ├── CART001_Day_60-DNA_bowtie2_wol_alignment.txt.gz
│       ├── CART001_Day_60-RNA_bowtie2_wol_alignment.txt.gz
│       ├── CART001_Day_7-DNA_bowtie2_wol_alignment.txt.gz
│       ├── CART001_Day_7-RNA_bowtie2_wol_alignment.txt.gz
│       ├── CART001_Day_90-DNA_bowtie2_wol_alignment.txt.gz
│       ├── CART001_Day_90-RNA_bowtie2_wol_alignment.txt.gz
│       ├── Control01-DNA_bowtie2_wol_alignment.txt.gz
│       ├── Control02-DNA_bowtie2_wol_alignment.txt.gz
│       ├── Control02-RNA_bowtie2_wol_alignment.txt.gz
│       ├── Control03-RNA_bowtie2_wol_alignment.txt.gz
│       ├── Control04-DNA_bowtie2_wol_alignment.txt.gz
│       ├── Control04-RNA_bowtie2_wol_alignment.txt.gz
│       ├── Control05-DNA_bowtie2_wol_alignment.txt.gz
│       ├── Control05-RNA_bowtie2_wol_alignment.txt.gz
│       ├── Control10-DNA_bowtie2_wol_alignment.txt.gz
│       └── Control13-RNA_bowtie2_wol_alignment.txt.gz
├── output_taxonomy.log
└── taxonomy

7 directories, 101 files

(woltk) [adswafford@barnacle.ucsd.edu /projects/cmi_proj/blood_microbiome/niaid 01:50 PM]$ cat combined/shogun/woltka/output_taxonomy.log
Input directory: /projects/cmi_proj/blood_microbiome/niaid/combined/shogun/wol_alignments.
Number of alignment files to read: 20.
Number of alignment files to read: 20.
Demultiplexing: off.
Constructing classification system...
Parsing taxonomy names file: /projects/cmi_proj/blood_microbiome/three_studies/taxonomy/names.dmp... Done.
Parsing taxonomy nodes file: /projects/cmi_proj/blood_microbiome/three_studies/taxonomy/nodes.dmp... Done.
Parsing simple map file: /projects/cmi_proj/blood_microbiome/three_studies/taxonomy/g2tid.txt... Done.
Classification system constructed.
Total number of classification units: 1669744.
Classification will operate on these ranks: phylum, genus, species, free, none.
Read-to-feature maps will be saved to: /projects/cmi_proj/blood_microbiome/niaid/combined/shogun/woltka/mapdir.
Parsing alignment file CART001_Day_14-DNA_bowtie2_wol_alignment.sam .. Done.
Number of query sequences: 116002.
Parsing alignment file CART001_Day_14-RNA_bowtie2_wol_alignment.sam . Done.
Number of query sequences: 89445.
Parsing alignment file CART001_Day_30-DNA_bowtie2_wol_alignment.sam . Done.
Number of query sequences: 48669.
Parsing alignment file CART001_Day_30-RNA_bowtie2_wol_alignment.sam . Done.
Number of query sequences: 102153.
Parsing alignment file CART001_Day_60-DNA_bowtie2_wol_alignment.sam .. Done.
Number of query sequences: 131112.
Parsing alignment file CART001_Day_60-RNA_bowtie2_wol_alignment.sam . Done.
Number of query sequences: 25457.
Parsing alignment file CART001_Day_7-DNA_bowtie2_wol_alignment.sam . Done.
Number of query sequences: 56918.
Parsing alignment file CART001_Day_7-RNA_bowtie2_wol_alignment.sam . Done.
Number of query sequences: 73395.
Parsing alignment file CART001_Day_90-DNA_bowtie2_wol_alignment.sam . Done.
Number of query sequences: 53435.
Parsing alignment file CART001_Day_90-RNA_bowtie2_wol_alignment.sam . Done.
Number of query sequences: 31346.
Parsing alignment file Control01-DNA_bowtie2_wol_alignment.sam .. Done.
Number of query sequences: 279521.
Parsing alignment file Control02-DNA_bowtie2_wol_alignment.sam . Done.
Number of query sequences: 116450.
Parsing alignment file Control02-RNA_bowtie2_wol_alignment.sam .. Done.
Number of query sequences: 269027.
Parsing alignment file Control03-RNA_bowtie2_wol_alignment.sam .. Done.
Number of query sequences: 302092.
Parsing alignment file Control04-DNA_bowtie2_wol_alignment.sam .. Done.
Number of query sequences: 231287.
Parsing alignment file Control04-RNA_bowtie2_wol_alignment.sam .. Done.
Number of query sequences: 252036.
Parsing alignment file Control05-DNA_bowtie2_wol_alignment.sam .. Done.
Number of query sequences: 306100.
Parsing alignment file Control05-RNA_bowtie2_wol_alignment.sam .. Done.
Number of query sequences: 439871.
Parsing alignment file Control10-DNA_bowtie2_wol_alignment.sam . Done.
Number of query sequences: 85534.
Parsing alignment file Control13-RNA_bowtie2_wol_alignment.sam . Done.
Number of query sequences: 142018.
Task completed.
Format of output feature table(s): BIOM.

qiyunzhu commented 4 years ago

Hi @adswafford Sorry to keep you waiting. I am getting back to the program. It looks like the program got as far as the second-to-last step! The error message "Duplicate observation IDs" suggests that some taxon names are duplicated. I guess this is because some taxon names in the NCBI taxdump are duplicates.

To resolve this, you may remove --name-as-id from the command line, so that the observation IDs in the BIOM table remain taxon IDs, which are unique. Meanwhile, an extra metadata column will be appended to the table, listing the corresponding names, which can contain duplicates.
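The difference between the two labeling schemes can be sketched without any BIOM machinery. This is an illustration only: the counts are invented, the `duplicated` helper and the "Name" metadata key are hypothetical, and only the two taxon IDs and their shared name come from this thread.

```python
from collections import Counter

# Taxon IDs and names quoted in this thread.
names = {"201174": "Actinobacteria", "1760": "Actinobacteria"}

# Invented counts per taxon ID for a single sample.
counts = {"201174": 12, "1760": 7}


def duplicated(labels):
    """Return the labels that occur more than once, sorted."""
    return sorted(k for k, n in Counter(labels).items() if n > 1)


# Names as row IDs: two rows share one label, which a table requiring
# unique observation IDs would reject.
name_ids = [names[tid] for tid in counts]

# Taxon IDs as row IDs, with the name carried alongside as metadata.
rows = [{"id": tid, "count": n, "Name": names[tid]} for tid, n in counts.items()]
id_ids = [row["id"] for row in rows]

print(duplicated(name_ids))
print(duplicated(id_ids))
```

Keeping the ID as the key and the name as an attribute means a repeated name is just repeated metadata rather than a key collision.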

Alternatively, you can add --to-tsv to the woltka command, and the output files will be in plain tab-delimited format, in which duplicate row headers are tolerated. This is not recommended; it is only useful for a quick check.
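One way to see why a tab-delimited file tolerates duplicate row headers, and why relying on that is risky, is the following sketch. It uses only Python's `csv` module; the "#FeatureID" header, the sample name "S1", and the counts are assumptions for illustration, not Woltka's actual output layout.

```python
import csv
import io

# Two rows that share one row header (invented counts).
rows = [("Actinobacteria", 12), ("Actinobacteria", 7)]

# Writing plain text never checks row headers for uniqueness.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(["#FeatureID", "S1"])
writer.writerows(rows)
text = buf.getvalue()

# Reading it back row by row keeps both records...
parsed = list(csv.reader(io.StringIO(text), delimiter="\t"))

# ...but loading the rows into a name-keyed mapping silently keeps only the last.
as_dict = {name: int(count) for name, count in parsed[1:]}

print(text)
print(as_dict)
```

The file itself is valid, but any downstream step that indexes rows by name can quietly lose data, which fits the advice to treat this output as a quick check only.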

As far as I recall, the only case in NCBI taxonomy where two names are identical is phylum Actinobacteria (201174) and class Actinobacteria (1760). This is quite unfortunate. Woltka can run into this error in the free-rank classification, where a sequence can be assigned to any rank. If you remove free from the --rank list, I guess Woltka will work as well.
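Whether other names collide in a particular taxdump can be checked directly. The sketch below assumes the standard NCBI names.dmp layout (fields separated by a tab, a pipe, and a tab: tax_id, name_txt, unique name, name class) and counts only "scientific name" rows; whether Woltka restricts itself to that name class is not stated in this thread. The sample rows are illustrative, with only the two Actinobacteria IDs taken from the discussion above, and `duplicate_names` is a hypothetical helper.

```python
from collections import defaultdict

# Illustrative rows in names.dmp layout; not copied from a real dump.
SAMPLE = (
    "1760\t|\tActinobacteria\t|\t\t|\tscientific name\t|\n"
    "201174\t|\tActinobacteria\t|\t\t|\tscientific name\t|\n"
    "1239\t|\tFirmicutes\t|\t\t|\tscientific name\t|\n"
    "1239\t|\texample synonym\t|\t\t|\tsynonym\t|\n"
)


def duplicate_names(lines):
    """Map each scientific name used by more than one taxon ID to those IDs."""
    ids_by_name = defaultdict(list)
    for line in lines:
        fields = [field.strip() for field in line.rstrip("\n").split("|")]
        # Skip malformed rows and name classes other than scientific names.
        if len(fields) < 4 or fields[3] != "scientific name":
            continue
        ids_by_name[fields[1]].append(fields[0])
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


dups = duplicate_names(SAMPLE.splitlines())
print(dups)
```

To scan a real file, the same function can be given an open file handle, for example `duplicate_names(open("names.dmp"))`, since it only iterates over lines.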

I will work on the code to fix this issue as well as the other issue you reported later today.

adswafford commented 4 years ago

Got it, thanks for the suggestions, explanations, and investigations. I dropped free and it seems to be running now, and I'll let you know if it hits another snag. Thanks!
