merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Error in documentation for anvi-gen-contigs-database #939

Closed btemperton closed 5 years ago

btemperton commented 6 years ago

When trying to create a contigs database for viral contigs with anvi'o v5 (installed through conda on Ubuntu 16), I came across an error in the documentation for the above program. I don't want to split my contigs because they are typically 20-30 kbp. The documentation states:

-L INT, --split-length INT Splitting very large contigs into multiple pieces improves the efficacy of the visualization step. The default value is (20000). If you are not sure, we advise you to not go below 10,000. The lower you go, the more complicated the tree will be, and will take more time and computational resources to finish the analysis. Also this is not a case of 'the smaller the split size the more sensitive the results'. If you do not want your contigs to be split, you can either simply enter '0' or ANY OTHER negative integer (lots of unnecessary freedom here, enjoy!).

However, setting --split-length 0 using the following command:

```
anvi-gen-contigs-database -f anvio.fa \
                          -o anvio.db \
                          -n 'HTVC010P-like contigs database' \
                          --split-length 0 \
                          --external-gene-calls mga.gene.calls.txt \
                          --description description.md
```

threw the following error:

Config Error: Creating a new contigs database requires split length information to be provided. But the ContigsDatabase class was called to create one without this bit of information. Not cool.

Setting --split-length -1 worked and did not try to split the contigs:

```
Split Length .................................: 9,223,372,036,854,775,807
K-mer size ...................................: 4
Skip gene calling? ...........................: False
External gene calls provided? ................: mga.gene.calls.txt
Ignoring internal stop codons? ...............: False
Splitting pays attention to gene calls? ......: True
External gene calls ..........................: 17657 gene calls recovered and will be processed.

WARNING

693 of your 17657 gene calls were impartial, hence the translated amino acid sequences for those were not stored in the database.

Contigs with at least one gene call ..........: 488 of 488 (100.0%)
Contigs database .............................: A new database, anvio.db, has been created.
Number of contigs ............................: 488
Number of splits .............................: 488
Total number of nucleotides ..................: 11,066,397
Gene calling step skipped ....................: False
Splits broke genes (non-mindful mode) ........: False
Desired split length (what the user wanted) ..: 9,223,372,036,854,775,807
Average split length (wnat anvi'o gave back) .: (Anvi'o did not create any splits)
```

(There's also an interesting spelling of 'what' in the last output line :).)
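One plausible explanation for the behaviour above, offered only as a guess and not taken from the anvi'o source, is a truthiness check that treats `0` the same as "no value given". The sketch below contrasts such a check with an explicit `None` test. The function names are hypothetical, and mapping non-positive values to `sys.maxsize` is an assumption based on the split length of 9,223,372,036,854,775,807 reported in the output, which matches `sys.maxsize` on a typical 64-bit Python build.

```python
import sys


def normalize_split_length_buggy(split_length):
    """Hypothetical check that reproduces the reported symptom.

    `not split_length` is True for both None and 0, so a user-supplied 0
    is rejected as if no value had been provided at all.
    """
    if not split_length:
        raise ValueError("split length information was not provided")
    if split_length < 0:
        # Assumed convention: a negative value means "do not split".
        return sys.maxsize
    return split_length


def normalize_split_length(split_length):
    """Hypothetical check that matches the documented behaviour.

    Only a missing value (None) is an error; 0 and any negative integer
    both mean "do not split".
    """
    if split_length is None:
        raise ValueError("split length information was not provided")
    if split_length <= 0:
        return sys.maxsize
    return split_length


if __name__ == "__main__":
    for value in (0, -1, 20000):
        print(value, "->", normalize_split_length(value))
```

With the explicit `None` test, `0` and `-1` give the same result, which is what the help text promises.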

meren commented 5 years ago

I think we fixed this :) Thank you!