labgem / PlaScope

:microscope: :o: Plasmid exploration of bacterial genomes
GNU General Public License v3.0

No output in PlaScope_predictions #1

Closed nickp60 closed 6 years ago

nickp60 commented 6 years ago

Hi, I am trying to test PlaScope (version 1.3) using this O104:H4 assembly (stored as O104H4.fasta) and the provided database (stored in ~/dbs/plascope/), and am getting the following warnings:


$ plaScope.sh --fasta ~/2018-08-selecting_plasmid_finder/O104H4.fasta -o test --db_dir ~/dbs/plascope --db_name chromosome_plasmid_db --sample name_of_my_sample -n

Mode 2
Step 1: Contigs classification with Centrifuge and custom database
Centrifuge log can be found here: test/name_of_my_sample_PlaScope/Centrifuge_results/centrifuge.log
Step 2: Extraction of plasmid, chromosome and unclassified predictions
Warning: >CP003289.1 Escherichia coli O104:H4 str. 2011C-3493, complete genome not classified.
Warning: >CP003291.1 Escherichia coli O104:H4 str. 2011C-3493 plasmid pAA-EA11, complete sequence not classified.
Warning: >CP003290.1 Escherichia coli O104:H4 str. 2011C-3493 plasmid pESBL-EA11, complete sequence not classified.
Warning: >CP003292.1 Escherichia coli O104:H4 str. 2011C-3493 plasmid pG-EA11, complete sequence not classified.
If you use PlaScope please cite: ...

Here are the contents of test/name_of_my_sample_PlaScope/Centrifuge_results/name_of_my_sample_extendedresult:

readID  seqID   taxID   score   2ndBestScore    hitLength       queryLength     numMatches
CP003292.1      NC_022740.1     3       1290496 0       1151    1549    1
CP003290.1      species 3       1761963080      0       54346   88544   1
CP003291.1      species 3       376760648       0       42917   74217   1
CP003289.1      NZ_CP025401.1   2       3187944114      0       224663  5273097 1

And the contents of test/name_of_my_sample_PlaScope/Centrifuge_results/name_of_my_sample_summary:

name    taxID   taxRank genomeSize      numReads        numUniqueReads  abundance
2       2       species 1722383768      1       1       0.0
3       3       species 241331578       3       3       0.0

Once it finishes, the PlaScope_predictions subdirectory is empty. How can I check the predictions for each of the contigs? Why are these warnings being thrown?

Thanks in advance!

GuilhemRoyer commented 6 years ago

Hi nickp60!

The current version of PlaScope requires SPAdes-formatted headers (e.g. ">NODE_36_length_43824_cov_77.8425"). This allows us to filter contigs according to their coverage (SPAdes coverage > 2), which is generally more relevant, as low-coverage contigs are frequently low-quality or contaminated.
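To illustrate why the accession-style headers in this issue trigger the warnings, here is a minimal Python sketch of a coverage check on SPAdes-style headers. It is not PlaScope's actual code: the regular expression, the function names `spades_coverage` and `passes_coverage_filter`, and the `min_cov` parameter are assumptions made for illustration, based only on the example header and the "coverage > 2" rule mentioned above.

```python
import re

# Assumed SPAdes header layout: NODE_<id>_length_<bp>_cov_<coverage>
# (pattern inferred from the example header; not taken from PlaScope itself)
SPADES_HEADER = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def spades_coverage(header):
    """Return the coverage parsed from a SPAdes-style header, or None if it does not match."""
    match = SPADES_HEADER.match(header.strip())
    if match is None:
        return None
    return float(match.group(3))


def passes_coverage_filter(header, min_cov=2.0):
    """True only for SPAdes-style headers whose coverage is strictly above min_cov."""
    coverage = spades_coverage(header)
    return coverage is not None and coverage > min_cov


if __name__ == "__main__":
    for header in (
        ">NODE_36_length_43824_cov_77.8425",
        ">CP003289.1 Escherichia coli O104:H4 str. 2011C-3493, complete genome",
    ):
        print(header, "->", passes_coverage_filter(header))
```

Under this sketch, a header such as ">CP003289.1 ..." carries no coverage field, so it can never pass the filter, which is consistent with the contigs being classified by Centrifuge but not extracted.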

However, as shown in the "name_of_my_sample_extendedresult" file, the three plasmids (CP003292.1, CP003290.1, CP003291.1) are correctly classified (value in the third column = 3), as is the chromosome (CP003289.1, value in the third column = 2).
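Until the extraction step accepts other header formats, the classification can be read straight from the extendedresult table. The following Python sketch assumes the whitespace-separated layout shown earlier in this issue and one line per contig. The mapping of taxID 2 to chromosome and 3 to plasmid comes from this thread; the function name `classify_contigs` and the "unclassified" fallback label are hypothetical.

```python
# taxID meanings as described in this thread: 2 = chromosome, 3 = plasmid
TAXID_LABELS = {"2": "chromosome", "3": "plasmid"}


def classify_contigs(lines):
    """Map each readID to a label based on the taxID column of an extendedresult table."""
    rows = [line.split() for line in lines if line.strip()]
    header, records = rows[0], rows[1:]
    read_col = header.index("readID")
    tax_col = header.index("taxID")
    # Any other taxID falls back to "unclassified" (an assumption of this sketch)
    return {
        record[read_col]: TAXID_LABELS.get(record[tax_col], "unclassified")
        for record in records
    }


# Table copied from the extendedresult file quoted in this issue
EXAMPLE = """\
readID  seqID   taxID   score   2ndBestScore    hitLength       queryLength     numMatches
CP003292.1      NC_022740.1     3       1290496 0       1151    1549    1
CP003290.1      species 3       1761963080      0       54346   88544   1
CP003291.1      species 3       376760648       0       42917   74217   1
CP003289.1      NZ_CP025401.1   2       3187944114      0       224663  5273097 1
"""

if __name__ == "__main__":
    for contig, label in classify_contigs(EXAMPLE.splitlines()).items():
        print(contig, label)
```

To use it on a real run, pass the lines of the extendedresult file instead of `EXAMPLE`, for example `classify_contigs(open(path))` with `path` pointing at that file.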

I plan to release a version that does not depend on the contig header format soon. But for now, if you use contigs with an unsuitable header format, you will get these warnings and the contigs will not be extracted.

nickp60 commented 6 years ago

Hi @GuilhemRoyer, thanks for the update! I will be using SPAdes input for the actual analysis, so that works out well, though it might be nice to have the option to turn off this behaviour for use with non-SPAdes input. Thanks for clearing this up!