poeli / GOTTCHA

More details and updates can be found on our homepage and on the LANL-Bioinformatics GitHub site (https://github.com/LANL-Bioinformatics/GOTTCHA). Please visit our homepage at
http://lanl-bioinformatics.github.io/GOTTCHA
GNU General Public License v3.0

temp folders persist #8

Open jonathanjacobs opened 8 years ago

jonathanjacobs commented 8 years ago

It seems that if GOTTCHA fails to find anything, the temp folder remains after the run is complete. Everything seems to be fine, but this makes me wonder whether the tool carried out the analysis correctly... (it usually finds... something)

Here's an example:

EXAMPLE 1, Temp folder remains / no bugs detected

----> ENTRY HEADER:@M70116:4:000000000-AJEH6:1:1101:14371:1905 1:N:0:7
Threads: 1 (effective)  1 (requested)
IDX = 0: reading from 0 to 1920571466
Staggering at 125000 reads; processed 125000 reads
IDX(0) counted 5534518 reads
GLOBAL READ COUNT = 5534518
Trim Time: 281203 ms

PROGRAM ELAPSED TIME: 298459 ms
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 4173880 sequences (140000054 bp)...
[M::process] read 4161872 sequences (140000050 bp)...
[M::mem_process_seqs] Processed 4173880 reads in 179.171 CPU sec, 17.148 real sec
[M::mem_process_seqs] Processed 4161872 reads in 170.680 CPU sec, 14.698 real sec
[M::process] read 4108872 sequences (140000067 bp)...
[M::mem_process_seqs] Processed 4108872 reads in 162.569 CPU sec, 12.050 real sec
[M::process] read 4110420 sequences (140000064 bp)...
[M::mem_process_seqs] Processed 4110420 reads in 162.380 CPU sec, 11.671 real sec
[M::process] read 683426 sequences (23120420 bp)...
[M::mem_process_seqs] Processed 683426 reads in 26.500 CPU sec, 2.061 real sec
[main] Version: 0.7.12-r1044
[main] CMD: bwa mem -k 30 -T 0 -B 100 -O 100 -E 100 -t 14 /home/src/gottcha/database/GOTTCHA_BACTERIA_c4937_k24_u30_xHUMAN3x.species -
[main] Real time: 154.603 sec; CPU: 751.777 sec

Parsing ./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA_splitrim.stats.txt... found 5534518 reads (split-trimmed: 17238470) and 579914609 bases.
->Retrieving parsed DB from disk [/home/src/gottcha/database/GOTTCHA_BACTERIA_c4937_k24_u30_xHUMAN3x.species.parsedGOTTCHA.dmp]...done. 16 wallclock secs (14.02 usr +  2.05 sys = 16.07 CPU)
->Parsing SAM file [-] done. Mapped split-trimmed reads: 170; Mapped split-trimmed reads to plasmids: 14; Unmapped split-trimmed reads: 17238259; Mapped raw reads: 59; Mapped raw reads to plasmids: 6. 139 wallclock secs (100.16 usr +  1.82 sys = 101.98 CPU)
->Consolidating hits...done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
->Retrieving Genome Vitals from disk [/home/src/gottcha/database/genomeVitals.dmp]...done.  0 wallclock secs ( 0.19 usr +  0.03 sys =  0.22 CPU)
->Storing coordinates to disk [./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.replicon.contig.coords.csv]...done.  0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU)
->Storing datastructure GI coordinates to disk in BINARY format as "./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.giCoords.dmp"...done. 0 wallsecs
->Storing datastructure contig length histogram (by entry) to disk in BINARY format as "./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.replicon.contig.HistoByEntry.dmp"...done. 0 wallsecs
->Storing datastructure contig length histogram (by GI) to disk in BINARY format as "./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.replicon.contig.HistoByGI.dmp"...done. 0 wallsecs
->Storing parseable contig length histogram(s) to disk...done.  1 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
->Calculating non-overlapping coverage from mapping results...done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
===== REPLICON-level Analysis =====
->Writing REPLICON-level results to disk [./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.replicon.tsv]...done.  1 wallclock secs ( 0.08 usr +  0.00 sys =  0.08 CPU)
===== Extended Taxonomic Rank-level Analysis =====
->Pulling replicon GIs from DB entries...done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
->Mapping replicon GIs to source organisms...done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
->Retrieving Tax Tree from disk [/home/src/gottcha/database/speciesTreeGI.dmp]...done.  5 wallclock secs ( 5.56 usr +  0.14 sys =  5.70 CPU)
->Mapping source organism to its tax tree...done.  3 wallclock secs ( 2.96 usr +  0.01 sys =  2.97 CPU)
  Congratulations! All organisms have been identified!
->Rolling up results for rank STRAIN [./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.strain.tsv]...done.  1 wallclock secs ( 0.70 usr +  0.00 sys =  0.70 CPU)
->Rolling up results for rank SPECIES [./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.species.tsv]...done.  1 wallclock secs ( 0.24 usr +  0.00 sys =  0.24 CPU)
->Rolling up results for rank GENUS [./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.genus.tsv]...done.  0 wallclock secs ( 0.24 usr +  0.00 sys =  0.24 CPU)
->Rolling up results for rank FAMILY [./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.family.tsv]...done.  1 wallclock secs ( 0.23 usr +  0.00 sys =  0.23 CPU)
->Rolling up results for rank ORDER [./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.order.tsv]...done.  0 wallclock secs ( 0.24 usr +  0.01 sys =  0.25 CPU)
->Rolling up results for rank CLASS [./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.class.tsv]...done.  1 wallclock secs ( 0.23 usr +  0.00 sys =  0.23 CPU)
->Rolling up results for rank PHYLUM [./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.phylum.tsv]...done.  0 wallclock secs ( 0.24 usr +  0.00 sys =  0.24 CPU)
Saving updated table to "./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.strain.tsv.ABU"...done. (20 taxonomie(s), 20 filtered)
Saving updated table to "./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.species.tsv.ABU"...done. (18 taxonomie(s), 18 filtered)
Saving updated table to "./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.genus.tsv.ABU"...done. (13 taxonomie(s), 13 filtered)
Saving updated table to "./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.family.tsv.ABU"...done. (10 taxonomie(s), 10 filtered)
Saving updated table to "./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.order.tsv.ABU"...done. (7 taxonomie(s), 7 filtered)
Saving updated table to "./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.class.tsv.ABU"...done. (5 taxonomie(s), 5 filtered)
Saving updated table to "./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.phylum.tsv.ABU"...done. (3 taxonomie(s), 3 filtered)
------------------
TOTAL SCRIPT TIME:  -   177 wallclock secs (132.39 usr +  4.08 sys = 136.47 CPU)
------------------
Loading STRAIN Lookup file...done!
Loading SPECIES Lookup file...done!
Loading GENUS Lookup file...done!
Loading FAMILY Lookup file...done!
Loading ORDER Lookup file...done!
Loading CLASS Lookup file...done!
Loading PHYLUM Lookup file...done!
Loading TAX Lookup file...done!
Parsing table "./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.strain.tsv"...Done!

STRAIN  LINEAR_LENGTH   TOTAL_BP_MAPPED HIT_COUNT   HIT_COUNT_PLASMID   READ_COUNT  LINEAR_DOC  NORM_COV
=================================================================

SPECIES LINEAR_LENGTH   TOTAL_BP_MAPPED HIT_COUNT   HIT_COUNT_PLASMID   READ_COUNT  LINEAR_DOC  NORM_COV
=================================================================

GENUS   LINEAR_LENGTH   TOTAL_BP_MAPPED HIT_COUNT   HIT_COUNT_PLASMID   READ_COUNT  LINEAR_DOC  NORM_COV
=================================================================

FAMILY  LINEAR_LENGTH   TOTAL_BP_MAPPED HIT_COUNT   HIT_COUNT_PLASMID   READ_COUNT  LINEAR_DOC  NORM_COV
=================================================================

ORDER   LINEAR_LENGTH   TOTAL_BP_MAPPED HIT_COUNT   HIT_COUNT_PLASMID   READ_COUNT  LINEAR_DOC  NORM_COV
=================================================================

CLASS   LINEAR_LENGTH   TOTAL_BP_MAPPED HIT_COUNT   HIT_COUNT_PLASMID   READ_COUNT  LINEAR_DOC  NORM_COV
=================================================================

PHYLUM  LINEAR_LENGTH   TOTAL_BP_MAPPED HIT_COUNT   HIT_COUNT_PLASMID   READ_COUNT  LINEAR_DOC  NORM_COV
=================================================================
Exporting results to disk [./PAN-0060-QCB_S7.BACTERIA_temp/PAN-0060-QCB_S7.BACTERIA.species.tsv.ABUX]...Done!

poeli commented 8 years ago

There are two situations in which GOTTCHA finds nothing: 1) none of the input reads map to any signature, or 2) there is insufficient coverage or too few hits to call any organism present. Most cases are the second kind. The unfiltered/raw output can therefore be very useful if you want to figure out which organisms your input reads hit and why they were filtered out. That is why all temp files and the temp directory are kept by default in this circumstance.
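For readers who want to tell the two situations apart from a captured run log, here is a minimal Python sketch. It is not part of GOTTCHA. It assumes the log contains lines shaped like the ones in the example above ("Mapped split-trimmed reads: N" and "(N taxonomie(s), M filtered)"), and it assumes that "M filtered" counts candidates that were filtered out, which matches the empty tables in that run but is not confirmed here. The function name `diagnose_empty_run` is hypothetical.

```python
import re

# Patterns modelled on the log lines quoted in this issue (an assumption:
# other GOTTCHA versions may word these lines differently).
MAPPED_RE = re.compile(r"Mapped split-trimmed reads:\s*(\d+)")
FILTER_RE = re.compile(
    r'Saving updated table to ".*?\.(\w+)\.tsv\.ABU"\.\.\.done\. '
    r"\((\d+) taxonomie\(s\), (\d+) filtered\)"
)


def diagnose_empty_run(log_text):
    """Guess why a run reported no organisms (hypothetical helper)."""
    mapped = MAPPED_RE.search(log_text)
    if mapped is None:
        return "unknown: no SAM-parsing summary line found"
    if int(mapped.group(1)) == 0:
        return "situation 1: no reads mapped to any signature"

    # rank name -> (candidates found, candidates filtered)
    ranks = {
        rank: (int(total), int(filtered))
        for rank, total, filtered in FILTER_RE.findall(log_text)
    }
    if not ranks:
        return "unknown: no filter summary lines found"
    # Assumption: "filtered" means "filtered out".
    if all(total == filtered for total, filtered in ranks.values()):
        return "situation 2: reads mapped, but every candidate was filtered out"
    return "some candidates passed the filters"
```

Passing the text of the example log above to `diagnose_empty_run` should report situation 2 under these assumptions, since 170 split-trimmed reads mapped and every rank shows all of its candidates as filtered. In that case the unfiltered `.tsv` files left in the `_temp` folder are the place to look.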

jonathanjacobs commented 8 years ago

Gottcha! Thanks. 😆

(In reply to poeli's comment of Oct 30, 2015, 5:30 PM.)