poeli / GOTTCHA

More details and updates can be found on our homepage and on the LANL-Bioinformatics GitHub site (https://github.com/LANL-Bioinformatics/GOTTCHA). Please visit our homepage at http://lanl-bioinformatics.github.io/GOTTCHA
GNU General Public License v3.0

Empty output with some files #1

Closed: donutbrew closed this issue 9 years ago.

donutbrew commented 9 years ago

I'm getting empty output with some files: the GOTTCHA output files list only the headers, with no actual results. Perhaps the script dies when it encounters an undefined GI? I haven't had time to look into it in depth yet, but perhaps you have already encountered this.

Here is the log file:

----> ENTRY HEADER:@HWI-ST906:483:C5A6FACXX:8:1101:1226:2130#TGGGAGT/1
Threads: 1 (effective)  1 (requested)
IDX = 0: reading from 0 to 170333051204
Staggering at 125000 reads; processed 125000 reads
IDX(0) counted 660331874 reads
GLOBAL READ COUNT = 660331874
Trim Time: 19258233 ms

PROGRAM ELAPSED TIME: 19977773 ms

Parsing HSTEST_temp/splitrim/HSTEST_splitrim.stats.txt... found 1045087807 reads and 36480442110 bases.
->Retrieving parsed DB from disk [/db/GOTTCHA/database/GOTTCHA_VIRUSES_c3498_k85_u24_xHUMAN3x.species.parsedGOTTCHA.dmp]...done.  0 wallclock secs ( 0.03 usr +  0.00 sys =  0.03 CPU)
->Parsing SAM file [-] [main] Version: 0.7.9a-r786
[main] CMD: bwa mem -k 30 -T 0 -B 100 -O 100 -E 100 -t 31 /db/GOTTCHA/database/GOTTCHA_VIRUSES_c3498_k85_u24_xHUMAN3x.species HSTEST_temp/splitrim/HSTEST_splitrim.fastq
[main] Real time: 2.480 sec; CPU: 0.192 sec
done.  1 wallclock secs ( 0.04 usr +  0.00 sys =  0.04 CPU)
->Consolidating hits...done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
->Retrieving Genome Vitals from disk [/db/GOTTCHA/database/genomeVitals.dmp]...done.  0 wallclock secs ( 0.17 usr +  0.03 sys =  0.20 CPU)
->Storing coordinates to disk [./gottcha//HSTEST_temp/HSTEST.replicon.contig.coords.csv]...done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
->Storing datastructure GI coordinates to disk in BINARY format as "./gottcha//HSTEST_temp/HSTEST.giCoords.dmp"...done. 0 wallsecs
->Storing datastructure contig length histogram (by entry) to disk in BINARY format as "./gottcha//HSTEST_temp/HSTEST.replicon.contig.HistoByEntry.dmp"...done. 0 wallsecs
->Storing datastructure contig length histogram (by GI) to disk in BINARY format as "./gottcha//HSTEST_temp/HSTEST.replicon.contig.HistoByGI.dmp"...done. 0 wallsecs
->Storing parseable contig length histogram(s) to disk...done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
->Calculating non-overlapping coverage from mapping results...done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
===== REPLICON-level Analysis =====
->Writing REPLICON-level results to disk [./gottcha//HSTEST_temp/HSTEST.replicon.tsv]...Use of uninitialized value $gis[0] in hash element at /home/x1user/git/gottcha/bin/profileGottcha.pl line 1280.
Use of uninitialized value $gis[0] in hash element at /home/x1user/git/gottcha/bin/profileGottcha.pl line 1281.
done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
===== Extended Taxonomic Rank-level Analysis =====
->Pulling replicon GIs from DB entries...done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
->Mapping replicon GIs to source organisms...Use of uninitialized value $org in hash element at /home/x1user/git/gottcha/bin/profileGottcha.pl line 1465.
done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
->Retrieving Tax Tree from disk [/db/GOTTCHA/database/speciesTreeGI.dmp]...done.  6 wallclock secs ( 3.55 usr +  0.58 sys =  4.13 CPU)
->Mapping source organism to its tax tree...done.  1 wallclock secs ( 1.72 usr +  0.04 sys =  1.76 CPU)
->At least one organism name in "/db/GOTTCHA/database/genomeVitals.dmp" is unrecognized in "/db/GOTTCHA/database/speciesTreeGI.dmp":
      []
Continuing on...
->Rolling up results for rank STRAIN [./gottcha//HSTEST_temp/HSTEST.strain.tsv]...TAXTREE does not exist for GI ""!
done.  0 wallclock secs ( 0.21 usr +  0.01 sys =  0.22 CPU)
->Rolling up results for rank SPECIES [./gottcha//HSTEST_temp/HSTEST.species.tsv]...TAXTREE does not exist for GI ""!
done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
->Rolling up results for rank GENUS [./gottcha//HSTEST_temp/HSTEST.genus.tsv]...TAXTREE does not exist for GI ""!
done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
->Rolling up results for rank FAMILY [./gottcha//HSTEST_temp/HSTEST.family.tsv]...TAXTREE does not exist for GI ""!
done.  0 wallclock secs ( 0.00 usr +  0.01 sys =  0.01 CPU)
->Rolling up results for rank ORDER [./gottcha//HSTEST_temp/HSTEST.order.tsv]...TAXTREE does not exist for GI ""!
done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
->Rolling up results for rank CLASS [./gottcha//HSTEST_temp/HSTEST.class.tsv]...TAXTREE does not exist for GI ""!
done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
->Rolling up results for rank PHYLUM [./gottcha//HSTEST_temp/HSTEST.phylum.tsv]...TAXTREE does not exist for GI ""!
done.  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
Use of uninitialized value in addition (+) at /home/x1user/git/gottcha/bin/profileGottcha.pl line 516.
Use of uninitialized value in numeric lt (<) at /home/x1user/git/gottcha/bin/profileGottcha.pl line 538.
Saving updated table to "./gottcha//HSTEST_temp/HSTEST.strain.tsv.ABU"...done!
Use of uninitialized value in addition (+) at /home/x1user/git/gottcha/bin/profileGottcha.pl line 516.
Use of uninitialized value in numeric lt (<) at /home/x1user/git/gottcha/bin/profileGottcha.pl line 538.
Saving updated table to "./gottcha//HSTEST_temp/HSTEST.species.tsv.ABU"...done!
Use of uninitialized value in addition (+) at /home/x1user/git/gottcha/bin/profileGottcha.pl line 516.
Use of uninitialized value in numeric lt (<) at /home/x1user/git/gottcha/bin/profileGottcha.pl line 538.
Saving updated table to "./gottcha//HSTEST_temp/HSTEST.genus.tsv.ABU"...done!
Use of uninitialized value in addition (+) at /home/x1user/git/gottcha/bin/profileGottcha.pl line 516.
Use of uninitialized value in numeric lt (<) at /home/x1user/git/gottcha/bin/profileGottcha.pl line 538.
Saving updated table to "./gottcha//HSTEST_temp/HSTEST.family.tsv.ABU"...done!
Use of uninitialized value in addition (+) at /home/x1user/git/gottcha/bin/profileGottcha.pl line 516.
Use of uninitialized value in numeric lt (<) at /home/x1user/git/gottcha/bin/profileGottcha.pl line 538.
Saving updated table to "./gottcha//HSTEST_temp/HSTEST.order.tsv.ABU"...done!
Use of uninitialized value in addition (+) at /home/x1user/git/gottcha/bin/profileGottcha.pl line 516.
Use of uninitialized value in numeric lt (<) at /home/x1user/git/gottcha/bin/profileGottcha.pl line 538.
Saving updated table to "./gottcha//HSTEST_temp/HSTEST.class.tsv.ABU"...done!
Use of uninitialized value in addition (+) at /home/x1user/git/gottcha/bin/profileGottcha.pl line 516.
Use of uninitialized value in numeric lt (<) at /home/x1user/git/gottcha/bin/profileGottcha.pl line 538.
Saving updated table to "./gottcha//HSTEST_temp/HSTEST.phylum.tsv.ABU"...done!
------------------
TOTAL SCRIPT TIME:  -   10 wallclock secs ( 6.98 usr +  0.67 sys =  7.65 CPU)
------------------
Done converting 1 files to list.
Done converting 7 files to list.
poeli commented 9 years ago

Thanks for trying GOTTCHA and reporting issues!

I didn't see any fatal errors in the log. You saw the warning messages below because the script couldn't find any mapped reads in the BWA results:

TAXTREE does not exist for GI ""!
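One way to confirm this on a run like the one above is to count how many primary records in the BWA SAM output are actually mapped. The sketch below is a hypothetical helper (count_mapped_reads is not part of GOTTCHA); it relies only on the standard SAM FLAG bits and assumes you have captured the SAM text, for example by rerunning the bwa mem command from the log with its standard output redirected to a file.

```python
def count_mapped_reads(sam_lines):
    """Count primary SAM records that are mapped vs. unmapped.

    Hypothetical helper, not part of GOTTCHA. Uses standard SAM FLAG bits:
    0x4 = segment unmapped, 0x100 = secondary, 0x800 = supplementary.
    """
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (0x100 | 0x800):
            continue  # count each read once, via its primary record
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


example = [
    "@SQ\tSN:sig1\tLN:1000",
    "read1\t0\tsig1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read3\t256\tsig1\t50\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
print(count_mapped_reads(example))  # (1, 1)
```

If the mapped count comes back as 0 for the split-trimmed reads, the empty-GI warnings would be the expected symptom of "nothing mapped" rather than a crash.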

It seems you were using the species-level signature database; you might want to try the genus-level database instead. If that doesn't work, could you send the messages from standard output? They should show how the read-mapping step went.

Thanks!

poeli commented 9 years ago

I attached an example of the standard output. If you don't see something like that, please pull the latest code from GitHub and try again. Thanks!

donutbrew commented 9 years ago

I reran the sample using the genus database. Here is the STDOUT:

$ ~/git/gottcha/bin/gottcha.pl --threads 31 --outdir gottcha2 --input ./HSTEST.fastq --database /db/GOTTCHA/database/GOTTCHA_VIRUSES_c3498_k85_u24_xHUMAN3x.genus --mode all
[00:00:00] Starting GOTTCHA v0.9a
[00:00:00] Auto set database level to GENUS.
[00:00:00] Checking running environment...
[00:00:00] All required scripts and tools found.
[00:00:00] Split-trimming input reads (fixL=30, minQ=20, ascii=33)
[05:31:33] Done splitrim.
[05:31:33] Mapping split-trimmed reads to GOTTCHA database and profiling...
[05:31:43] Done result profiling.
[05:31:43] Filtering profiling results...
[05:31:47] Done filtering.
[05:31:47] Preparing result in all mode...
[05:31:47] Done genereting a summary report (gottcha2/HSTEST.gottcha.tsv).
[05:31:47] Done generating the report in full mode (gottcha2/HSTEST.gottcha_full.tsv).
[05:31:47] All outputs stored in gottcha2/HSTEST_temp directory.
[05:31:47] Finished.

The log file is much the same; I can post it if you'd like. The thing is, GOTTCHA works fine for the sample dataset and for some other datasets, so I'm not sure what the difference is here. How can I help?

poeli commented 9 years ago

There are two possible reasons: either none of the reads in the dataset could be mapped to any signature, or the preliminary profiling results couldn't pass the filter. Please pull the new code from GitHub and run it again; it will print more information to stdout, like the screenshot I posted. Thanks!
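The two causes can be told apart by tallying mapped reads per signature and then asking whether anything survives a filter. The sketch below is hypothetical: tally_hits and diagnose are not GOTTCHA functions, and min_reads is an assumed stand-in cutoff, not GOTTCHA's actual filtering criteria, which are not reproduced here.

```python
from collections import Counter


def tally_hits(sam_lines):
    """Count primary, mapped SAM records per reference (signature) name."""
    hits = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (0x4 | 0x100 | 0x800):
            continue  # skip unmapped, secondary and supplementary records
        hits[fields[2]] += 1  # RNAME column = signature the read hit
    return hits


def diagnose(hits, min_reads=10):
    """Classify an empty result using a HYPOTHETICAL min_reads cutoff.

    min_reads only stands in for "did anything survive filtering?";
    it is not GOTTCHA's real filter.
    """
    if not hits:
        return "no reads mapped to any signature"
    passing = {ref: n for ref, n in hits.items() if n >= min_reads}
    if not passing:
        return "reads mapped, but no signature passed the filter"
    return f"{len(passing)} signature(s) passed the filter"


sam = [
    "read1\t0\tsig1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t0\tsig2\t5\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
print(diagnose(tally_hits(sam)))  # reads mapped, but no signature passed the filter
```

With a low-abundance target (for example, viral reads at around 0.1% of a metagenome), the second outcome is the one to watch for: hits exist but are spread too thinly to clear a filter.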

donutbrew commented 9 years ago

That was from the latest GitHub version (git pull reports "Already up-to-date"), and gottcha.pl reports v0.9a. Here is the output for the test dataset:

$ ~/git/gottcha/bin/gottcha.pl --threads 31 --outdir . --input ./test.fastq --database /db/GOTTCHA/database/GOTTCHA_VIRUSES_c3498_k85_u24_xHUMAN3x.genus --mode all
[00:00:00] Starting GOTTCHA v0.9a
[00:00:00] Auto set database level to GENUS.
[00:00:00] Checking running environment...
[00:00:00] All required scripts and tools found.
[00:00:00] Split-trimming input reads (fixL=30, minQ=20, ascii=33)
[00:00:06] Done splitrim.
[00:00:06] Mapping split-trimmed reads to GOTTCHA database and profiling...
[00:00:17] Done result profiling.
[00:00:17] Filtering profiling results...
[00:00:21] Done filtering.
[00:00:21] Preparing result in all mode...
[00:00:21] Done genereting a summary report (./test.gottcha.tsv).
[00:00:21] Done generating the report in full mode (./test.gottcha_full.tsv).
[00:00:21] All outputs stored in ./test_temp directory.
[00:00:21] Finished.

The output files look fine.

For the other file, I know there are mappable reads, but the viral fraction may be in the 0.1% range. This is a metagenomic run. Is that a problem?

poeli commented 9 years ago

You need to run INSTALL.sh again. :)

Let's see if the new stdout answers your question.

donutbrew commented 9 years ago

Ha, I actually did run INSTALL.sh again to make sure I had all the right versions of everything. I haven't run GOTTCHA on my larger dataset yet; I'll start it soon. For now, here is the output for the test data:

$ ~/git/gottcha/bin/gottcha.pl --threads 31 --outdir . --input ./test.fastq --database /db/GOTTCHA/database/GOTTCHA_VIRUSES_c3498_k85_u24_xHUMAN3x.genus --mode all
[00:00:00] Starting GOTTCHA v0.9a
[00:00:00] Auto set database level to GENUS.
[00:00:00] Checking running environment...
[00:00:00] All required scripts and tools found.
[00:00:00] Split-trimming input reads (fixL=30, minQ=20, ascii=33)
[00:00:02] Done splitrim.
[00:00:02] Mapping split-trimmed reads to GOTTCHA database and profiling...
[00:00:14] Done result profiling.
[00:00:14] Filtering profiling results...
[00:00:18] Done filtering.
[00:00:18] Preparing result in all mode...
[00:00:18] Done genereting a summary report (./test.gottcha.tsv).
[00:00:18] Done generating the report in full mode (./test.gottcha_full.tsv).
[00:00:18] All outputs stored in ./test_temp directory.
[00:00:18] Finished.

Should there be more?

poeli commented 9 years ago

I had pushed some updates to another fork (LANL-Bioinformatics/GOTTCHA) but hadn't synced this one. The forks are now synced, so please pull and install again. Thanks!

donutbrew commented 9 years ago

Thanks for the help. The fresh pull/install fixed the issue!