Closed by donutbrew 9 years ago
Thanks for trying GOTTCHA and reporting issues!
I didn't see any fatal errors in the log. You saw the warning message below because the script couldn't find any mapped reads in the BWA results.
TAXTREE does not exist for GI ""!
It seems you were using the species-level signature database; you might want to try the genus-level database. If that doesn't work, could you provide the messages from standard output? They should show how the read-mapping step went.
Thanks!
I attached an example of the standard output. If you don't see something like that, please pull the latest code from GitHub and try again. Thanks!
I reran the sample using the genus database. Here is the STDOUT:
$ ~/git/gottcha/bin/gottcha.pl --threads 31 --outdir gottcha2 --input ./HSTEST.fastq --database /db/GOTTCHA/database/GOTTCHA_VIRUSES_c3498_k85_u24_xHUMAN3x.genus --mode all
[00:00:00] Starting GOTTCHA v0.9a
[00:00:00] Auto set database level to GENUS.
[00:00:00] Checking running environment...
[00:00:00] All required scripts and tools found.
[00:00:00] Split-trimming input reads (fixL=30, minQ=20, ascii=33)
[05:31:33] Done splitrim.
[05:31:33] Mapping split-trimmed reads to GOTTCHA database and profiling...
[05:31:43] Done result profiling.
[05:31:43] Filtering profiling results...
[05:31:47] Done filtering.
[05:31:47] Preparing result in all mode...
[05:31:47] Done genereting a summary report (gottcha2/HSTEST.gottcha.tsv).
[05:31:47] Done generating the report in full mode (gottcha2/HSTEST.gottcha_full.tsv).
[05:31:47] All outputs stored in gottcha2/HSTEST_temp directory.
[05:31:47] Finished.
Logfile is much the same. I can post it if you'd like. The thing is, GOTTCHA works fine for the sample dataset and for some datasets. I'm just not sure what the difference is here. How can I help?
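To compare where the time goes between the two runs, the `[HH:MM:SS]` stamps in the STDOUT can be turned into per-step durations. The following is only a sketch: it assumes each stamp is the elapsed time since the start of the run, as the pasted output suggests, and `step_durations` is a helper name made up for this example, not part of GOTTCHA.

```python
import re

# Matches lines such as "[05:31:33] Done splitrim."
STAMP = re.compile(r"^\[(\d+):(\d{2}):(\d{2})\]\s+(.*)$")


def step_durations(log_lines):
    """Return [(seconds_until_next_message, message), ...] for stamped lines."""
    events = []
    for line in log_lines:
        match = STAMP.match(line.strip())
        if match is None:
            continue  # skip the command line and any unstamped output
        hours, minutes, seconds, message = match.groups()
        elapsed = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        events.append((elapsed, message))
    # Pair each message with the next one to get the gap between them.
    return [(nxt[0] - cur[0], cur[1]) for cur, nxt in zip(events, events[1:])]


if __name__ == "__main__":
    # Lines copied from the HSTEST run above.
    sample = [
        "[00:00:00] Split-trimming input reads (fixL=30, minQ=20, ascii=33)",
        "[05:31:33] Done splitrim.",
        "[05:31:33] Mapping split-trimmed reads to GOTTCHA database and profiling...",
        "[05:31:43] Done result profiling.",
    ]
    for seconds, message in step_durations(sample):
        print(f"{seconds:>6d} s  {message}")
```

For the HSTEST run, nearly all of the wall time falls between the split-trimming message and "Done splitrim."; the mapping and profiling step accounts for only ten seconds.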
There are two possible reasons: either the dataset couldn't be mapped to any signatures, or the preliminary profiling results didn't pass the filter. Please pull the new code from GitHub and run it again. It will print more information to stdout, like the screenshot I posted. Thanks!
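One way to tell the two cases apart by hand is to count mapped reads in the BWA output directly. The sketch below assumes the alignments are available as a SAM file, which this thread does not confirm for GOTTCHA's temp directory; `count_mapped_reads` is a hypothetical helper. It relies only on the SAM convention that header lines start with `@` and that bit `0x4` of the FLAG column marks an unmapped read.

```python
def count_mapped_reads(sam_lines):
    """Return (mapped, total) alignment records from an iterable of SAM lines."""
    mapped = 0
    total = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines (@HD, @SQ, @PG, ...).
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a well-formed alignment record
        flag = int(fields[1])  # column 2 is the bitwise FLAG
        total += 1
        if not flag & 0x4:  # 0x4 set means the read is unmapped
            mapped += 1
    return mapped, total


if __name__ == "__main__":
    # Tiny in-memory example; a real run would pass an open file handle,
    # e.g. open("path/to/alignments.sam"), where the path is a placeholder.
    demo = [
        "@SQ\tSN:sig1\tLN:100\n",
        "r1\t0\tsig1\t1\t60\t30M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    ]
    mapped, total = count_mapped_reads(demo)
    print(f"{mapped} of {total} reads mapped")
```

If the mapped count is zero, the first explanation applies; if reads did map but the report is still empty, the filtering step is the more likely cause.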
That was from the latest github release (pull reports "Already up-to-date"). gottcha.pl reports it is v0.9a. Here is the output for the test dataset:
$ ~/git/gottcha/bin/gottcha.pl --threads 31 --outdir . --input ./test.fastq --database /db/GOTTCHA/database/GOTTCHA_VIRUSES_c3498_k85_u24_xHUMAN3x.genus --mode all
[00:00:00] Starting GOTTCHA v0.9a
[00:00:00] Auto set database level to GENUS.
[00:00:00] Checking running environment...
[00:00:00] All required scripts and tools found.
[00:00:00] Split-trimming input reads (fixL=30, minQ=20, ascii=33)
[00:00:06] Done splitrim.
[00:00:06] Mapping split-trimmed reads to GOTTCHA database and profiling...
[00:00:17] Done result profiling.
[00:00:17] Filtering profiling results...
[00:00:21] Done filtering.
[00:00:21] Preparing result in all mode...
[00:00:21] Done genereting a summary report (./test.gottcha.tsv).
[00:00:21] Done generating the report in full mode (./test.gottcha_full.tsv).
[00:00:21] All outputs stored in ./test_temp directory.
[00:00:21] Finished.
The output files look fine.
For the other file, I know that there are mappable reads, but for viruses they may be in the 0.1% range. This is a metagenomic run. Is that a problem?
You need to run INSTALL.sh again. :)
Let's see if the new stdout answers your question.
Ha, I actually did run INSTALL.sh again to make sure I had the right versions of everything. I haven't run gottcha on my larger dataset yet; I'll start it soon. For now, here is the output for the test data:
$ ~/git/gottcha/bin/gottcha.pl --threads 31 --outdir . --input ./test.fastq --database /db/GOTTCHA/database/GOTTCHA_VIRUSES_c3498_k85_u24_xHUMAN3x.genus --mode all
[00:00:00] Starting GOTTCHA v0.9a
[00:00:00] Auto set database level to GENUS.
[00:00:00] Checking running environment...
[00:00:00] All required scripts and tools found.
[00:00:00] Split-trimming input reads (fixL=30, minQ=20, ascii=33)
[00:00:02] Done splitrim.
[00:00:02] Mapping split-trimmed reads to GOTTCHA database and profiling...
[00:00:14] Done result profiling.
[00:00:14] Filtering profiling results...
[00:00:18] Done filtering.
[00:00:18] Preparing result in all mode...
[00:00:18] Done genereting a summary report (./test.gottcha.tsv).
[00:00:18] Done generating the report in full mode (./test.gottcha_full.tsv).
[00:00:18] All outputs stored in ./test_temp directory.
[00:00:18] Finished.
Should there be more?
I pushed some updates to another fork (LANL-Bioinformatics/GOTTCHA) but didn't sync this one. I have now synced the forks. Please pull and install again! Thanks!
Thanks for the help. The fresh pull/install fixed the issue!
I'm getting empty output with some files: the gottcha output files list only the headers, with no real data. It looks like the script dies when there is an undefined GI. I haven't had time to look into it very deeply yet, but perhaps you guys have encountered this already.
Here is the log file: