knights-lab / SHOGUN

SHallow shOtGUN profiler
GNU Affero General Public License v3.0

SHOGUN creates empty files #23

Open padbc opened 5 years ago

padbc commented 5 years ago

Hi all,

I've successfully installed SHOGUN (tests run with no apparent errors), but each step of the pipeline results in empty files, whether they're run individually or in pipeline mode.

miguensblanco commented 3 years ago

I'm seeing much the same error: I ran the filter command on my server, and the output file is generated with no apparent error, but it is empty.

Thanks!

JMB

GabeAl commented 3 years ago

Hi all, did you see any log files being written in the working or output directories?

@bhillmann Hey Ben -- pinging for visibility.

Justice-Lu commented 3 years ago

I'm facing the same error as well.

To provide some context, I've included the script and output below:

I've run the pipeline with debug logging as well as the plain shogun pipeline, and both produced empty output files in the output directory.

`shogun --log debug pipeline \
-a burst \
-i /home/guest/Shotgun/shi7_output/fecal_ssms/combined_fecal_seqs.fna \
-d /home/guest/Shotgun/prebuilt_db/ \
-o /home/guest/Shotgun/shi7_output/fecal_ssms/ \
--no-function`

`04/14/2021 03:04:59 AM : DEBUG : Initiate Logger burst
04/14/2021 03:04:59 AM : DEBUG : burst15 --queries /home/guest/Shotgun/shi7_output/fecal_ssms/combined_fecal_seqs.fna --references /home/guest/Shotgun/prebuilt_db/burst/rep82.edx --output /home/guest/Shotgun/shi7_output/fecal_ssms/alignment.burst.b6 --threads 16 --mode CAPITALIST --id 0.98 --npenalize --skipambig --forwardreverse --accelerator /home/guest/Shotgun/prebuilt_db/burst/rep82.acx --taxonomy /home/guest/Shotgun/prebuilt_db/rep82.tax --taxacut 5
04/14/2021 03:05:03 AM : DEBUG : OOM:WordDump_rd
04/14/2021 03:05:03 AM : DEBUG : This is BURST [v1.0 DB 15]
04/14/2021 03:05:03 AM : DEBUG :  --> Setting threads to 16
04/14/2021 03:05:03 AM : DEBUG :  --> Setting run mode to CAPITALIST
04/14/2021 03:05:03 AM : DEBUG :  --> Setting identity threshold to 0.980000
04/14/2021 03:05:03 AM : DEBUG :  --> Setting N penalty (ref N vs query A/C/G/T)
04/14/2021 03:05:03 AM : DEBUG :  --> Skipping highly ambiguous sequences
04/14/2021 03:05:03 AM : DEBUG :  --> Also considering the reverse complement of reads
04/14/2021 03:05:03 AM : DEBUG :  --> Using accelerator file /home/guest/Shotgun/prebuilt_db/burst/rep82.acx
04/14/2021 03:05:03 AM : DEBUG :  --> Assigning taxonomy based on mapping file: /home/guest/Shotgun/prebuilt_db/rep82.tax
04/14/2021 03:05:03 AM : DEBUG :  --> Ignoring 1/5 disagreeing taxonomy calls
04/14/2021 03:05:03 AM : DEBUG : Using up to AVX-128 with 16 threads.
04/14/2021 03:05:03 AM : DEBUG :  --> [Accel] Accelerator found. Parsing...
04/14/2021 03:05:03 AM : DEBUG :  --> [Accel] Total accelerants: 39769367429 [bytes = 119308102287]
04/14/2021 03:05:03 AM : DEBUG : 3.70 seconds
04/14/2021 03:05:03 AM : DEBUG : Subprocess finished.
04/14/2021 03:05:03 AM : DEBUG : Beginning post align capitalist style with aligner burst
04/14/2021 03:05:03 AM : DEBUG : strain
04/14/2021 03:05:03 AM : DEBUG : Beginning redistribution for file: /home/guest/Shotgun/shi7_output/fecal_ssms/taxatable.burst.capitalist.txt
04/14/2021 03:05:03 AM : DEBUG : Attempting to load the database metadata file at /home/guest/Shotgun/prebuilt_db/metadata.yaml`

GabeAl commented 3 years ago

Thanks for providing the log! It looks like BURST never began the actual alignment. How much RAM do you have on the system running SHOGUN? BURST mode is designed for very powerful servers with over 160 GB of RAM (256 GB or more is recommended).

I think this should have been made clear in the instructions, or perhaps the Python wrapper should auto-detect the amount of available RAM and either warn when it is insufficient or fall back to utree mode.

Ben, would this be possible?

Of course this may not help if it's actually not a RAM issue, but usually it is. :-)
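
For anyone hitting the same symptom, one quick check is to scan a saved debug log for a line like the `OOM:WordDump_rd` entry above. The following is a minimal Python sketch, assuming that an `OOM:` prefix marks an out-of-memory abort (inferred from this log, not taken from BURST documentation); `find_oom_lines` is a hypothetical helper, not part of SHOGUN.

```python
import re

# Assumption: an out-of-memory abort shows up as "OOM:<label>",
# as in the "OOM:WordDump_rd" line of the log in this thread.
OOM_PATTERN = re.compile(r"\bOOM:(\S+)")


def find_oom_lines(log_text):
    """Return (line_number, label) for every log line containing an OOM marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = OOM_PATTERN.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


# Three lines copied from the debug log posted above.
sample = """04/14/2021 03:04:59 AM : DEBUG : Initiate Logger burst
04/14/2021 03:05:03 AM : DEBUG : OOM:WordDump_rd
04/14/2021 03:05:03 AM : DEBUG : This is BURST [v1.0 DB 15]"""

print(find_oom_lines(sample))
```

An empty result does not rule out a memory problem; it only means no such marker reached the log.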

Cheerio, Gabe

Justice-Lu commented 3 years ago

Hi Gabe,

Thanks for the prompt response.

grep MemTotal /proc/meminfo
MemTotal:       57718384 kB

Based on what you've said about RAM requirements, my current memory is way below what is needed to run BURST, correct?

It also seems odd that this doesn't trigger any warning or error from the shogun pipeline itself. For cases like this, besides adding more RAM to the system, is there a more immediate or accurate way to troubleshoot the error?
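
As a rough check of the numbers, the `MemTotal` line can be converted to GiB and compared with the 160 GB figure quoted earlier in the thread. This is a small Python sketch; `mem_total_gib` is a hypothetical helper, and the GB versus GiB difference is ignored because the gap is so large.

```python
def mem_total_gib(meminfo_text):
    """Parse the MemTotal line of /proc/meminfo-style text and return GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # /proc/meminfo reports this value in kB (KiB)
            return kib / 1024**2
    raise ValueError("MemTotal line not found")


available = mem_total_gib("MemTotal:       57718384 kB")
required = 160  # figure quoted in this thread for BURST mode

print(f"{available:.1f} GiB available, about {required} GB needed")
```

On a live system the same function can be fed the contents of `/proc/meminfo` directly.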

GabeAl commented 3 years ago

Thanks for checking your RAM situation -- indeed this is too low with the supplied DB (a newer DB has been created which is actually smaller despite including more prokaryotes, but last I heard it was still working its way through the pipes).

A more direct way to check would be to run the final BURST command separately and monitor its terminal output and behavior, but this may be moot because it's certain that BURST would not be able to run that alignment on a system with 55GB of RAM.
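
Running the command separately can be scripted so that the exit status, the last lines of stderr, and the size of the output file are all captured in one place. This is a hedged Python sketch; `run_and_report` is a hypothetical helper, and whether BURST returns a non-zero exit status on an out-of-memory abort is not confirmed here, which is why the output size is checked as well.

```python
import subprocess
from pathlib import Path


def run_and_report(command, output_path):
    """Run a command, then report exit status, stderr tail, and output file size."""
    result = subprocess.run(command, capture_output=True, text=True)
    output = Path(output_path)
    size = output.stat().st_size if output.exists() else 0
    return {
        # On POSIX a negative value means the process was killed by a signal.
        "returncode": result.returncode,
        "stderr_tail": result.stderr.splitlines()[-5:],
        "output_bytes": size,
        # Treat a missing or empty output file as a failure even if the exit status is 0.
        "looks_failed": result.returncode != 0 or size == 0,
    }
```

To use it, pass the `burst15` argument list exactly as printed in the debug log, together with the path given to `--output`.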

I fully agree with you that the pipeline needs to check RAM availability explicitly -- a feature that is currently not implemented. BURST itself usually tries to produce an error code when it runs out of memory, but whether that error gets propagated from BURST to stderr to Python before BURST is terminated by the OS is unreliable (the subprocess pipe can close before the final error comes through). This is why I think a check in the Python wrapper (via the psutil library or similar) would be a great way to nip problems like this in the bud before even getting into any dicey low-memory situations.

@bhillmann Is this something feasible in the short term?

I could also code this into BURST directly, but that may overcomplicate implementation due to OS differences, as well as the inability of BURST to modify SHOGUN's choice of aligner and type of warning message, etc.
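
The wrapper-side check discussed above could look roughly like the following. This is a sketch under stated assumptions, not SHOGUN's actual code: `choose_aligner`, `total_ram_bytes`, and `MIN_BURST_RAM_BYTES` are hypothetical names, the threshold is simply the "over 160GB" figure from this thread, and the fallback to `os.sysconf` when psutil is missing only works on POSIX systems.

```python
import os

# Assumption: threshold taken from the "over 160GB" figure quoted in this thread.
MIN_BURST_RAM_BYTES = 160 * 10**9


def total_ram_bytes():
    """Total physical RAM, via psutil if installed, else POSIX sysconf."""
    try:
        import psutil  # third-party and optional here
        return psutil.virtual_memory().total
    except ImportError:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def choose_aligner(requested, ram_bytes, minimum=MIN_BURST_RAM_BYTES):
    """Return (aligner, warning); fall back from burst to utree when RAM is short."""
    if requested == "burst" and ram_bytes < minimum:
        warning = (
            f"burst needs about {minimum / 10**9:.0f} GB of RAM, "
            f"found {ram_bytes / 10**9:.0f} GB; falling back to utree"
        )
        return "utree", warning
    return requested, None


if __name__ == "__main__":
    aligner, warning = choose_aligner("burst", total_ram_bytes())
    if warning:
        print("WARNING:", warning)
    print("aligner:", aligner)
```

Keeping the decision in a pure function that takes the RAM figure as an argument makes it easy to test without the large-memory machine, and leaves the choice between warning only and switching aligners to the wrapper.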