shandley / hecatomb

hecatomb is a virome analysis pipeline for analysis of Illumina sequence data
MIT License

Java Runtime Error when running test databases on VM #59

Closed dajsfiles closed 2 years ago

dajsfiles commented 2 years ago

Hi,

I'm currently trying to get hecatomb working on a VM, but I've run into a Java-related error. The support team for the VM and storage servers told me that this error is not related to the VM and that I should contact the hecatomb developers. I've attached the error log; please let me know if there's any further information I can provide. hs_err_pid170.log

linsalrob commented 2 years ago

It appears to me that the JVM consumed all the system memory and was killed.

Can you please check the # Run Parameters # section of hecatomb.config.yaml and make sure that you are not requesting too much memory?

Let us know if that fixes the problem.

dajsfiles commented 2 years ago

I'm not sure how much memory I should use. I've been running hecatomb from a Docker container, which requires a specific memory allocation; I've requested 16 GB in most cases. Should I request more memory in the Docker container itself?

This is my hecatomb.config.yaml file, by the way:

##################
# Run Parameters #
##################

# Database installation location, leave blank = use hecatomb install location
Databases:

# STICK TO YOUR SYSTEM'S CPU:RAM RATIO FOR THESE
BigJobMem: 64000      # Memory for MMSeqs in megabytes (e.g. 64GB = 64000, recommend >= 64000)
BigJobCpu: 24         # Threads for MMSeqs (recommend >= 16)
BigJobTimeMin: 5760   # Max runtime in minutes for MMSeqs (this is only enforced by the Snakemake profile)
MediumJobMem: 32000   # Memory for Megahit/Flye in megabytes (recommend >= 32000)
MediumJobCpu: 16      # CPUs for Megahit/Flye (recommend >= 16)
SmallJobMem: 16000    # Memory for BBTools etc. in megabytes (recommend >= 16000)
SmallJobCpu: 8        # CPUs for BBTools etc. (recommend >= 8)

# default CPUs = 1
defaultMem: 2000      # Default memory in megabytes (for use with --profile)
defaultTime: 1440     # Default time in minutes (for use with --profile)
defaultJobs: 100      # Default concurrent jobs (for use with --profile)

# Some jobs need more RAM; go over your CPU:RAM ratio if needed
MoreRamMem: 16000     # Memory for slightly RAM-hungry jobs in megabytes (recommend >= 16000)
MoreRamCpu: 2         # CPUs for slightly RAM-hungry jobs (recommend >= 2)

beardymcjohnface commented 2 years ago

According to the log, you have 64 cores and 32 GB of RAM. Is that correct? If so, by default Hecatomb will spin up 3 or 4 BBTools jobs, each reserving 16 GB, which will put you over your system's available memory.
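As a rough check (assuming Snakemake packs concurrent jobs by their CPU requests): at the defaults, four concurrent BBTools jobs reserve 4 × 16000 MB = 64000 MB against roughly 32000 MB of system RAM; raising each job's CPU count so that only one or two can run at once keeps the total at or under 32000 MB.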

Change this part of the config like so:

BigJobMem: 32000 # Memory for MMSeqs in megabytes (e.g. 64GB = 64000, recommend >= 64000)
BigJobCpu: 64 # Threads for MMSeqs (recommend >= 16)
BigJobTimeMin: 5760 # Max runtime in minutes for MMSeqs (this is only enforced by the Snakemake profile)
MediumJobMem: 32000 # Memory for Megahit/Flye in megabytes (recommend >= 32000)
MediumJobCpu: 64 # CPUs for Megahit/Flye (recommend >= 16)
SmallJobMem: 16000 # Memory for BBTools etc. in megabytes (recommend >= 16000)
SmallJobCpu: 32 # CPUs for BBTools etc. (recommend >= 8)

dajsfiles commented 2 years ago

Thank you! This initially seemed to solve my problem, but after 64%, another process failed. I've attached the log here. 2022-01-25T213119.818853.snakemake.log

beardymcjohnface commented 2 years ago

It's progress at least. This is an MMSeqs error, and I can't see anything helpful related to bus errors on the MMseqs2 GitHub issues page: https://github.com/soedinglab/MMseqs2/issues. Try rerunning it, and I'll see how the newest version of MMSeqs2 works with Hecatomb.

shandley commented 2 years ago

Isn't the specific version specified in the mmseqs.yaml, though? It should be, as the newer versions of mmseqs (13 and above) changed almost everything about the mmseqs output, so we want to make sure no version other than the one specified in the env is used, or there will be loads of downstream parsing issues.

beardymcjohnface commented 2 years ago

Scott, the new mmseqsUpdate branch seems to be working for me on the test dataset, and we should be good to migrate to the new version whenever we want. It includes a couple of bugfixes that I'll need to cherry-pick into dev and master for now. In the end, I only needed to tweak the AA taxonomy steps. The NT tax steps and the assembly mmseqs step worked fine (though I still need to check the assembly contig annotations to make sure they're correct). The bigtable looks fine though.

Jason, let me know if you want to try this version and need help checking out the mmseqsUpdate branch and running it.

shandley commented 2 years ago

Hi @beardymcjohnface, we should really take a deeper look. When mmseqs2 updated to release 13-45111, they changed not only everything about how the algorithm works (it works primarily as a contig annotator and less well as a short-read annotator) but also all of the output files. The columns are not the same, and I wasn't able to sort out how to dissect the LCA results. It really wasn't an incremental version release so much as a release of an entirely new software package.

beardymcjohnface commented 2 years ago

I agree. I made it a separate branch so we can open a pull request, review it there, and make any necessary changes before merging it into the main branch (assuming it works fine).

dajsfiles commented 2 years ago

Hi,

I've tried running it again, and this time it hit a different error. I'm not sure if these two are related. 2022-02-01T211755.954089.snakemake.log

beardymcjohnface commented 2 years ago

Hi, sorry for the late reply. This is an MMSeqs issue, I think. You could try running the commands manually to see if they work, but I would also append a memory limit to the search command (which I've done below). I'll patch this into the next release of Hecatomb just to be safe. If it does work, rerun Hecatomb and it should continue after this step.

mmseqs createdb \
    hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/assembly.fasta \
    hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/queryDB \
    --dbtype 2

mmseqs search \
    hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/queryDB \
    /storage1/fs1/leyao.wang/Active/jason_test/hecatomb/snakemake/workflow/../../databases/nt/virus_primary_nt/sequenceDB \
    hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/results/result \
    hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/mmseqs_nt_tmp \
    --start-sens 2 -s 7 --sens-steps 3 --min-length 90 -e 1e-5 --search-type 3 \
    --split-memory-limit 24000
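(A note on units: mmseqs accepts byte-size suffixes for --split-memory-limit, e.g. 10M or 1G, and a bare number may be interpreted as bytes, so a value like 24G is likely what's intended here.)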

dajsfiles commented 2 years ago

It says that the mmseqs command is not found. Should I install MMseqs? Could that be what's causing the issue?

beardymcjohnface commented 2 years ago

Oh, my bad. You could install it, or you could use the conda env that Snakemake created. The easiest way is to just install mmseqs2:

# don't run from your base env; your hecatomb env should be fine
mamba install mmseqs2=12.113e3=h2d02072_2
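Alternatively, a sketch of reusing an env Snakemake already built (the hashed directory names vary, so list the --conda-prefix directory to find the right one):

ls /storage1/fs1/leyao.wang/Active/jason_test/hecatomb/snakemake/workflow/conda/   # hashed env dirs live here
conda activate /storage1/fs1/leyao.wang/Active/jason_test/hecatomb/snakemake/workflow/conda/<env-hash>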

dajsfiles commented 2 years ago

Hi Michael,

When I ran the aforementioned commands, I got another issue: there's a file in hecatomb that couldn't be opened for writing. 2-10-2022-output.txt

beardymcjohnface commented 2 years ago

I'm not sure why MMSeqs is failing here. You could try deleting the mmseqs directories hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/mmseqs_nt_tmp and hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/results and rerunning hecatomb. Otherwise, we'll have to pester the MMSeqs developers for some ideas.

If you're not worried about the contig annotations, you can rerun hecatomb and add the option --snake=-k. The pipeline will still "fail", but it should create everything except these files (the assembly, seqtable, bigtable, read-based contig annotations, etc.).

dajsfiles commented 2 years ago

I was able to delete mmseqs_nt_tmp, but it looks like results didn't exist in the first place.

When I ran hecatomb again, it exited almost instantly. This was the error log: 2022-02-11T202054.110585.snakemake.log

dajsfiles commented 2 years ago

Also, we do need contig annotations.

beardymcjohnface commented 2 years ago

I would suggest deleting the hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/ directory and making the pipeline regenerate those files; something has been corrupted at some point, I think. You should also include the Snakemake 'keep going' flag by adding --snake=-k to the end of your hecatomb command. That should hopefully let the pipeline finish the read annotations if nothing else.
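Concretely, the deletion would be something like:

rm -r hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/   # then rerun hecatomb with --snake=-k appended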

dajsfiles commented 2 years ago

Ok! Would the pipeline regenerate them if I simply ran hecatomb run --test --snake=-k?

beardymcjohnface commented 2 years ago

Yes, any files that are missing should be regenerated, as well as any subsequent files that depend on them. I'm just looking back through the thread; is this the test dataset that is failing?

dajsfiles commented 2 years ago

Yes.

dajsfiles commented 2 years ago

After trying to regenerate them, I ran it again, but it still keeps hitting errors. When I tried to regenerate again after deleting CONTIG_DICTIONARY, I was greeted with this message:

(/storage1/fs1/leyao.wang/Active/jason_test/hecatomb) j.m.li@compute1-exec-132:~$ hecatomb run --test --snake=-k
Config file hecatomb.config.yaml already exists.
Running Hecatomb
Running snakemake command:
snakemake -j 32 --use-conda --conda-frontend mamba --rerun-incomplete --printshellcmds --nolock --show-failed-logs --conda-prefix /storage1/fs1/leyao.wang/Active/jason_test/hecatomb/snakemake/workflow/conda --configfile hecatomb.config.yaml -k -s /storage1/fs1/leyao.wang/Active/jason_test/hecatomb/snakemake/workflow/Hecatomb.smk -C Reads=/storage1/fs1/leyao.wang/Active/jason_test/hecatomb/test_data Host=human Output=hecatomb_out SkipAssembly=False Fast=False Report=False
Building DAG of jobs...
WorkflowError:
Unable to obtain modification time of file hecatomb_out/RESULTS/assembly.fasta although it existed before. It could be that a concurrent process has deleted it while Snakemake was running.
  File "/storage1/fs1/leyao.wang/Active/jason_test/hecatomb/lib/python3.10/asyncio/runners.py", line 44, in run
  File "/storage1/fs1/leyao.wang/Active/jason_test/hecatomb/lib/python3.10/asyncio/base_events.py", line 641, in run_until_complete

I've also attached my error file from the normal run. 2022-02-17T210726.314687.snakemake.log

beardymcjohnface commented 2 years ago

That modification time error can occur during reruns of failed/killed snakemake pipelines. I think you can just touch the file and it should be ok. Alternatively, you should be able to delete the .snakemake/ directory.
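For example, using the file named in the WorkflowError:

touch hecatomb_out/RESULTS/assembly.fasta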

The normal-run error is back to mmseqs running out of memory. I don't think we ever actually got the mmseqs commands running manually, did we?

The fix for the memory issue is here: https://github.com/shandley/hecatomb/commit/14625b0bb987866676da0658d505efc1e00b8b85. You just need to add --split-memory-limit {MMSeqsMemSplit} to the mmseqs command in the 03_contig_annotation.smk rules file, under the mmseqs_contig_annotation rule. Your file should be at /storage1/fs1/leyao.wang/Active/jason_test/hecatomb/snakemake/workflow/rules/03_contig_annotation.smk. Otherwise, we could try installing the GitHub version and checking out the dev branch, or wait for the next release.

dajsfiles commented 2 years ago

Regarding the modification time error, do you mean I should open and close the 2 python files mentioned?

For mmseqs, there are multiple sections in the annotation rule. Which one should I put the option under? For reference, this is what the file lists for the mmseqs_contig_annotation rule:

rule mmseqs_contig_annotation:
    """Contig annotation step 01: Assign taxonomy to contigs in contig_dictionary using mmseqs

    Database: NCBI virus assembly with taxID added
    """
    input:
        contigs=os.path.join(ASSEMBLY,"CONTIG_DICTIONARY","FLYE","assembly.fasta"),
        db=os.path.join(NCBIVIRDB, "sequenceDB")
    output:
        queryDB=os.path.join(ASSEMBLY,"CONTIG_DICTIONARY","FLYE","queryDB"),
        result=os.path.join(ASSEMBLY,"CONTIG_DICTIONARY","FLYE","results","result.index")
    params:
        respath=os.path.join(ASSEMBLY,"CONTIG_DICTIONARY","FLYE","results","result"),
        tmppath=os.path.join(ASSEMBLY,"CONTIG_DICTIONARY","FLYE","mmseqs_nt_tmp")
    benchmark:
        os.path.join(BENCH, "mmseqs_contig_annotation.txt")
    log:
        os.path.join(STDERR, "mmseqs_contig_annotation.log")
    resources:
        mem_mb=MMSeqsMem
    threads:
        MMSeqsCPU
    conda:
        os.path.join("../", "envs", "mmseqs2.yaml")
    shell:
        """
        {{
        mmseqs createdb {input.contigs} {output.queryDB} --dbtype 2;
        mmseqs search {output.queryDB} {input.db} {params.respath} {params.tmppath} \
            {MMSeqsSensNT} {config[filtNTsecondary]} \
            --search-type 3 ; }} &> {log}
        rm {log}
        """

beardymcjohnface commented 2 years ago

You can use the touch command to update the timestamps of the files; timestamps are what Snakemake uses to keep track of what it does and does not need to redo. If you open the commit link (https://github.com/shandley/hecatomb/commit/14625b0bb987866676da0658d505efc1e00b8b85) you can make the same changes in your file.
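For reference, the patched shell block would look something like this (a sketch based on that commit; {MMSeqsMemSplit} comes from the updated workflow config):

    shell:
        """
        {{
        mmseqs createdb {input.contigs} {output.queryDB} --dbtype 2;
        mmseqs search {output.queryDB} {input.db} {params.respath} {params.tmppath} \
            {MMSeqsSensNT} {config[filtNTsecondary]} \
            --search-type 3 --split-memory-limit {MMSeqsMemSplit} ; }} &> {log}
        rm {log}
        """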

dajsfiles commented 2 years ago

That's strange. When I try to touch "hecatomb_out/RESULTS/assembly.fasta", it claims the file/directory does not exist, even when I cd into RESULTS. If I run ls in the same folder, assembly.fasta shows up, but touching it does not work.
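This is the behaviour you'd expect if assembly.fasta is a symlink whose target has been deleted: ls still lists the link, but a plain touch follows it to the missing target and fails. A quick way to check (standard coreutils):

ls -l hecatomb_out/RESULTS/assembly.fasta     # shows where the link points
touch -h hecatomb_out/RESULTS/assembly.fasta  # updates the link itself, not its target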

dajsfiles commented 2 years ago

I've been able to update the modification time of the symlink, but even then, I'm still encountering this error.

dajsfiles commented 2 years ago

Ok, that's weird. I renamed the file I was supposed to touch so that hecatomb wouldn't be able to find it, ran it again, and it completed a test run successfully.

dajsfiles commented 2 years ago

Now that it has completed successfully, do I have to run it again with any other modifications, or is it good to use?

dajsfiles commented 2 years ago

There was a problem when I ran actual datasets, but I don't know if these are issues with hecatomb itself or with the datasets. I've attached all 3 error logs.

The main issue referred to an error reading "Invalid header line: must start with @HD/@SQ/@RG/@PG/@CO".

Did my version of hecatomb become corrupted due to numerous failed runs? 2022-03-08T220712.552501.snakemake.log 2022-03-08T222137.634342.snakemake.log hecatomb.crashreport.log

beardymcjohnface commented 2 years ago

The error here is with samtools view in host_removal_mapping: minimap maps the reads to the host genome, samtools view filters the mapped reads, and samtools fastq converts the bam format back to fastq. I'm not sure why the header isn't being passed along by minimap.
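Roughly, the chain looks like this (a sketch of the approach, not Hecatomb's exact rule; flags and filenames are illustrative):

# map to host, keep only unmapped reads (-f 4), convert back to fastq
minimap2 -ax sr --secondary=no masked_ref.fa.gz.idx R1.fastq R2.fastq \
    | samtools view -bh -f 4 - \
    | samtools fastq -1 R1.unmapped.fastq -2 R2.unmapped.fastq -

The @HD/@SQ header lines have to survive that pipe for either samtools step to parse the stream, which is exactly what the error says is missing.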

Can you run ls -lh hecatomb_out/PROCESSING/TMP/p06/ to make sure the input fastq files aren't empty?

Then you could run

minimap2 -ax sr -t 8 --secondary=no \
/storage1/fs1/leyao.wang/Active/jason_test/hecatomb/snakemake/workflow/../../databases/host/human/masked_ref.fa.gz.idx \
hecatomb_out/PROCESSING/TMP/p06/M667_I8470_32876_Wang_Asthma_A07_NEBNext_Index_5_ACAGTGATCT_S5_L001_R1.s6.out.fastq \
hecatomb_out/PROCESSING/TMP/p06/M667_I8470_32876_Wang_Asthma_A07_NEBNext_Index_5_ACAGTGATCT_S5_L001_R2.s6.out.fastq \
> A07.minimap.test.sam

to see if minimap is outputting any alignments.

dajsfiles commented 2 years ago

The first command returns an error: "ls: cannot access 'hecatomb_out/PROCESSING/TMP/p06/': Operation not permitted". The second command returns "bash: minimap2: command not found".

beardymcjohnface commented 2 years ago

You might need to create and activate an environment with minimap2:

conda create -n minimap2 -c bioconda minimap2 && conda activate minimap2

Does the hecatomb_out/PROCESSING/TMP/p06/ directory exist?

dajsfiles commented 2 years ago

The directory exists. I'll get back to you on the minimap issue.

dajsfiles commented 2 years ago

Also, should I be running the program from inside the hecatomb folder, or would it be ok if I just cd to the folder with the inputs and run it there?

beardymcjohnface commented 2 years ago

Run it in a clean folder. When I'm running it, I'll create a new folder someAnalysis and a subfolder for the reads, someAnalysis/reads. I copy or link the reads into the reads folder, then cd to someAnalysis and run hecatomb from there. Don't run it from the hecatomb installation folder.
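A minimal sketch of that layout (directory names are just examples; check hecatomb run --help for the exact reads option):

mkdir -p someAnalysis/reads
cp /path/to/my_reads/*.fastq.gz someAnalysis/reads/   # or symlink with ln -s
cd someAnalysis
hecatomb run --reads reads   # --reads is assumed here; --test uses the bundled test data instead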

dajsfiles commented 2 years ago

I moved to a clean folder, ran it, got an error, then ran the minimap command and ran it again. Unfortunately, it looks like I'm still hitting errors. Here's what I got: 2022-03-10T232525.690659.snakemake.log

Was I supposed to cd into hecatomb_out/PROCESSING/TMP/p06/ and then run the command? Because if I do that, I get an error: "ERROR: failed to open file 'hecatomb_out/PROCESSING/TMP/p06/M667_I8470_32876_Wang_Asthma_A07_NEBNext_Index_5_ACAGTGATCT_S5_L001_R1.s6.out.fastq'"

Thanks!

dajsfiles commented 2 years ago

Here's the file that can't be opened. M667_I8470_32876_Wang_Asthma_A07_NEBNext_Index_5_ACAGTGATCT_S5_L001.s6.stats.zip

beardymcjohnface commented 2 years ago

That looks like the same error as before. I'm guessing that sample doesn't have any reads left after QC and host removal. I'll have to add an update to check for this. Is this work urgent? Do you want me to try running your samples for you?

dajsfiles commented 2 years ago

That would be great, thanks! However, the samples, even after being zipped, are still about 4 GB. How should I send them to you?

beardymcjohnface commented 2 years ago

Thanks for the email. The dataset ran fine on our system using the current conda version of hecatomb. I wish I knew why it was causing so much grief, but we'll probably have to test Hecatomb in some cloud VMs at some point.

dajsfiles commented 2 years ago

Ok, thank you.

Do you know what the error message

Logfile hecatomb_out/STDERR/host_removal_mapping.M667_I8470_32876_Wang_Asthma_A07_NEBNext_Index_5_ACAGTGATCT_S5_L001.samtoolsView.log:
[E::sam_hdr_create] Invalid header line: must start with @HD/@SQ/@RG/@PG/@CO
[main_samview] fail to read the header from "-".

Logfile hecatomb_out/STDERR/host_removal_mapping.M667_I8470_32876_Wang_Asthma_A07_NEBNext_Index_5_ACAGTGATCT_S5_L001.samtoolsFastq.log:
Failed to read header for "-"

is referring to? This seems to be a local problem.

beardymcjohnface commented 2 years ago

Cont'd via email.