smith-chem-wisc / Spritz

Software for RNA-Seq analysis to create sample-specific proteoform databases from RNA-Seq data
https://smith-chem-wisc.github.io/Spritz/
MIT License

Spritz execution stopped without error messages at step 15/47 #204

Closed llniu closed 3 years ago

llniu commented 3 years ago

I'm new to this tool. I tried running it on a Mac with the parameters below:

sra: [SRR629563] #  paired-end SRAs, comma separated, can leave empty, e.g. SRR629563
fq: [] # paired-end fastq prefixes, comma separated, can leave empty, e.g. TestPairedEnd
sra_se: [] # single-end SRAs, comma separated, can leave empty, e.g. SRR8070095
fq_se: [] # single-end fastq prefixes, comma separated, can leave empty, e.g. TestSingleEnd
analysisDirectory: [AnalysisFolder] # for paths to drive e.g. /mnt/c/AnalysisFolder
species: "Homo_sapiens"
genome: "GRCh38"
release: "100"
organism: "human" # based on uniprot
analyses: [isoform,variant]
spritzversion: "0.2.3"

The script stopped running after finishing job 13. Below is the status in the terminal (it has been like this for more than 2 days):

Tool returned:
/Users/***/Documents/Projects/Spritz/Spritz/data/ensembl/Homo_sapiens.ensembl.vcf.idx
[Fri Feb 12 18:11:29 2021]
Finished job 13.
15 of 47 steps (32%) done

acesnik commented 3 years ago

Thanks for letting me know. Some of these tools do take a while to finish, but this does seem too long. Did one of the preceding steps fail? I'd be curious to hear which one if so.

Was this run started after adding dotnet to the path?

llniu commented 3 years ago

Thanks for your response. None of the preceding steps failed. Yes, the run started after adding dotnet to the path. I can restart the program and see if the problem persists. Before I do that, should I clean up the data folders generated by the program?

acesnik commented 3 years ago

Could you confirm that you are using -j and --resources mem_mb= to specify the resources allocated to Spritz? (Point 7 mentioned here.) The default is for Snakemake not to apply a limit to the CPUs and memory allocated, so I'm wondering if the memory got maxed out in the last run for some reason.

In any case, I would recommend starting it up without cleaning any folders. Snakemake should start up where it left off.
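
As a rough illustration of the resource flags mentioned above, here is a minimal Python sketch that assembles the Snakemake command line. Only the `-j` and `--resources mem_mb=` options come from this thread; `build_snakemake_command` is a hypothetical helper, and the values 8 and 16000 are example numbers, not recommendations.

```python
import shlex


def build_snakemake_command(cores, mem_mb):
    """Assemble a Snakemake invocation with explicit CPU and memory limits.

    Hypothetical helper for illustration; it only builds the argument list.
    """
    if cores < 1 or mem_mb < 1:
        raise ValueError("cores and mem_mb must be positive")
    return ["snakemake", "-j", str(cores), "--resources", f"mem_mb={mem_mb}"]


# Example values only: 8 cores and 16000 MB.
command = build_snakemake_command(cores=8, mem_mb=16000)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command)` from the Spritz directory; rerunning the same command is how Snakemake picks up where it left off.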

llniu commented 3 years ago

Yes, I specified the resources: snakemake -j 8 --resources mem_mb=16000. I'm running it on my MacBook, which has 32 GB of memory, and I allocated half of the RAM to it. OK, I'll start it up again.

acesnik commented 3 years ago

Perfect. 👍🏼

llniu commented 3 years ago

Hi! Thank you very much for your patience. I started the program again but another error occurred.

[Tue Feb 16 14:50:22 2021]
Error in rule reference_protein_xml:
    jobid: 29
    output: AnalysisFolder/variants/doneHomo_sapiens.GRCh38.100.txt, AnalysisFolder/variants/Homo_sapiens.GRCh38.100.protein.xml, AnalysisFolder/variants/Homo_sapiens.GRCh38.100.protein.xml.gz, AnalysisFolder/variants/Homo_sapiens.GRCh38.100.protein.fasta, AnalysisFolder/variants/Homo_sapiens.GRCh38.100.protein.withdecoys.fasta, AnalysisFolder/variants/Homo_sapiens.GRCh38.100.protein.withmods.xml, AnalysisFolder/variants/Homo_sapiens.GRCh38.100.protein.withmods.xml.gz
    log: AnalysisFolder/variants/Homo_sapiens.GRCh38.100.spritz.log (check log file(s) for error message)
    shell:
        (java -Xmx1600M -jar SnpEff/snpEff.jar -v -nostats -xmlProt AnalysisFolder/variants/Homo_sapiens.GRCh38.100.protein.xml Homo_sapiens.GRCh38 && dotnet TransferUniProtModifications/TransferUniProtModifications/bin/Release/netcoreapp3.1/TransferUniProtModifications.dll -x data/uniprot/Homo_sapiens.protein.xml.gz -y AnalysisFolder/variants/Homo_sapiens.GRCh38.100.protein.xml && gzip -k AnalysisFolder/variants/Homo_sapiens.GRCh38.100.protein.withmods.xml AnalysisFolder/variants/Homo_sapiens.GRCh38.100.protein.xml) &> AnalysisFolder/variants/Homo_sapiens.GRCh38.100.spritz.log && touch AnalysisFolder/variants/doneHomo_sapiens.GRCh38.100.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

I looked at the log, but it didn't seem to reveal any errors (pasted below).

00:00:00    SnpEff version SnpEff 4.3u (build 2020-06-25 22:57), by Pablo Cingolani
00:00:00    Command: 'ann'
00:00:00    Reading configuration file 'snpEff.config'. Genome: 'Homo_sapiens.GRCh38'
00:00:00    Reading config file: /Users/jpx667/Documents/Projects/Spritz/Spritz/snpEff.config
00:00:00    Reading config file: /Users/jpx667/Documents/Projects/Spritz/Spritz/SnpEff/snpEff.config
00:00:00    done
00:00:00    Reading database for genome version 'Homo_sapiens.GRCh38' from file '/Users/jpx667/Documents/Projects/Spritz/Spritz/SnpEff/./data/Homo_sapiens.GRCh38/snpEffectPredictor.bin' (this might take a while)
00:00:29    done
00:00:29    Loading Motifs and PWMs
00:00:29    Building interval forest
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3181)
    at java.util.ArrayList.grow(ArrayList.java:267)
    at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:241)
    at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:233)
    at java.util.ArrayList.add(ArrayList.java:464)
    at org.snpeff.interval.Intron.add(Intron.java:42)
    at org.snpeff.interval.Intron.createSpliceSiteAcceptor(Intron.java:82)
    at org.snpeff.interval.Transcript.createSpliceSites(Transcript.java:743)
    at org.snpeff.interval.Genes.createSpliceSites(Genes.java:129)
    at org.snpeff.snpEffect.SnpEffectPredictor.createGenomicRegions(SnpEffectPredictor.java:206)
    at org.snpeff.snpEffect.SnpEffectPredictor.buildForest(SnpEffectPredictor.java:154)
    at org.snpeff.SnpEff.loadDb(SnpEff.java:617)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:1020)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:1003)
    at org.snpeff.SnpEff.run(SnpEff.java:1182)
    at org.snpeff.SnpEff.main(SnpEff.java:162)
00:00:48    Logging
00:01:00    Checking for updates...
00:01:05    Done.

acesnik commented 3 years ago

The log file points to an OutOfMemoryError. Could you try increasing the memory allocation to 24 GB to see if that works?
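
Errors like this are easy to miss because SnpEff keeps logging after the stack trace. A minimal Python sketch for scanning a log for Java error or exception lines follows; the regular expression and the `find_java_errors` helper are assumptions for illustration, not part of Spritz or SnpEff.

```python
import re

# Assumption: Java errors appear as a dotted class name ending in "Error" or "Exception".
JAVA_ERROR = re.compile(r"\b[\w.]+(?:Error|Exception)\b")


def find_java_errors(log_text):
    """Return (line_number, line) pairs for lines naming a Java error or exception."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if JAVA_ERROR.search(line):
            hits.append((number, line.strip()))
    return hits


# A few lines copied from the log pasted above.
sample = """00:00:29    Building interval forest
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3181)
00:00:48    Logging"""

for number, line in find_java_errors(sample):
    print(f"line {number}: {line}")
```

To check a real run, read the text from the log named in the Snakemake error, e.g. AnalysisFolder/variants/Homo_sapiens.GRCh38.100.spritz.log, and pass it to `find_java_errors`.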

llniu commented 3 years ago

Increasing the memory allocation to 24GB worked! Thanks a lot.

I have a question unrelated to this issue, if I can ask it here (sorry, I couldn't find your email address). Does Spritz only take RNA-Seq data as input? Can it handle SNP array data?

acesnik commented 3 years ago

Good to hear! It only accepts RNA-Seq as input currently.

acesnik commented 3 years ago

I've updated the README to reflect that 24 GB is likely better. I think the change is due to requiring SnpEff to build the reference for the selected reference version, rather than using a prebuilt one.

Thanks! -AC