smith-chem-wisc / Spritz

Software for RNA-Seq analysis to create sample-specific proteoform databases from RNA-Seq data
https://smith-chem-wisc.github.io/Spritz/
MIT License

Spritz crashing after command line execution (Ubuntu). Step 10. #215

Closed MiguelCos closed 3 years ago

MiguelCos commented 3 years ago

Describing the issue

Hello, I'm testing a Spritz installation on our Linux server, so I'm running it from the command line.

The execution shows an error somewhere between steps 10 and 12:

Finished job 4.
10 of 43 steps (23%) done

[Fri Jul 30 08:57:06 2021]
rule reference_protein_xml:
    input: SnpEff/data/Homo_sapiens.GRCh38/doneHomo_sapiens.GRCh38.txt, SnpEff/snpEff.jar, data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa, TransferUniProtModifications/TransferUniProtModifications/bin/Release/netcoreapp3.1/TransferUniProtModifications.dll, data/uniprot/Homo_sapiens.protein.xml.gz
    output: /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/doneHomo_sapiens.GRCh38.100.txt, /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.xml, /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.xml.gz, /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.fasta, /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.withdecoys.fasta, /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.withmods.xml, /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.withmods.xml.gz
    log: /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.spritz.log
    jobid: 31
    benchmark: /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.spritz.benchmark
    wildcards: dir=/home/schilling/Spritz/Spritz/projects/test_spritz_default
    resources: mem_mb=16000

[Fri Jul 30 09:00:20 2021]
Error in rule reference_protein_xml:
    jobid: 31
    output: /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/doneHomo_sapiens.GRCh38.100.txt, /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.xml, /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.xml.gz, /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.fasta, /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.withdecoys.fasta, /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.withmods.xml, /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.withmods.xml.gz
    log: /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.spritz.log (check log file(s) for error message)
    shell:
        (java -Xmx16000M -jar SnpEff/snpEff.jar -v -nostats -xmlProt /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.xml Homo_sapiens.GRCh38 && dotnet TransferUniProtModifications/TransferUniProtModifications/bin/Release/netcoreapp3.1/TransferUniProtModifications.dll -x data/uniprot/Homo_sapiens.protein.xml.gz -y /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.xml && gzip -k /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.withmods.xml /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.xml) &> /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.spritz.log && touch /home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/doneHomo_sapiens.GRCh38.100.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job reference_protein_xml since they might be corrupted:
/home/schilling/Spritz/Spritz/projects/test_spritz_default/variants/Homo_sapiens.GRCh38.100.protein.xml
2021-07-30T09:02:38 prefetch.2.10.1:  https download succeed
2021-07-30T09:02:38 prefetch.2.10.1: 1) 'SRR629563' was downloaded successfully
2021-07-30T09:02:38 prefetch.2.10.1: 'SRR629563' has 0 unresolved dependencies
[Fri Jul 30 09:02:38 2021]
Finished job 22.
11 of 43 steps (26%) done
[Fri Jul 30 09:18:23 2021]
Finished job 19.
12 of 43 steps (28%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/schilling/Spritz/Spritz/.snakemake/log/2021-07-30T085125.329144.snakemake.log
(spritz) schilling@proteomics:~/Spritz/Spritz$ Finished job 4.

I couldn't find the .snakemake/log/ folder, but I'm attaching the log files from the pipeline step that apparently crashed.
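A note on locating that log: the "Complete log" path printed by Snakemake sits inside a directory whose name starts with a dot, and such directories are hidden from a plain `ls`. Whether that is why the folder was not found here is an assumption, not something confirmed in this thread. The minimal Python sketch below picks the newest log from that directory; `latest_snakemake_log` is a hypothetical helper name, and the file-name pattern is taken from the log path shown above.

```python
from pathlib import Path
from typing import Optional


def latest_snakemake_log(workdir: str) -> Optional[Path]:
    """Return the newest *.snakemake.log under <workdir>/.snakemake/log, if any."""
    log_dir = Path(workdir) / ".snakemake" / "log"
    if not log_dir.is_dir():
        return None
    # File names begin with a timestamp such as 2021-07-30T085125.329144,
    # so sorting by name also sorts the logs chronologically.
    logs = sorted(log_dir.glob("*.snakemake.log"))
    return logs[-1] if logs else None


if __name__ == "__main__":
    newest = latest_snakemake_log(".")
    print(newest if newest else "no Snakemake log found under ./.snakemake/log")
```

The directory to pass in is the one Snakemake was started from, since the `.snakemake` folder is created relative to it.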

Log and benchmark files error_default_sra_spritz_cmd_linux.zip

Reproducing the error

I am using the default settings from the config.yaml and running the pipeline with the SRA accession SRR629563 (the same one used in the publication).

The config.yaml that I used is also attached in the zip that I am sharing.


acesnik commented 3 years ago

Thanks for the error report!

I'm working on an overhaul (https://github.com/smith-chem-wisc/Spritz/pull/211), and I've noticed this error has come up, but I haven't figured it out just yet... It will be fixed in the next release of version 0.3.0.

Cheers,

AC

acesnik commented 3 years ago

Found the issue!

MiguelCos commented 3 years ago

Hello Anthony,

Many thanks for taking a look!

I'll keep an eye out for updates so I can clone the corrected version and give it a try.

Best, Miguel

acesnik commented 3 years ago

I believe this is solved in https://github.com/smith-chem-wisc/Spritz/pull/211. Please give it a try in the most recent version, and feel free to reopen if you encounter it again!

MiguelCos commented 3 years ago

Hello Anthony,

I am not sure if this would be related to the same error as before, but I am still not able to execute Spritz from the command line. It could be related to my ignorance regarding managing conda environments in Linux so I am describing my observations here:

I completely deleted the old Spritz folder and followed the instructions again (https://github.com/smith-chem-wisc/Spritz/wiki/Spritz-commandline-usage) to clone the new version from GitHub and create a new conda environment. I also removed the old conda environment with the same name beforehand.

Now I am getting a "no Snakefile found" error:

(spritzbase) schilling@proteomics:~/Spritz$ snakemake -j 24 --use-conda --conda-frontend mamba --resources mem_mb=130000
Error: no Snakefile found, tried Snakefile, snakefile, workflow/Snakefile, workflow/snakefile.

One thing that I noticed is that the new version from GitHub does not have a Spritz/config.yaml file. I then placed my old one in the same location as it was in the installation before. Could this be related to the error?

This is what my Spritz/config.yaml file looks like:

sra: [SRR629563] #  paired-end SRAs, comma separated, can leave empty, e.g. SRR629563
fq: [] # paired-end fastq prefixes, comma separated, can leave empty, e.g. TestPairedEnd
sra_se: [] # single-end SRAs, comma separated, can leave empty, e.g. SRR8070095
fq_se: [] # single-end fastq prefixes, comma separated, can leave empty, e.g. TestSingleEnd
analysisDirectory: ["/home/schilling/Spritz/Spritz/projects/test_spritz_default"] # for paths to drive e.g. /mnt/c/AnalysisFolder
species: "Homo_sapiens"
genome: "GRCh38"
release: "100"
organism: "human" # based on uniprot
analyses: [isoform,variant]
spritzversion: "0.3.3" # should be the same here, common.smk, and MainWindow.xml.cs

I would really appreciate your advice.

Best wishes, Miguel

acesnik commented 3 years ago

Could you change the directory to Spritz/Spritz/workflow/ and try again? That's where the snakefile is (https://github.com/smith-chem-wisc/Spritz/tree/master/Spritz/workflow). I'll update that information in the instructions today!
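The lookup reported in the error message can be sketched as follows. This is a minimal Python illustration, not Snakemake's own code: the candidate list is copied from the error message above, `find_snakefile` is a hypothetical helper name, and the exact capitalisation of the file in the repository is an assumption.

```python
from pathlib import Path
from typing import Optional

# Candidate locations, in the order listed by the "no Snakefile found" error.
CANDIDATES = ("Snakefile", "snakefile", "workflow/Snakefile", "workflow/snakefile")


def find_snakefile(start_dir: str) -> Optional[Path]:
    """Return the first candidate that exists relative to start_dir, else None."""
    base = Path(start_dir)
    for relative in CANDIDATES:
        candidate = base / relative
        if candidate.is_file():
            return candidate
    return None
```

Called on the top-level clone directory, this returns `None`, because the file sits one level deeper, under Spritz/workflow/. Snakemake also accepts a `--snakefile` option to point at the file explicitly, but the advice in this thread is to run from the workflow directory itself.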

MiguelCos commented 3 years ago

Now it seems to have started working! Thanks!

I don't remember having to navigate to the workflow folder the first time I tried but now I am not sure.

I'll let you know if I have any new problems with the test.

Many thanks again.

Best, Miguel

MiguelCos commented 3 years ago

Hello,

I'm still having the same error:

Activating conda environment: /home/schilling/Spritz/Spritz/workflow/.snakemake/conda/4d722d09
[Wed Aug  4 12:40:35 2021]
Error in rule reference_protein_xml:
    jobid: 2
    output: /home/schilling/Spritz/projects/def_test/variants/doneHomo_sapiens.GRCh38.97.txt, /home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.xml, /home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.xml.gz, /home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.fasta, /home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.withdecoys.fasta, /home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.withmods.xml, /home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.withmods.xml.gz
    log: /home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.spritz.log (check log file(s) for error message)
    conda-env: /home/schilling/Spritz/Spritz/workflow/.snakemake/conda/10e291df
    shell:
        (java -Xmx16000M -jar ../resources/SnpEff/snpEff.jar -v -nostats -xmlProt /home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.xml Homo_sapiens.GRCh38 && dotnet ../SpritzModifications/bin/x64/Release/net5.0/SpritzModifications.dll -x ../resources/uniprot/Homo_sapiens.protein.xml.gz -y /home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.xml && gzip -k /home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.withmods.xml /home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.xml) &> /home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.spritz.log && touch /home/schilling/Spritz/projects/def_test/variants/doneHomo_sapiens.GRCh38.97.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job reference_protein_xml since they might be corrupted:
/home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.xml
[Wed Aug  4 12:41:58 2021]
Finished job 26.
10 of 25 steps (40%) done
[Wed Aug  4 13:01:10 2021]
Finished job 25.
11 of 25 steps (44%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/schilling/Spritz/Spritz/workflow/.snakemake/log/2021-08-04T122547.282928.snakemake.log

I'm attaching a zip file with the log and benchmark files and the config.yaml that I used.

I also noticed that the pipeline looks for the config.yaml in Spritz/Spritz/workflow/config rather than at Spritz/config.yaml, as I had initially assumed.

I tested changing the release number from 100 to 97, with the same result.

test_2_spritz_033.zip

Please let me know if you need any more info.

I would really appreciate your support, so many thanks in advance!

Best, Miguel

acesnik commented 3 years ago

Thanks for the additional information here. It actually looks like a different error than you were getting before, and one that I'm not seeing on my machines (see here where it's completing successfully https://github.com/smith-chem-wisc/Spritz/runs/3227302010?check_suite_focus=true#step:7:352).

I think it's actually something to do with the commandline argument parsing. Could you please try running dotnet ../SpritzModifications/bin/x64/Release/net5.0/SpritzModifications.dll -x="../resources/uniprot/Homo_sapiens.protein.xml.gz" -y="/home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.xml" to see if that works? I'm wondering if the equals and quotes will help (currently it's just using a space and then the path). It should be able to handle both formats of commandline arguments, but perhaps that's not really the case!
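To make the two argument styles concrete: the shell removes the quotes before the program starts, so the program sees either two tokens (`-x`, path) or one token (`-x=path`). The Python sketch below shows that difference and one way a parser could accept both. It is an illustration only and not the parser used by SpritzModifications; `normalize_args` is a hypothetical helper name.

```python
import shlex
from typing import List


def normalize_args(argv: List[str]) -> List[str]:
    """Split `-x=value` tokens into `-x`, `value`; leave other tokens unchanged."""
    normalized: List[str] = []
    for token in argv:
        if token.startswith("-") and "=" in token:
            flag, value = token.split("=", 1)
            normalized.extend([flag, value])
        else:
            normalized.append(token)
    return normalized


# shlex.split follows POSIX shell word splitting, including quote removal.
spaced = shlex.split('-x ../resources/uniprot/Homo_sapiens.protein.xml.gz')
quoted = shlex.split('-x="../resources/uniprot/Homo_sapiens.protein.xml.gz"')

# `spaced` holds two tokens and `quoted` holds one; after normalization
# both describe the same option and value.
assert normalize_args(spaced) == normalize_args(quoted)
```

A parser that only understands one of the two token shapes would fail on the other, which is the kind of mismatch being tested here.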

MiguelCos commented 3 years ago

Thanks for taking a look!

I receive this error with the new test.

(spritzbase) schilling@proteomics:~/Spritz/Spritz/workflow$ dotnet ../SpritzModifications/bin/x64/Release/net5.0/SpritzModifications.dll -x="../resources/uniprot/Homo_sapiens.protein.xml.gz" -y="/home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.xml"
Welcome to SpritzModifications!
Transfering modifications from UniProt database ../resources/uniprot/Homo_sapiens.protein.xml.gz to /home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.xml
Unhandled exception. System.IO.FileNotFoundException: Could not find file '/home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.xml'.
File name: '/home/schilling/Spritz/projects/def_test/variants/Homo_sapiens.GRCh38.97.protein.xml'
   at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
   at System.IO.FileStream.OpenHandle(FileMode mode, FileShare share, FileOptions options)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
   at UsefulProteomicsDatabases.ProteinDbLoader.GetPtmListFromProteinXml(String proteinDbLocation)
   at UsefulProteomicsDatabases.ProteinDbLoader.LoadProteinXML(String proteinDbLocation, Boolean generateTargets, DecoyType decoyType, IEnumerable`1 allKnownModifications, Boolean isContaminant, IEnumerable`1 modTypesToExclude, Dictionary`2& unknownModifications, Int32 maxThreads, Int32 maxHeterozygousVariants, Int32 minAlleleDepth)
   at SpritzModifications.SpritzModifications.TransferModifications(String sourceXmlPath, String destinationXmlPath) in /home/schilling/Spritz/Spritz/SpritzModifications/SpritzModifications.cs:line 105
   at SpritzModifications.SpritzModifications.Main(String[] args) in /home/schilling/Spritz/Spritz/SpritzModifications/SpritzModifications.cs:line 85
Aborted (core dumped)

acesnik commented 3 years ago

Oddly enough, that error does confirm what I suspected. It passes through where it was erroring out before. (The file isn't there to analyze because snakemake deleted it when the last run stopped there.)
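The behaviour described here can be modelled roughly as follows. The rule's shell command chains its steps with `&&`, so the chain stops at the first step that exits non-zero, and the Snakemake log in this thread shows the output of the failed job being removed afterwards. The Python sketch below imitates that sequence. It is a simplified model, not Snakemake's implementation, and `run_chain` is a hypothetical helper name.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Sequence


def run_chain(commands: Sequence[List[str]], outputs: Sequence[str]) -> Optional[int]:
    """Run commands in order, like `a && b && c`.

    Returns None if every command succeeds. Otherwise stops at the first
    failing command, deletes the declared outputs, and returns its index.
    """
    for index, command in enumerate(commands):
        if subprocess.run(command).returncode != 0:
            # Mimic the cleanup seen in the log: outputs of a failed job
            # are removed because they might be incomplete.
            for output in outputs:
                Path(output).unlink(missing_ok=True)
            return index
    return None
```

Under this model, a file written by the first step no longer exists once the second step has failed, so running the second step by hand afterwards reports a missing input even though the first step itself worked.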

I'll merge these changes in today and make a new release to address the issue: https://github.com/smith-chem-wisc/Spritz/pull/217.

acesnik commented 3 years ago

Just merged them. Feel free to give it another try.

You could just replace Spritz/workflow/rules/proteogenomics.smk with a copy from the updated code if you want to start where you left off.

MiguelCos commented 3 years ago

Hello Anthony,

As always, many thanks for taking a look and correcting this so quickly.

I started a new run about 24 hours ago with two SRA accessions that are different from the one used in the publication, this time using the variant calling pipeline.

So far it has completed successfully up to step 17, so at least the error at step 10 is gone.

I will let you know if a new problem comes up.

Best, Miguel

acesnik commented 3 years ago

Hi Miguel,

No problem. I'm glad I could help with it this week and that the pipeline made it past step 10. Thanks for the help in pinpointing the issues!

Yes, please let me know if you run into any others.

-AC