How to run Spritz in an HPC environment and how to analyze single-end RNA-seq data

Jokendo-collab commented 3 years ago

Hi,

I have 300 raw RNA-seq files, and I would like to use this pipeline to create a sample-specific database for downstream proteogenomic analysis. Could you help me with the commands I can use to run this pipeline in an HPC environment?
How can I analyze single-end reads with this pipeline? The software apparently only accepts paired-end raw files. Could you comment on how single-end reads can be analyzed?

acesnik commented 3 years ago

Single-end reads aren't currently enabled, but that's the next feature on the list to expand Spritz's functionality: https://github.com/smith-chem-wisc/Spritz/issues/191

First, you would need to adapt the config.yaml file to list the names of the FASTQ files. Then, the HPC environment would need one of the following:

1) Docker set up to use the Docker image (https://hub.docker.com/r/smithlab/spritz/tags?page=1&ordering=last_updated), run with a command similar to the one used by the GUI, which gives the container access to the config file and the data files; or
2) conda and dotnet (2.2 is currently used in the pipeline) available in the path, so that you can run the snakemake script as discussed in the README.

I've only tested the second option, so I'd recommend going that route right now.

I'll keep you updated about the single-end workflow progress.

Jokendo-collab commented 3 years ago

Hi, I will have to wait for the version of the pipeline that accepts single-end reads, because all of my files are single-end. Our HPC only supports Singularity; Docker is not supported for security reasons. This means that I will not be able to run it there.

trishorts commented 3 years ago

In other projects, we have placed our Docker container into a Singularity container. That works fine, and then it won't be a security problem.
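A minimal sketch of what that conversion could look like, assuming Singularity 3.x or later is available on the cluster. The image name smithlab/spritz comes from the Docker Hub link above; the helper name docker_uri and the output file name spritz.sif are hypothetical, and this is not necessarily how the Spritz team built their own containers.

```shell
# Build the docker:// URI that `singularity pull` accepts for a Docker Hub image.
# docker_uri is a hypothetical helper; the tag defaults to "latest".
docker_uri() {
    echo "docker://$1:${2:-latest}"
}

# Print the command instead of running it, so it can be reviewed first.
# spritz.sif is an assumed output file name.
echo "singularity pull spritz.sif $(docker_uri smithlab/spritz)"
```

How the resulting image should be run (bind mounts for config.yaml and the data folder) is not covered in this thread, so that part is left out.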

Jokendo-collab commented 3 years ago

I will be glad to use this pipeline once it has been updated to accept single-end RNA-seq data. If you already have the Singularity container, I can easily pull it and use it on our cluster. I am eagerly waiting to hear from you regarding the pipeline update.

acesnik commented 3 years ago

Can you set up conda environments in your computing environment? That will work, too, without having to nest Docker in a Singularity container.

I just opened a pull request for using single-end sequencing data and other things that make it easier to use on computational clusters. https://github.com/smith-chem-wisc/Spritz/pull/196

I'll be doing some manual testing but hope to have it merged by next week.

Jokendo-collab commented 3 years ago

@acesnik Yes, I can set up a conda environment on our HPC server. Please let me know when you have tested it successfully.

acesnik commented 3 years ago

Okay, good to know. That makes this easier. Yep, I'll keep you posted!

acesnik commented 3 years ago

Hi Javan,

I just merged in the ability to use single-end data and mixed input data, and improved the ease of commandline usage. Give it a shot and let me know what you think.

I also updated the documentation for commandline usage in the README here: https://github.com/smith-chem-wisc/Spritz/blob/master/README.md#running-spritz-with-commandline, and I added a wiki page on how to adapt the Spritz/config.yaml file for running the workflow on commandline here: https://github.com/smith-chem-wisc/Spritz/wiki/Adapting-the-config.yaml-file-for-running-Spritz-on-the-commandline.

Please give it a try and let me know if you have any feedback on the tool or any ideas on how to improve the documentation.

Cheers,

Anthony

Jokendo-collab commented 3 years ago

I am wondering if you also get the error below when you try to activate your conda environment, @acesnik. [Screenshot from 2020-11-15 19-47-19]

Jokendo-collab commented 3 years ago

I am getting this prompt.

This is what my config.yaml file looks like. Should I comment out the other lines that are not in use? Or should the input files be unzipped FASTQ files? [Screenshot from 2020-11-15 20-09-59]

Jokendo-collab commented 3 years ago

Could you explain what could be causing this error? I have followed your wiki, and my conda environment is now sorted and working, but Spritz is still throwing the error.

[Screenshot from 2020-11-16 10-07-39]

When I used the example SRA from your wiki, I got the following error:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 64
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=1500
Job counts:
count jobs
1 LongOrfs
1 Predict
1 all
1 assemble_transcripts_sra
1 base_recalibration
1 blastp
1 call_gvcf_varaints
1 call_vcf_variants
1 cdna_alignment_orf_to_genome_orf
1 copy_gff3_to_snpeff
1 custom_protein_xml
1 dict_fa
1 download_dbsnp_vcf
1 download_ensembl_references
1 download_sras
1 fastp_sra
1 final_vcf_naming
1 finish_isoform
1 finish_isoform_variants
1 finish_variants
1 generate_reference_snpeff_database
1 generate_snpeff_database
1 gtf_file_to_cDNA_seqs
1 gtf_to_alignment_gff3
1 hisat2_align_bam_sra
1 hisat2_groupmark_bam
1 hisat2_merge_bams
1 hisat2_splice_sites
1 hisat_genome
1 index_ensembl_vcf
1 index_fa
1 makeblastdb
1 merge_transcripts
1 reference_protein_xml
1 remove_exon_and_utr_information
1 reorder_genome_fasta
1 split_n_cigar_reads
1 transfer_modifications_isoformvariant
1 transfer_modifications_variant
1 variant_annotation_custom
1 variant_annotation_ref
41

[Mon Nov 16 12:04:40 2020] rule download_ensembl_references: output: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.fa, data/ensembl/Homo_sapiens.GRCh38.100.gff3, data/ensembl/Homo_sapiens.GRCh38.pep.all.fa log: data/ensembl/downloads.log jobid: 6 benchmark: data/ensembl/downloads.benchmark

[Mon Nov 16 12:04:40 2020] rule makeblastdb: input: data/uniprot/Homo_sapiens.protein.fasta output: data/uniprot/Homo_sapiens.protein.fasta.pin, data/uniprot/Homo_sapiens.protein.fasta.phr, data/uniprot/Homo_sapiens.protein.fasta.psq log: data/uniprot/Homo_sapiens.proteinmakeblastdb.log jobid: 42 benchmark: data/uniprot/Homo_sapiens.proteinmakeblastdb.benchmark

[Mon Nov 16 12:04:40 2020] Error in rule download_ensembl_references: jobid: 6 output: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.fa, data/ensembl/Homo_sapiens.GRCh38.100.gff3, data/ensembl/Homo_sapiens.GRCh38.pep.all.fa log: data/ensembl/downloads.log (check log file(s) for error message) shell: ((wget -O - http://ftp.ensembl.org/pub/release-100//fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz || wget -O - http://ftp.ensembl.org/pub/release-100//fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz) | gunzip -c - > data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.fa && wget -O - http://ftp.ensembl.org/pub/release-100/gff3/homo_sapiens/Homo_sapiens.GRCh38.100.gff3.gz | gunzip -c - > data/ensembl/Homo_sapiens.GRCh38.100.gff3 && wget -O - http://ftp.ensembl.org/pub/release-100//fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz | gunzip -c - > data/ensembl/Homo_sapiens.GRCh38.pep.all.fa) 2> data/ensembl/downloads.log (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Mon Nov 16 12:04:40 2020] rule download_sras: output: /scratch/suereta/Spritz/Spritz/temp/SRX980741_1.fastq, /scratch/suereta/Spritz/Spritz/temp/SRX980741_2.fastq log: /scratch/suereta/Spritz/Spritz/temp/SRX980741.log jobid: 21 benchmark: /scratch/suereta/Spritz/Spritz/temp/SRX980741.benchmark wildcards: dir=/scratch/suereta/Spritz/Spritz/temp, sra=SRX980741 threads: 4

[Mon Nov 16 12:04:40 2020] Error in rule makeblastdb: jobid: 42 output: data/uniprot/Homo_sapiens.protein.fasta.pin, data/uniprot/Homo_sapiens.protein.fasta.phr, data/uniprot/Homo_sapiens.protein.fasta.psq log: data/uniprot/Homo_sapiens.proteinmakeblastdb.log (check log file(s) for error message) shell: makeblastdb -in data/uniprot/Homo_sapiens.protein.fasta -dbtype prot 2> data/uniprot/Homo_sapiens.proteinmakeblastdb.log (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Mon Nov 16 12:04:40 2020] rule download_dbsnp_vcf: input: ChromosomeMappings/GRCh38_UCSC2ensembl.txt output: data/ensembl/Homo_sapiens.ensembl.vcf log: data/ensembl/downloads_dbsnp_vcf.log jobid: 11 benchmark: data/ensembl/downloads_dbsnp_vcf.benchmark

[Mon Nov 16 12:04:40 2020] Error in rule download_sras: jobid: 21 output: /scratch/suereta/Spritz/Spritz/temp/SRX980741_1.fastq, /scratch/suereta/Spritz/Spritz/temp/SRX980741_2.fastq log: /scratch/suereta/Spritz/Spritz/temp/SRX980741.log (check log file(s) for error message) shell: fasterq-dump -b 10MB -c 100MB -m 1000MB -p --threads 4 --split-files --temp /scratch/suereta/Spritz/Spritz/temp --outdir /scratch/suereta/Spritz/Spritz/temp SRX980741 2> /scratch/suereta/Spritz/Spritz/temp/SRX980741.log (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Mon Nov 16 12:04:40 2020] Error in rule download_dbsnp_vcf: jobid: 11 output: data/ensembl/Homo_sapiens.ensembl.vcf log: data/ensembl/downloads_dbsnp_vcf.log (check log file(s) for error message) shell: (wget -O - https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/common_all_20180418.vcf.gz | zcat - | python scripts/convert_ucsc2ensembl.py > data/ensembl/Homo_sapiens.ensembl.vcf) 2> data/ensembl/downloads_dbsnp_vcf.log (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /scratch/suereta/Spritz/Spritz/.snakemake/log/2020-11-16T120439.132222.snakemake.log

acesnik commented 3 years ago

Hi Javan,

The first issue looked like you needed to initialize conda with conda init, but it looks like you did in the follow-up questions.

The command you are using, snakemake -j 16 --resources mem_mb=64, only allocates 64 MB to the pipeline. You may want to allocate 64000 MB, i.e. 64 GB, with snakemake -j 16 --resources mem_mb=64000
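To make the unit conversion explicit, here is a small sketch that derives the mem_mb value from a size in gigabytes, using the same 1 GB = 1000 MB convention as above. The helper name gb_to_mb is hypothetical and not part of Spritz or snakemake.

```shell
# Convert gigabytes to the megabyte value that --resources mem_mb expects.
# gb_to_mb is a hypothetical helper name used only for illustration.
gb_to_mb() {
    echo $(( $1 * 1000 ))
}

# Print (rather than run) the resulting command so it can be inspected first.
mem_mb=$(gb_to_mb 64)
echo "snakemake -j 16 --resources mem_mb=${mem_mb}"
```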

As a sidenote, Spritz includes a trimming step using fastp, and it looks like you are using trimmed reads. I'm not sure how trimming twice will affect results, but it's probably okay.

The errors you are seeing at the end are saying that the pipeline cannot find the fastq files. This is because you left the placeholder TestPairedEnd in the fq field of the config file, and because the fq_se field has filenames, rather than file prefixes. Currently, Spritz looks for anything in fq_se with "_1.fastq" appended to the end.

For example, you could consider changing the configuration to:

sra: []
sra_se: []
fq: []
fq_se: [T004-B1BALCells-PB-miRNA-S1-R1_001.merged_trimmed]
...

Then the filename should be adjusted so that it reads T004-B1BALCells-PB-miRNA-S1-R1_001.merged_trimmed_1.fastq
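With hundreds of files, the renaming could be scripted. The sketch below assumes uncompressed files ending in .fastq and simply appends the "_1" that Spritz is described as looking for; rename_single_end is a hypothetical helper name, and the file locations are assumptions.

```shell
# Rename <prefix>.fastq to <prefix>_1.fastq and print the prefix, which is
# the value to list under fq_se in config.yaml.
# rename_single_end is a hypothetical helper; it only handles names ending
# in .fastq and does not check whether the target already exists.
rename_single_end() {
    src=$1
    dir=$(dirname "$src")
    prefix=$(basename "$src" .fastq)
    mv "$src" "$dir/${prefix}_1.fastq"
    echo "$prefix"
}
```

For example, calling rename_single_end on each .fastq file in the analysis folder prints one prefix per file, and those prefixes can then be pasted into the fq_se list.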

Hope that helps!

Jokendo-collab commented 3 years ago

I corrected the file names and that is now sorted. I get a different error now:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 64
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=64000
Job counts:
count jobs
1 LongOrfs
1 Predict
1 all
2 assemble_transcripts_fq_se
1 base_recalibration
1 blastp
1 build_transfer_mods
1 call_gvcf_varaints
1 call_vcf_variants
1 cdna_alignment_orf_to_genome_orf
1 copy_gff3_to_snpeff
1 custom_protein_xml
1 dict_fa
1 download_dbsnp_vcf
1 download_ensembl_references
1 download_snpeff
1 final_vcf_naming
1 finish_isoform
1 finish_isoform_variants
1 finish_variants
1 generate_reference_snpeff_database
1 generate_snpeff_database
1 gtf_file_to_cDNA_seqs
1 gtf_to_alignment_gff3
2 hisat2_align_bam_fq_se
1 hisat2_groupmark_bam
1 hisat2_merge_bams
1 hisat2_splice_sites
1 hisat_genome
1 index_ensembl_vcf
1 index_fa
1 makeblastdb
1 merge_transcripts
1 reference_protein_xml
1 remove_exon_and_utr_information
1 reorder_genome_fasta
1 split_n_cigar_reads
1 tmpdir
1 transfer_modifications_isoformvariant
1 transfer_modifications_variant
1 variant_annotation_custom
1 variant_annotation_ref
44

[Mon Nov 16 14:56:44 2020] rule download_ensembl_references: output: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.fa, data/ensembl/Homo_sapiens.GRCh38.100.gff3, data/ensembl/Homo_sapiens.GRCh38.pep.all.fa log: data/ensembl/downloads.log jobid: 6 benchmark: data/ensembl/downloads.benchmark

[Mon Nov 16 14:56:44 2020] rule makeblastdb: input: data/uniprot/Homo_sapiens.protein.fasta output: data/uniprot/Homo_sapiens.protein.fasta.pin, data/uniprot/Homo_sapiens.protein.fasta.phr, data/uniprot/Homo_sapiens.protein.fasta.psq log: data/uniprot/Homo_sapiens.proteinmakeblastdb.log jobid: 44 benchmark: data/uniprot/Homo_sapiens.proteinmakeblastdb.benchmark

[Mon Nov 16 14:56:44 2020] Error in rule download_ensembl_references: jobid: 6 output: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.fa, data/ensembl/Homo_sapiens.GRCh38.100.gff3, data/ensembl/Homo_sapiens.GRCh38.pep.all.fa log: data/ensembl/downloads.log (check log file(s) for error message) shell: ((wget -O - http://ftp.ensembl.org/pub/release-100//fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz || wget -O - http://ftp.ensembl.org/pub/release-100//fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz) | gunzip -c - > data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.fa && wget -O - http://ftp.ensembl.org/pub/release-100/gff3/homo_sapiens/Homo_sapiens.GRCh38.100.gff3.gz | gunzip -c - > data/ensembl/Homo_sapiens.GRCh38.100.gff3 && wget -O - http://ftp.ensembl.org/pub/release-100//fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz | gunzip -c - > data/ensembl/Homo_sapiens.GRCh38.pep.all.fa) 2> data/ensembl/downloads.log (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Mon Nov 16 14:56:44 2020] rule download_snpeff: output: SnpEff/snpEff.config, SnpEff/snpEff.jar, SnpEff_4.3_SmithChemWisc_v2.zip log: data/SnpEffInstall.log jobid: 5

[Mon Nov 16 14:56:44 2020] Error in rule makeblastdb: jobid: 44 output: data/uniprot/Homo_sapiens.protein.fasta.pin, data/uniprot/Homo_sapiens.protein.fasta.phr, data/uniprot/Homo_sapiens.protein.fasta.psq log: data/uniprot/Homo_sapiens.proteinmakeblastdb.log (check log file(s) for error message) shell: makeblastdb -in data/uniprot/Homo_sapiens.protein.fasta -dbtype prot 2> data/uniprot/Homo_sapiens.proteinmakeblastdb.log (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Mon Nov 16 14:56:44 2020] rule download_dbsnp_vcf: input: ChromosomeMappings/GRCh38_UCSC2ensembl.txt output: data/ensembl/Homo_sapiens.ensembl.vcf log: data/ensembl/downloads_dbsnp_vcf.log jobid: 11 benchmark: data/ensembl/downloads_dbsnp_vcf.benchmark

[Mon Nov 16 14:56:44 2020] rule tmpdir: output: tmp, temporary log: data/tmpdir.log jobid: 24

[Mon Nov 16 14:56:44 2020] Error in rule download_dbsnp_vcf: jobid: 11 output: data/ensembl/Homo_sapiens.ensembl.vcf log: data/ensembl/downloads_dbsnp_vcf.log (check log file(s) for error message) shell: (wget -O - https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/common_all_20180418.vcf.gz | zcat - | python scripts/convert_ucsc2ensembl.py > data/ensembl/Homo_sapiens.ensembl.vcf) 2> data/ensembl/downloads_dbsnp_vcf.log (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Mon Nov 16 14:56:44 2020] rule build_transfer_mods: output: TransferUniProtModifications/TransferUniProtModifications/bin/Release/netcoreapp3.1/TransferUniProtModifications.dll log: data/TransferUniProtModifications.build.log jobid: 28

Removing temporary output file temporary.
[Mon Nov 16 14:56:44 2020] Finished job 24.
1 of 44 steps (2%) done

[Mon Nov 16 14:56:47 2020] Error in rule build_transfer_mods: jobid: 28 output: TransferUniProtModifications/TransferUniProtModifications/bin/Release/netcoreapp3.1/TransferUniProtModifications.dll log: data/TransferUniProtModifications.build.log (check log file(s) for error message) shell: (cd TransferUniProtModifications && dotnet restore && dotnet build -c Release TransferUniProtModifications.sln) &> data/TransferUniProtModifications.build.log (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing temporary output file SnpEff_4.3_SmithChemWisc_v2.zip.
[Mon Nov 16 14:57:38 2020] Finished job 5.
2 of 44 steps (5%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /scratch/suereta/Spritz/Spritz/.snakemake/log/2020-11-16T145643.039879.snakemake.log

acesnik commented 3 years ago

I'm not quite sure what to make of the errors. Could you please share these log files?

data/ensembl/downloads.log
data/uniprot/Homo_sapiens.proteinmakeblastdb.log
data/SnpEffInstall.log
data/ensembl/downloads_dbsnp_vcf.log
data/TransferUniProtModifications.build.log

Jokendo-collab commented 3 years ago

SnpEffInstall.log TransferUniProtModifications.build.log

Jokendo-collab commented 3 years ago

Here are the log files logs.zip

acesnik commented 3 years ago

Thank you for sharing the log files. I can't see what the issue is from them, but my best guess is that it is an issue with downloading the reference files. Are you able to ping these servers?

http://www.uniprot.org
https://api.nuget.org
http://ftp.ensembl.org
https://ftp.ncbi.nih.gov/
https://github.com/

You can test them by running ping www.uniprot.org and such (ping takes a hostname, so leave off the http:// or https:// prefix).

If you are not able to access all of them, you may need to set up Spritz in a place where you can access those websites, and then transfer that Spritz folder to the analysis server. I'll work on making a wiki page to explain how to do that.
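Since some clusters block ping while still allowing web traffic, an HTTP request may be a more direct test. The following is a sketch that assumes curl is installed; check_url and check_all are hypothetical helper names, and "OK" here only means that curl received some response from the server.

```shell
# Send an HTTP HEAD request and report whether the server answered.
# check_url is a hypothetical helper; the 10-second timeout is an arbitrary choice.
check_url() {
    if curl --silent --head --max-time 10 --output /dev/null "$1"; then
        echo "OK    $1"
    else
        echo "FAIL  $1"
    fi
}

# Check every server from the list above.
check_all() {
    for url in http://www.uniprot.org https://api.nuget.org \
               http://ftp.ensembl.org https://ftp.ncbi.nih.gov/ https://github.com/; do
        check_url "$url"
    done
}
```

Running check_all on the analysis server prints one line per URL; any FAIL line points to a download that the setup rules would probably not be able to complete from that machine.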

acesnik commented 3 years ago

I just made some changes (https://github.com/smith-chem-wisc/Spritz/pull/197) to Spritz to try to make setup for restricted-access servers easier, in case that's what you're working with. I also wrote a section on how to do an external setup for the commandline usage, which you can find here: https://github.com/smith-chem-wisc/Spritz/wiki/Spritz-commandline-usage. This is needed if you do not have access to the URLs needed to set up Spritz on your analysis machine.

Briefly, you will need to download and set up Spritz on a machine with fuller internet access. I made a new file, environments/setup.yaml, with a smaller conda environment for the setup. Then, you can bundle and compress the Spritz folder and transfer it to your analysis machine.

Clone Spritz with git clone https://github.com/smith-chem-wisc/Spritz.git; cd Spritz/Spritz

Create a conda environment for setting up spritz by running conda env create --name spritzsetup --file environments/setup.yaml; conda activate spritzsetup.

(you can skip the step for the config.yaml setup here)

Run snakemake -j 8 data/setup.txt to set up Spritz

Run cd ../.. to exit the Spritz folder.

Bundle and compress Spritz with tar cvzf Spritz.tar.gz Spritz

Copy Spritz.tar.gz to the server for your analysis

Uncompress Spritz on the analysis server with tar xvzf Spritz.tar.gz
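The steps that run on the internet-connected machine can be collected into one script. This is a sketch under a few assumptions: git, conda, and tar are installed, conda's shell hook has already been loaded (otherwise conda activate tends to fail inside scripts), and the run and setup_spritz_bundle helper names are hypothetical. Setting DRY_RUN=1 prints the commands without executing them.

```shell
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Commands are taken from the steps above; the chain stops at the first failure.
setup_spritz_bundle() {
    run git clone https://github.com/smith-chem-wisc/Spritz.git &&
    run cd Spritz/Spritz &&
    run conda env create --name spritzsetup --file environments/setup.yaml &&
    run conda activate spritzsetup &&   # assumes the conda shell hook is loaded
    run snakemake -j 8 data/setup.txt &&
    run cd ../.. &&
    run tar cvzf Spritz.tar.gz Spritz
}
```

Calling DRY_RUN=1 setup_spritz_bundle shows the seven commands for review. Copying Spritz.tar.gz to the analysis server and unpacking it with tar xvzf Spritz.tar.gz remain manual steps, since they depend on how you reach that server.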

acesnik commented 3 years ago

Thanks for your patience and feedback here, by the way! We appreciate that you're interested in getting Spritz running. I think you're the first one to use the commandline version in this context, and this is helping us to improve Spritz.

Jokendo-collab commented 3 years ago

@acesnik thanks for this. I will give it a try and get back to you if there is a problem. I hope you will not close this issue.

acesnik commented 3 years ago

Sounds good!

Jokendo-collab commented 3 years ago

Hi @acesnik

Could you do the set up and upload that file for me somewhere to download? I have tried this and still it's not working. You can push the file here: https://uwmadison.app.box.com/s/pqls1apvk5780zgnuxp6nit2x2jplqpp and I will download it.

Jokendo-collab commented 3 years ago

I prepared the data on a different computer then moved it over to the server and I am now getting the following error:

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 7
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=50000
Conda environments: ignored
Job counts:
count jobs
1 LongOrfs
1 Predict
1 all
1 assemble_transcripts_fq_se
1 base_recalibration
1 blastp
1 call_gvcf_varaints
1 call_vcf_variants
1 cdna_alignment_orf_to_genome_orf
1 copy_gff3_to_snpeff
1 custom_protein_xml
1 dict_fa
1 fastp_fq_se
1 final_vcf_naming
1 finish_isoform
1 finish_isoform_variants
1 finish_variants
1 generate_reference_snpeff_database
1 generate_snpeff_database
1 gtf_file_to_cDNA_seqs
1 gtf_to_alignment_gff3
1 hisat2_align_bam_fq_se
1 hisat2_groupmark_bam
1 hisat2_merge_bams
1 hisat2_splice_sites
1 hisat_genome
1 index_ensembl_vcf
1 index_fa
1 makeblastdb
1 merge_transcripts
1 reference_protein_xml
1 remove_exon_and_utr_information
1 reorder_genome_fasta
1 split_n_cigar_reads
1 transfer_modifications_isoformvariant
1 transfer_modifications_variant
1 variant_annotation_custom
1 variant_annotation_ref
38

[Wed Nov 18 21:10:11 2020] rule fastp_fq_se: input: /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1_1.fastq.gz output: /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim_1.fastq.gz, /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim.html, /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim.json log: /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim.log jobid: 20 wildcards: dir=/home/jokendo/Spritz/Spritz/sptemp, fq_se=T016-B2BCGBAL.merged_1 threads: 6

[Wed Nov 18 21:10:11 2020] rule index_ensembl_vcf: input: data/ensembl/Homo_sapiens.ensembl.vcf output: data/ensembl/Homo_sapiens.ensembl.vcf.idx log: data/ensembl/Homo_sapiens.ensembl.vcf.idx.log jobid: 13

[Wed Nov 18 21:10:11 2020] Error in rule fastp_fq_se: jobid: 20 output: /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim_1.fastq.gz, /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim.html, /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim.json log: /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim.log (check log file(s) for error message) shell: fastp -q 20 -i /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1_1.fastq.gz -o /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim_1.fastq.gz -h /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim.html -j /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim.json -w 6 -R T016-B2BCGBAL.merged_1 --detect_adapter_for_pe &> /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim.log (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Wed Nov 18 21:10:11 2020] Error in rule index_ensembl_vcf: jobid: 13 output: data/ensembl/Homo_sapiens.ensembl.vcf.idx log: data/ensembl/Homo_sapiens.ensembl.vcf.idx.log (check log file(s) for error message) shell: gatk IndexFeatureFile -I data/ensembl/Homo_sapiens.ensembl.vcf 2> data/ensembl/Homo_sapiens.ensembl.vcf.idx.log (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/jokendo/Spritz/Spritz/.snakemake/log/2020-11-18T211011.685762.snakemake.log

Jokendo-collab commented 3 years ago

The following is the content of the data directory, which is 12 GB in size:

.
├── ensembl
│   ├── downloads.benchmark
│   ├── downloads_dbsnp_vcf.benchmark
│   ├── downloads_dbsnp_vcf.log
│   ├── downloads.log
│   ├── Homo_sapiens.ensembl.vcf
│   ├── Homo_sapiens.GRCh38.100.gff3
│   ├── Homo_sapiens.GRCh38.dna.primary_assembly.fa
│   └── Homo_sapiens.GRCh38.pep.all.fa
├── Homo_sapiens.protein.fasta
├── Homo_sapiens.protein.xml.gz
├── Homo_sapiens.protein.xml.gz.log
├── setup.log
├── setup.txt
└── uniprot
    ├── Homo_sapiens.protein.fasta
    ├── Homo_sapiens.protein.xml.gz
    └── Homo_sapiens.protein.xml.gz.log

2 directories, 16 files

acesnik commented 3 years ago

Thanks for sharing the error message and directory structure!
Could you share the data/ensembl/Homo_sapiens.ensembl.vcf.idx.log file?
Could you make sure that the spritz environment is up to date by running conda env update --name spritz --file environment.yaml again? I'm wondering if GATK needs to be updated from the older environment

Jokendo-collab commented 3 years ago

@acesnik After updating the conda environment, it ran the quality control and then failed; see the new error below. I hope we are close to getting it to work.

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 16
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=50000
Conda environments: ignored
Job counts:
count jobs
1 LongOrfs
1 Predict
1 all
1 assemble_transcripts_fq_se
1 base_recalibration
1 blastp
1 call_gvcf_varaints
1 call_vcf_variants
1 cdna_alignment_orf_to_genome_orf
1 copy_gff3_to_snpeff
1 custom_protein_xml
1 fastp_fq_se
1 final_vcf_naming
1 finish_isoform
1 finish_isoform_variants
1 finish_variants
1 generate_snpeff_database
1 gtf_file_to_cDNA_seqs
1 gtf_to_alignment_gff3
1 hisat2_align_bam_fq_se
1 hisat2_groupmark_bam
1 hisat2_merge_bams
1 merge_transcripts
1 reference_protein_xml
1 remove_exon_and_utr_information
1 split_n_cigar_reads
1 transfer_modifications_isoformvariant
1 transfer_modifications_variant
1 variant_annotation_custom
1 variant_annotation_ref
30

[Thu Nov 19 16:09:50 2020] rule reference_protein_xml: input: SnpEff/data/Homo_sapiens.GRCh38/doneHomo_sapiens.GRCh38.txt, SnpEff/snpEff.jar, data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa, TransferUniProtModifications/TransferUniProtModifications/bin/Release/netcoreapp3.1/TransferUniProtModifications.dll, data/uniprot/Homo_sapiens.protein.xml.gz output: /home/jokendo/Spritz/Spritz/sptemp/variants/doneHomo_sapiens.GRCh38.100.txt, /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.xml, /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.xml.gz, /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.fasta, /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.withdecoys.fasta, /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.withmods.xml, /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.withmods.xml.gz log: /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.spritz.log jobid: 28 benchmark: /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.spritz.benchmark wildcards: dir=/home/jokendo/Spritz/Spritz/sptemp resources: mem_mb=16000

[Thu Nov 19 16:09:50 2020] rule fastp_fq_se: input: /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1_1.fastq.gz output: /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim_1.fastq.gz, /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim.html, /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim.json log: /home/jokendo/Spritz/Spritz/sptemp/T016-B2BCGBAL.merged_1.fq_se.trim.log jobid: 20 wildcards: dir=/home/jokendo/Spritz/Spritz/sptemp, fq_se=T016-B2BCGBAL.merged_1 threads: 6

[Thu Nov 19 16:11:16 2020] Error in rule reference_protein_xml: jobid: 28 output: /home/jokendo/Spritz/Spritz/sptemp/variants/doneHomo_sapiens.GRCh38.100.txt, /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.xml, /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.xml.gz, /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.fasta, /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.withdecoys.fasta, /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.withmods.xml, /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.withmods.xml.gz log: /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.spritz.log (check log file(s) for error message) shell: (java -Xmx16000M -jar SnpEff/snpEff.jar -v -nostats -xmlProt /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.xml Homo_sapiens.GRCh38 && dotnet TransferUniProtModifications/TransferUniProtModifications/bin/Release/netcoreapp3.1/TransferUniProtModifications.dll -x data/uniprot/Homo_sapiens.protein.xml.gz -y /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.xml && gzip -k /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.withmods.xml /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.xml) &> /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.spritz.log && touch /home/jokendo/Spritz/Spritz/sptemp/variants/doneHomo_sapiens.GRCh38.100.txt (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job reference_protein_xml since they might be corrupted: /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.protein.xml
[Thu Nov 19 16:11:36 2020] Finished job 20.
1 of 30 steps (3%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/jokendo/Spritz/Spritz/.snakemake/log/2020-11-19T160950.549469.snakemake.log

acesnik commented 3 years ago

It looks like fastp worked, so that's good to see at least.
Could you please send me /home/jokendo/Spritz/Spritz/sptemp/variants/Homo_sapiens.GRCh38.100.spritz.log if that's still around?

acesnik commented 3 years ago

It also looks like the GATK indexing worked this time, which is good. Did the GATK not found error go away?

Jokendo-collab commented 3 years ago

The GATK not found error went away. Here are the log files for your perusal

combined.sorted.grouped.marked.log Homo_sapiens.GRCh38.100.spritz.log

acesnik commented 3 years ago

Thank you for sharing those log files. I'm not sure what's going on there. I haven't seen that error in the database generation code in my GUI or commandline runs before. I'll try to check out what's going wrong later this week. Thanks for your patience on this.

mwfoster commented 3 years ago

I'm trying to analyze the recommended test SRA from the GUI and it seems like I may be getting a similar error. Thanks, Matt

Using default tag: latest
latest: Pulling from smithlab/spritz
Digest: sha256:e7357d5d19f619731128e99787aaffe839032ade0947a8ae4f12ad9632d4f1c4
Status: Image is up to date for smithlab/spritz:latest
docker.io/smithlab/spritz:latest
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 36
Rules claiming more threads will be scaled down.
Conda environments: ignored
Job counts:
    count jobs
    1 all
    1 base_recalibration
    1 build_transfer_mods
    1 call_gvcf_varaints
    1 call_vcf_variants
    1 dict_fa
    1 download_ensembl_references
    1 download_snpeff
    1 fastp_sra
    1 final_vcf_naming
    1 finish_variants
    1 generate_reference_snpeff_database
    1 hisat2_align_bam_sra
    1 hisat2_groupmark_bam
    1 hisat2_merge_bams
    1 hisat2_splice_sites
    1 hisat_genome
    1 index_fa
    1 reference_protein_xml
    1 reorder_genome_fasta
    1 split_n_cigar_reads
    1 tmpdir
    1 transfer_modifications_variant
    1 variant_annotation_ref
    24
[Thu Dec 24 16:14:35 2020]
rule tmpdir:
    output: tmp, temporary
    log: data/tmpdir.log
    jobid: 23
[Thu Dec 24 16:14:35 2020]
rule download_snpeff:
    output: SnpEff/snpEff.config, SnpEff/snpEff.jar, SnpEff_4.3_SmithChemWisc_v2.zip
    log: data/SnpEffInstall.log
    jobid: 5
[Thu Dec 24 16:14:35 2020]
rule build_transfer_mods:
    output: TransferUniProtModifications/TransferUniProtModifications/bin/Release/netcoreapp3.1/TransferUniProtModifications.dll
    log: data/TransferUniProtModifications.build.log
    jobid: 27
Removing temporary output file temporary.
[Thu Dec 24 16:14:35 2020]
Finished job 23.
1 of 24 steps (4%) done
[Thu Dec 24 16:14:35 2020]
rule download_ensembl_references:
    output: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.fa, data/ensembl/Homo_sapiens.GRCh38.86.gff3, data/ensembl/Homo_sapiens.GRCh38.pep.all.fa
    log: data/ensembl/downloads.log
    jobid: 6
    benchmark: data/ensembl/downloads.benchmark
[Thu Dec 24 16:14:35 2020]
rule fastp_sra:
    input: analysis/SRR629563_1.fastq, analysis/SRR629563_2.fastq
    output: analysis/SRR629563.trim_1.fastq.gz, analysis/SRR629563.trim_2.fastq.gz, analysis/SRR629563.trim.html, analysis/SRR629563.trim.json
    log: analysis/SRR629563.trim.log
    jobid: 20
    wildcards: dir=analysis, sra=SRR629563
    threads: 6
[Thu Dec 24 16:14:40 2020]
Finished job 27.
2 of 24 steps (8%) done
Removing temporary output file SnpEff_4.3_SmithChemWisc_v2.zip.
[Thu Dec 24 16:15:14 2020]
Finished job 5.
3 of 24 steps (12%) done
[Thu Dec 24 16:16:32 2020]
Finished job 20.
4 of 24 steps (17%) done
[Thu Dec 24 16:38:14 2020]
Finished job 6.
5 of 24 steps (21%) done
[Thu Dec 24 16:38:14 2020]
rule reorder_genome_fasta:
    input: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.fa
    output: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa
    log: data/ensembl/karyotypic_order.log
    jobid: 7
    benchmark: data/ensembl/karyotypic_order.benchmark
[Thu Dec 24 16:38:14 2020]
rule hisat2_splice_sites:
    input: data/ensembl/Homo_sapiens.GRCh38.86.gff3
    output: data/ensembl/Homo_sapiens.GRCh38.86.splicesites.txt
    log: data/ensembl/Homo_sapiens.GRCh38.86.splicesites.log
    jobid: 22
[Thu Dec 24 16:38:32 2020]
Finished job 22.
6 of 24 steps (25%) done
[Thu Dec 24 16:40:09 2020]
Finished job 7.
7 of 24 steps (29%) done
[Thu Dec 24 16:40:09 2020]
rule dict_fa:
    input: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa
    output: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.dict
    log: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.dict.log
    jobid: 25
[Thu Dec 24 16:40:09 2020]
rule generate_reference_snpeff_database:
    input: SnpEff/snpEff.jar, data/ensembl/Homo_sapiens.GRCh38.86.gff3, data/ensembl/Homo_sapiens.GRCh38.pep.all.fa, data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa
    output: SnpEff/data/Homo_sapiens.GRCh38/protein.fa, SnpEff/data/Homo_sapiens.GRCh38/genes.gff, SnpEff/data/genomes/Homo_sapiens.GRCh38.fa, SnpEff/data/Homo_sapiens.GRCh38/doneHomo_sapiens.GRCh38.txt
    log: SnpEff/data/Homo_sapiens.GRCh38/snpeffdatabase.log
    jobid: 4
    benchmark: SnpEff/data/Homo_sapiens.GRCh38/snpeffdatabase.benchmark
    resources: mem_mb=16000
[Thu Dec 24 16:40:09 2020]
rule index_fa:
    input: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa
    output: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa.fai
    log: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa.log
    jobid: 24
[Thu Dec 24 16:40:09 2020]
rule hisat_genome:
    input: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa, data/ensembl/Homo_sapiens.GRCh38.86.gff3
    output: data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.1.ht2, data/ensembl/done_building_hisat_genomeHomo_sapiens.GRCh38.txt
    log: data/ensembl/Homo_sapiens.GRCh38.hisatbuild.log
    jobid: 19
    benchmark: data/ensembl/Homo_sapiens.GRCh38.hisatbuild.benchmark
    threads: 12
Tool returned: 0
[Thu Dec 24 16:40:42 2020]
Finished job 25.
8 of 24 steps (33%) done
[Thu Dec 24 16:40:45 2020]
Finished job 24.
9 of 24 steps (38%) done
[Thu Dec 24 16:42:53 2020]
Finished job 4.
10 of 24 steps (42%) done
[Thu Dec 24 16:42:53 2020]
rule reference_protein_xml:
    input: SnpEff/data/Homo_sapiens.GRCh38/doneHomo_sapiens.GRCh38.txt, SnpEff/snpEff.jar, data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa, TransferUniProtModifications/TransferUniProtModifications/bin/Release/netcoreapp3.1/TransferUniProtModifications.dll, data/uniprot/Homo_sapiens.protein.xml.gz
    output: analysis/variants/doneHomo_sapiens.GRCh38.86.txt, analysis/variants/Homo_sapiens.GRCh38.86.protein.xml, analysis/variants/Homo_sapiens.GRCh38.86.protein.xml.gz, analysis/variants/Homo_sapiens.GRCh38.86.protein.fasta, analysis/variants/Homo_sapiens.GRCh38.86.protein.withdecoys.fasta, analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml, analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml.gz
    log: analysis/variants/Homo_sapiens.GRCh38.86.spritz.log
    jobid: 29
    benchmark: analysis/variants/Homo_sapiens.GRCh38.86.spritz.benchmark
    wildcards: dir=analysis
    resources: mem_mb=16000
[Thu Dec 24 16:44:32 2020]
Error in rule reference_protein_xml:
    jobid: 29
    output: analysis/variants/doneHomo_sapiens.GRCh38.86.txt, analysis/variants/Homo_sapiens.GRCh38.86.protein.xml, analysis/variants/Homo_sapiens.GRCh38.86.protein.xml.gz, analysis/variants/Homo_sapiens.GRCh38.86.protein.fasta, analysis/variants/Homo_sapiens.GRCh38.86.protein.withdecoys.fasta, analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml, analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml.gz
    log: analysis/variants/Homo_sapiens.GRCh38.86.spritz.log (check log file(s) for error message)
    shell:
        (java -Xmx16000M -jar SnpEff/snpEff.jar -v -nostats -xmlProt analysis/variants/Homo_sapiens.GRCh38.86.protein.xml Homo_sapiens.GRCh38 && dotnet TransferUniProtModifications/TransferUniProtModifications/bin/Release/netcoreapp3.1/TransferUniProtModifications.dll -x data/uniprot/Homo_sapiens.protein.xml.gz -y analysis/variants/Homo_sapiens.GRCh38.86.protein.xml && gzip -k analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml analysis/variants/Homo_sapiens.GRCh38.86.protein.xml) &> analysis/variants/Homo_sapiens.GRCh38.86.spritz.log && touch analysis/variants/doneHomo_sapiens.GRCh38.86.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job reference_protein_xml since they might be corrupted:
analysis/variants/Homo_sapiens.GRCh38.86.protein.xml
Trying to restart job 29.
[Thu Dec 24 16:46:02 2020]
rule reference_protein_xml:
    input: SnpEff/data/Homo_sapiens.GRCh38/doneHomo_sapiens.GRCh38.txt, SnpEff/snpEff.jar, data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa, TransferUniProtModifications/TransferUniProtModifications/bin/Release/netcoreapp3.1/TransferUniProtModifications.dll, data/uniprot/Homo_sapiens.protein.xml.gz
    output: analysis/variants/doneHomo_sapiens.GRCh38.86.txt, analysis/variants/Homo_sapiens.GRCh38.86.protein.xml, analysis/variants/Homo_sapiens.GRCh38.86.protein.xml.gz, analysis/variants/Homo_sapiens.GRCh38.86.protein.fasta, analysis/variants/Homo_sapiens.GRCh38.86.protein.withdecoys.fasta, analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml, analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml.gz
    log: analysis/variants/Homo_sapiens.GRCh38.86.spritz.log
    jobid: 29
    benchmark: analysis/variants/Homo_sapiens.GRCh38.86.spritz.benchmark
    wildcards: dir=analysis
    resources: mem_mb=16000
[Thu Dec 24 16:47:43 2020]
Error in rule reference_protein_xml:
    jobid: 29
    output: analysis/variants/doneHomo_sapiens.GRCh38.86.txt, analysis/variants/Homo_sapiens.GRCh38.86.protein.xml, analysis/variants/Homo_sapiens.GRCh38.86.protein.xml.gz, analysis/variants/Homo_sapiens.GRCh38.86.protein.fasta, analysis/variants/Homo_sapiens.GRCh38.86.protein.withdecoys.fasta, analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml, analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml.gz
    log: analysis/variants/Homo_sapiens.GRCh38.86.spritz.log (check log file(s) for error message)
    shell:
        (java -Xmx16000M -jar SnpEff/snpEff.jar -v -nostats -xmlProt analysis/variants/Homo_sapiens.GRCh38.86.protein.xml Homo_sapiens.GRCh38 && dotnet TransferUniProtModifications/TransferUniProtModifications/bin/Release/netcoreapp3.1/TransferUniProtModifications.dll -x data/uniprot/Homo_sapiens.protein.xml.gz -y analysis/variants/Homo_sapiens.GRCh38.86.protein.xml && gzip -k analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml analysis/variants/Homo_sapiens.GRCh38.86.protein.xml) &> analysis/variants/Homo_sapiens.GRCh38.86.spritz.log && touch analysis/variants/doneHomo_sapiens.GRCh38.86.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job reference_protein_xml since they might be corrupted:
analysis/variants/Homo_sapiens.GRCh38.86.protein.xml
Trying to restart job 29.
[Thu Dec 24 16:49:13 2020]
rule reference_protein_xml:
    input: SnpEff/data/Homo_sapiens.GRCh38/doneHomo_sapiens.GRCh38.txt, SnpEff/snpEff.jar, data/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa, TransferUniProtModifications/TransferUniProtModifications/bin/Release/netcoreapp3.1/TransferUniProtModifications.dll, data/uniprot/Homo_sapiens.protein.xml.gz
    output: analysis/variants/doneHomo_sapiens.GRCh38.86.txt, analysis/variants/Homo_sapiens.GRCh38.86.protein.xml, analysis/variants/Homo_sapiens.GRCh38.86.protein.xml.gz, analysis/variants/Homo_sapiens.GRCh38.86.protein.fasta, analysis/variants/Homo_sapiens.GRCh38.86.protein.withdecoys.fasta, analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml, analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml.gz
    log: analysis/variants/Homo_sapiens.GRCh38.86.spritz.log
    jobid: 29
    benchmark: analysis/variants/Homo_sapiens.GRCh38.86.spritz.benchmark
    wildcards: dir=analysis
    resources: mem_mb=16000
[Thu Dec 24 16:50:52 2020]
Error in rule reference_protein_xml:
    jobid: 29
    output: analysis/variants/doneHomo_sapiens.GRCh38.86.txt, analysis/variants/Homo_sapiens.GRCh38.86.protein.xml, analysis/variants/Homo_sapiens.GRCh38.86.protein.xml.gz, analysis/variants/Homo_sapiens.GRCh38.86.protein.fasta, analysis/variants/Homo_sapiens.GRCh38.86.protein.withdecoys.fasta, analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml, analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml.gz
    log: analysis/variants/Homo_sapiens.GRCh38.86.spritz.log (check log file(s) for error message)
    shell:
        (java -Xmx16000M -jar SnpEff/snpEff.jar -v -nostats -xmlProt analysis/variants/Homo_sapiens.GRCh38.86.protein.xml Homo_sapiens.GRCh38 && dotnet TransferUniProtModifications/TransferUniProtModifications/bin/Release/netcoreapp3.1/TransferUniProtModifications.dll -x data/uniprot/Homo_sapiens.protein.xml.gz -y analysis/variants/Homo_sapiens.GRCh38.86.protein.xml && gzip -k analysis/variants/Homo_sapiens.GRCh38.86.protein.withmods.xml analysis/variants/Homo_sapiens.GRCh38.86.protein.xml) &> analysis/variants/Homo_sapiens.GRCh38.86.spritz.log && touch analysis/variants/doneHomo_sapiens.GRCh38.86.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job reference_protein_xml since they might be corrupted:
analysis/variants/Homo_sapiens.GRCh38.86.protein.xml
[Thu Dec 24 17:05:45 2020]
Finished job 19.
11 of 24 steps (46%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /app/.snakemake/log/2020-12-24T161435.269162.snakemake.log
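
For anyone triaging output like the above: Snakemake names the per-rule log file next to each "Error in rule" entry, and that file usually holds the underlying error message. Below is a rough Python sketch, not part of Spritz, that pulls the failing rule names and their log paths out of pasted Snakemake output. The function name `failed_rules` and the regular expression are illustrative assumptions based only on the message format visible in this thread.

```python
import re
from collections import Counter

# Assumption: the message layout matches what is pasted in this thread, i.e.
# "Error in rule <name>:" followed later by "log: <path> (check log file(s) ...".
# DOTALL lets the pattern work whether or not the log kept its line breaks.
ERROR_RE = re.compile(
    r"Error in rule (?P<rule>\w+):.*?log:\s+(?P<log>\S+)\s+\(check log file",
    re.DOTALL,
)


def failed_rules(log_text):
    """Count failures per (rule name, rule log path) found in Snakemake output."""
    return Counter(
        (match.group("rule"), match.group("log"))
        for match in ERROR_RE.finditer(log_text)
    )


# Shortened excerpt of the failure reported above.
sample = (
    "[Thu Dec 24 16:44:32 2020] Error in rule reference_protein_xml: jobid: 29 "
    "output: analysis/variants/Homo_sapiens.GRCh38.86.protein.xml "
    "log: analysis/variants/Homo_sapiens.GRCh38.86.spritz.log "
    "(check log file(s) for error message) shell: ..."
)

for (rule, log_path), count in failed_rules(sample).items():
    print(f"{rule} failed {count} time(s); see {log_path}")
```

In the run above, every failure points at analysis/variants/Homo_sapiens.GRCh38.86.spritz.log, so that file is the one to attach when reporting the problem.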

acesnik commented 3 years ago

@mwfoster @javanOkendo Sorry for the delay on this one. This is still my top priority for this project, and I'm hoping to get to it by the end of the month.

Jokendo-collab commented 3 years ago

Hi,

Thanks for the good gesture. I hope you will let me know once the software is ready.


mwfoster commented 3 years ago

I did get back to trying this again with the U2OS file. It ran, but took about 1 day on a high-end desktop. Regards, Matt


acesnik commented 3 years ago

@javanOkendo I was able to replicate the most recent error you reported on this thread on a protected access server, and I made a fix in https://github.com/smith-chem-wisc/Spritz/pull/207. It should be ready to try again.

acesnik commented 3 years ago

The issue was that one remaining download (the ptmlist from UniProt) had to be completed online before uploading the tarballed Spritz. That should be taken care of in the setup step, before bundling and uploading Spritz.
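
The general pattern behind that fix (fetch everything that needs internet access first, then bundle) can be sketched as follows. This is only an illustration and not the code from the pull request: the helper names `prefetch` and `bundle`, the destination path in the usage note, and the UniProt URL are all assumptions that should be checked against the actual Spritz setup.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumption: a public UniProt location for the PTM list; verify before relying on it.
PTMLIST_URL = (
    "https://ftp.uniprot.org/pub/databases/uniprot/current_release/"
    "knowledgebase/complete/docs/ptmlist.txt"
)


def prefetch(url, dest):
    """Download url to dest unless dest already exists (run where internet is available)."""
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
    return dest


def bundle(src_dir, required, tar_path):
    """Create a .tar.gz of src_dir, refusing to bundle if a required file is missing."""
    src_dir = Path(src_dir)
    missing = [name for name in required if not (src_dir / name).exists()]
    if missing:
        raise FileNotFoundError(f"download these before bundling: {missing}")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(str(src_dir), arcname=src_dir.name)
    return Path(tar_path)
```

A hypothetical use on a machine with internet access would be `prefetch(PTMLIST_URL, "Spritz/ptmlist.txt")` followed by `bundle("Spritz", ["ptmlist.txt"], "Spritz.tar.gz")`, after which the tarball can be uploaded to the protected server. The file location inside the Spritz directory is a placeholder here.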

acesnik commented 3 years ago

I'm going to close this issue for now given the fix noted above. Please feel free to reopen it if you have any other issues.

smith-chem-wisc / Spritz

How to run Spritz on HPC environment and how to analyze single End RNAseqs #195