smith-chem-wisc / Spritz

Software for RNA-Seq analysis to create sample-specific proteoform databases from RNA-Seq data
https://smith-chem-wisc.github.io/Spritz/
MIT License

Error in rule make_gene_quant_dataframe_ref #237

Open animesh opened 10 months ago

animesh commented 10 months ago

I am trying to use the beta functions:

[image]

but I am facing the error below (full log follows). Any ideas on how to proceed?

Command executing: Powershell.exe docker pull smithlab/spritz:0.3.10;docker run --rm -i -t --user=root --name spritz1364950398 -v """F:\TK\seqRNA:/app/spritz/results/""" -v """F:\TK\resources:/app/spritz/resources""" smithlab/spritz:0.3.10 conda run --no-capture-output --live-stream dotnet SpritzCMD.dll --threads 6 --analysisDirectory=/app/spritz/results/ --reference="""release-97,homo_sapiens,human,GRCh38""" --analyzeVariants --analyzeIsoforms --doQuantification --fastq1=TK12_R1,TK12_R2,TK12_R3 --fastq2=TK12_R1,TK12_R2,TK12_R3 ; docker stop spritz1364950398
Saving output to F:\TK\seqRNA\workflow_2023-08-31-10-07-13.txt. Please monitor it there...

ca5806d5421b: Already exists
0d9226469454: Already exists
docker.io/smithlab/spritz:0.3.10
What's Next?
  View summary of image vulnerabilities and recommendations → docker scout quickview smithlab/spritz:0.3.10
Welcome to Spritz!
Testing analysis directory /app/spritz/results/
Using analysis directory /app/spritz/results/
Running `snakemake -j 6 --use-conda --conda-frontend mamba --configfile /app/spritz/results/config/config.yaml`.
Building DAG of jobs...
Creating conda environment envs/proteogenomics.yaml...
Downloading and installing remote packages.
Environment for envs/proteogenomics.yaml created (location: .snakemake/conda/47a7eecc)
Creating conda environment envs/quant.yaml...
Downloading and installing remote packages.
Environment for envs/quant.yaml created (location: .snakemake/conda/a2cf4925)
Creating conda environment envs/downloads.yaml...
Downloading and installing remote packages.
Environment for envs/downloads.yaml created (location: .snakemake/conda/5fe6c4ba)
Creating conda environment envs/variants.yaml...
Downloading and installing remote packages.
Environment for envs/variants.yaml created (location: .snakemake/conda/180357af)
Creating conda environment envs/isoforms.yaml...
Downloading and installing remote packages.
Environment for envs/isoforms.yaml created (location: .snakemake/conda/45a9cd02)
Creating conda environment envs/spritzbase.yaml...
Downloading and installing remote packages.
Environment for envs/spritzbase.yaml created (location: .snakemake/conda/ce565c96)
Creating conda environment envs/align.yaml...
Downloading and installing remote packages.
Environment for envs/align.yaml created (location: .snakemake/conda/e361d902)
Creating conda environment envs/default.yaml...
Downloading and installing remote packages.
Environment for envs/default.yaml created (location: .snakemake/conda/46c8395d)
Using shell: /bin/bash
Provided cores: 6
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   LongOrfs
    1   Predict
    1   all
    3   assemble_transcripts_fq
    1   base_recalibration
    1   blastp
    1   call_gvcf_varaints
    1   call_vcf_variants
    1   cdna_alignment_orf_to_genome_orf
    1   copy_gff3_to_snpeff
    1   custom_protein_xml
    1   dict_fa
    1   download_chromosome_mappings
    1   download_dbsnp_vcf
    1   download_ensembl_references
    1   download_protein_xml
    1   download_snpeff
    3   fastp_fq_uncompressed
    1   final_vcf_naming
    1   finish_isoform
    1   finish_isoform_variants
    1   finish_variants
    1   generate_reference_snpeff_database
    1   generate_snpeff_database
    1   gtf_file_to_cDNA_seqs
    1   gtf_to_alignment_gff3
    3   hisat2_align_bam_fq
    1   hisat2_group
    1   hisat2_mark
    1   hisat2_merge_bams
    1   hisat2_splice_sites
    1   hisat_genome
    1   index_ensembl_vcf
    1   index_fa
    1   make_gene_quant_dataframe_custom
    1   make_gene_quant_dataframe_ref
    1   make_isoform_quant_dataframe_custom
    1   make_isoform_quant_dataframe_ref
    1   makeblastdb
    1   merge_transcripts
    1   prose
    3   quantify_ref_transcripts_fq
    3   quantify_transcripts_fq
    1   reference_protein_xml
    1   remove_exon_and_utr_information
    1   reorder_genome_fasta
    1   setup_ptmlist_links
    1   setup_transfer_mods
    1   split_n_cigar_reads
    1   transfer_modifications_isoformvariant
    1   transfer_modifications_variant
    1   variant_annotation_custom
    1   variant_annotation_ref
    1   variant_tmpdir
    64

[Thu Aug 31 08:12:44 2023]
rule download_ensembl_references:
    output: ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.fa, ../resources/ensembl/Homo_sapiens.GRCh38.97.gff3, ../resources/ensembl/Homo_sapiens.GRCh38.pep.all.fa
    log: ../resources/ensembl/downloads.log
    jobid: 7
    benchmark: ../resources/ensembl/downloads.benchmark


[Thu Aug 31 08:12:44 2023]
rule setup_transfer_mods:
    input: ../SpritzModifications.dll
    output: ../resources/ptmlist.txt, ../resources/PSI-MOD.obo.xml
    log: ../resources/setup_transfer_mods.log
    jobid: 4
    benchmark: ../resources/setup_transfer_mods.benchmark


[Thu Aug 31 08:12:44 2023]
rule variant_tmpdir:
    output: ../resources/tmp
    log: ../resources/tmpdir.log
    jobid: 31


[Thu Aug 31 08:12:44 2023]
rule download_protein_xml:
    output: ../resources/uniprot/Homo_sapiens.protein.xml.gz, ../resources/uniprot/Homo_sapiens.protein.fasta
    log: ../resources/uniprot/Homo_sapiens.protein.xml.gz.log
    jobid: 9
    benchmark: ../resources/uniprot/Homo_sapiens.protein.xml.gz.benchmark


[Thu Aug 31 08:12:44 2023]
rule download_snpeff:
    output: ../resources/SnpEff/snpEff.config, ../resources/SnpEff/snpEff.jar, ../resources/SnpEff_4.3_SmithChemWisc_v2.zip
    log: ../resources/SnpEffInstall.log
    jobid: 6


[Thu Aug 31 08:12:44 2023]
rule prose:
    output: ../results/prose.txt
    log: ../results/prose.log
    jobid: 1
    wildcards: dir=../results

Activating conda environment: /app/spritz/workflow/.snakemake/conda/180357af
Activating conda environment: /app/spritz/workflow/.snakemake/conda/ce565c96
Activating conda environment: /app/spritz/workflow/.snakemake/conda/5fe6c4ba
Activating conda environment: /app/spritz/workflow/.snakemake/conda/5fe6c4ba
Activating conda environment: /app/spritz/workflow/.snakemake/conda/46c8395d
Activating conda environment: /app/spritz/workflow/.snakemake/conda/5fe6c4ba
[Thu Aug 31 08:12:45 2023]
Finished job 31.
1 of 64 steps (2%) done
[Thu Aug 31 08:12:45 2023]
Finished job 1.
2 of 64 steps (3%) done
[Thu Aug 31 08:12:47 2023]
Finished job 9.
3 of 64 steps (5%) done

[Thu Aug 31 08:12:47 2023]
rule download_chromosome_mappings:
    output: ../resources/ChromosomeMappings/GRCh38_UCSC2ensembl.txt
    log: ../resources/download_chromosome_mappings.log
    jobid: 16
    benchmark: ../resources/download_chromosome_mappings.benchmark


[Thu Aug 31 08:12:47 2023]
rule makeblastdb:
    input: ../resources/uniprot/Homo_sapiens.protein.fasta
    output: ../resources/uniprot/Homo_sapiens.protein.fasta.pin, ../resources/uniprot/Homo_sapiens.protein.fasta.phr, ../resources/uniprot/Homo_sapiens.protein.fasta.psq
    log: ../resources/uniprot/Homo_sapiens.proteinmakeblastdb.log
    jobid: 49
    benchmark: ../resources/uniprot/Homo_sapiens.proteinmakeblastdb.benchmark

Activating conda environment: /app/spritz/workflow/.snakemake/conda/5fe6c4ba
Activating conda environment: /app/spritz/workflow/.snakemake/conda/45a9cd02
[Thu Aug 31 08:12:50 2023]
Finished job 4.
4 of 64 steps (6%) done

[Thu Aug 31 08:12:50 2023]
rule setup_ptmlist_links:
    input: ../resources/ptmlist.txt, ../resources/PSI-MOD.obo.xml
    output: ptmlist.txt, PSI-MOD.obo.xml
    log: ../resources/setup_transfer_mod_linking.log
    jobid: 3
    benchmark: ../resources/setup_transfer_mod_linking.benchmark

Building a new DB, current time: 08/31/2023 08:12:51
New DB name:   /app/spritz/resources/uniprot/Homo_sapiens.protein.fasta
New DB title:  ../resources/uniprot/Homo_sapiens.protein.fasta
Sequence type: Protein
Keep MBits: T
Maximum file size: 3000000000B
Adding sequences from FASTA; added 25 sequences in 0.0666111 seconds.
Activating conda environment: /app/spritz/workflow/.snakemake/conda/47a7eecc
[Thu Aug 31 08:12:51 2023]
Finished job 49.
5 of 64 steps (8%) done
[Thu Aug 31 08:12:51 2023]
Finished job 3.
6 of 64 steps (9%) done
[Thu Aug 31 08:12:54 2023]
Finished job 16.
7 of 64 steps (11%) done

[Thu Aug 31 08:12:54 2023]
rule download_dbsnp_vcf:
    input: ../resources/ChromosomeMappings/GRCh38_UCSC2ensembl.txt
    output: ../resources/ensembl/Homo_sapiens.ensembl.vcf
    log: ../resources/ensembl/downloads_dbsnp_vcf.log
    jobid: 15
    benchmark: ../resources/ensembl/downloads_dbsnp_vcf.benchmark

Activating conda environment: /app/spritz/workflow/.snakemake/conda/5fe6c4ba
Removing temporary output file ../resources/SnpEff_4.3_SmithChemWisc_v2.zip.
[Thu Aug 31 08:17:41 2023]
Finished job 6.
8 of 64 steps (12%) done
[Thu Aug 31 08:24:33 2023]
Finished job 7.
9 of 64 steps (14%) done

[Thu Aug 31 08:24:33 2023]
rule hisat2_splice_sites:
    input: ../resources/ensembl/Homo_sapiens.GRCh38.97.gff3
    output: ../resources/ensembl/Homo_sapiens.GRCh38.97.splicesites.txt
    log: ../resources/ensembl/Homo_sapiens.GRCh38.97.splicesites.log
    jobid: 26


[Thu Aug 31 08:24:33 2023]
rule reorder_genome_fasta:
    input: ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.fa
    output: ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa
    log: ../resources/ensembl/karyotypic_order.log
    jobid: 8
    benchmark: ../resources/ensembl/karyotypic_order.benchmark

Activating conda environment: /app/spritz/workflow/.snakemake/conda/e361d902
Activating conda environment: /app/spritz/workflow/.snakemake/conda/5fe6c4ba
[Thu Aug 31 08:24:44 2023]
Finished job 26.
10 of 64 steps (16%) done
[Thu Aug 31 08:26:12 2023]
Finished job 8.
11 of 64 steps (17%) done

[Thu Aug 31 08:26:12 2023]
rule dict_fa:
    input: ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa
    output: ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.dict
    log: ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.dict.log
    jobid: 33
    benchmark: ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.dict.benchmark


[Thu Aug 31 08:26:12 2023]
rule generate_reference_snpeff_database:
    input: ../resources/SnpEff/snpEff.jar, ../resources/ensembl/Homo_sapiens.GRCh38.97.gff3, ../resources/ensembl/Homo_sapiens.GRCh38.pep.all.fa, ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa
    output: ../resources/SnpEff/data/Homo_sapiens.GRCh38/protein.fa, ../resources/SnpEff/data/Homo_sapiens.GRCh38/genes.gff, ../resources/SnpEff/data/genomes/Homo_sapiens.GRCh38.fa, ../resources/SnpEff/data/Homo_sapiens.GRCh38/doneHomo_sapiens.GRCh38.txt
    log: ../resources/SnpEff/data/Homo_sapiens.GRCh38/snpeffdatabase.log
    jobid: 5
    benchmark: ../resources/SnpEff/data/Homo_sapiens.GRCh38/snpeffdatabase.benchmark
    resources: mem_mb=16000


[Thu Aug 31 08:26:12 2023]
rule index_fa:
    input: ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa
    output: ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa.fai
    log: ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa.faindex.log
    jobid: 32

Activating conda environment: /app/spritz/workflow/.snakemake/conda/180357af
Activating conda environment: /app/spritz/workflow/.snakemake/conda/47a7eecc
Activating conda environment: /app/spritz/workflow/.snakemake/conda/180357af
[Thu Aug 31 08:26:51 2023]
Finished job 32.
12 of 64 steps (19%) done
Tool returned:
0
[Thu Aug 31 08:26:52 2023]
Finished job 33.
13 of 64 steps (20%) done
[Thu Aug 31 08:38:33 2023]
Finished job 5.
14 of 64 steps (22%) done

[Thu Aug 31 08:38:33 2023]
rule reference_protein_xml:
    input: ptmlist.txt, PSI-MOD.obo.xml, ../resources/SnpEff/data/Homo_sapiens.GRCh38/doneHomo_sapiens.GRCh38.txt, ../resources/SnpEff/snpEff.jar, ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa, ../SpritzModifications.dll, ../resources/uniprot/Homo_sapiens.protein.xml.gz
    output: ../results/variants/doneHomo_sapiens.GRCh38.97.txt, ../results/variants/Homo_sapiens.GRCh38.97.protein.xml, ../results/variants/Homo_sapiens.GRCh38.97.protein.xml.gz, ../results/variants/Homo_sapiens.GRCh38.97.protein.fasta, ../results/variants/Homo_sapiens.GRCh38.97.protein.withdecoys.fasta, ../results/variants/Homo_sapiens.GRCh38.97.protein.withmods.xml, ../results/variants/Homo_sapiens.GRCh38.97.protein.withmods.xml.gz
    log: ../results/variants/Homo_sapiens.GRCh38.97.spritz.log
    jobid: 2
    benchmark: ../results/variants/Homo_sapiens.GRCh38.97.spritz.benchmark
    wildcards: dir=../results
    resources: mem_mb=16000

Activating conda environment: /app/spritz/workflow/.snakemake/conda/47a7eecc
Removing temporary output file ../results/variants/Homo_sapiens.GRCh38.97.protein.xml.
Removing temporary output file ../results/variants/Homo_sapiens.GRCh38.97.protein.withmods.xml.
[Thu Aug 31 08:42:09 2023]
Finished job 2.
15 of 64 steps (23%) done
[Thu Aug 31 08:54:52 2023]
Finished job 15.
16 of 64 steps (25%) done

[Thu Aug 31 08:54:52 2023]
rule fastp_fq_uncompressed:
    input: ../results/TK12_R3_1.fastq, ../results/TK12_R3_2.fastq
    output: ../results/TK12_R3.fq.trim_1.fastq.gz, ../results/TK12_R3.fq.trim_2.fastq.gz, ../results/TK12_R3.fq.trim.html, ../results/TK12_R3.fq.trim.json
    log: ../results/TK12_R3.fq.trim.log
    jobid: 30
    wildcards: dir=../results, fq=TK12_R3
    threads: 6

Activating conda environment: /app/spritz/workflow/.snakemake/conda/e361d902
[Thu Aug 31 09:09:38 2023]
Finished job 30.
17 of 64 steps (27%) done

[Thu Aug 31 09:09:38 2023]
rule fastp_fq_uncompressed:
    input: ../results/TK12_R2_1.fastq, ../results/TK12_R2_2.fastq
    output: ../results/TK12_R2.fq.trim_1.fastq.gz, ../results/TK12_R2.fq.trim_2.fastq.gz, ../results/TK12_R2.fq.trim.html, ../results/TK12_R2.fq.trim.json
    log: ../results/TK12_R2.fq.trim.log
    jobid: 28
    wildcards: dir=../results, fq=TK12_R2
    threads: 6

Activating conda environment: /app/spritz/workflow/.snakemake/conda/e361d902
[Thu Aug 31 09:25:02 2023]
Finished job 28.
18 of 64 steps (28%) done

[Thu Aug 31 09:25:02 2023]
rule hisat_genome:
    input: ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa, ../resources/ensembl/Homo_sapiens.GRCh38.97.gff3
    output: ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.1.ht2, ../resources/ensembl/done_building_hisat_genomeHomo_sapiens.GRCh38.txt
    log: ../resources/ensembl/Homo_sapiens.GRCh38.hisatbuild.log
    jobid: 24
    benchmark: ../resources/ensembl/Homo_sapiens.GRCh38.hisatbuild.benchmark
    threads: 6

Activating conda environment: /app/spritz/workflow/.snakemake/conda/e361d902
[Thu Aug 31 10:03:47 2023]
Finished job 24.
19 of 64 steps (30%) done

[Thu Aug 31 10:03:47 2023]
rule hisat2_align_bam_fq:
    input: ../resources/ensembl/done_building_hisat_genomeHomo_sapiens.GRCh38.txt, ../results/TK12_R3.fq.trim_1.fastq.gz, ../results/TK12_R3.fq.trim_2.fastq.gz, ../resources/ensembl/Homo_sapiens.GRCh38.97.splicesites.txt
    output: ../results/align/TK12_R3.fq.sorted.bam
    log: ../results/align/TK12_R3.fq.hisat2.log
    jobid: 29
    wildcards: dir=../results, fq=TK12_R3
    threads: 6

Activating conda environment: /app/spritz/workflow/.snakemake/conda/e361d902
[Thu Aug 31 10:34:28 2023]
Finished job 29.
20 of 64 steps (31%) done

[Thu Aug 31 10:34:28 2023]
rule fastp_fq_uncompressed:
    input: ../results/TK12_R1_1.fastq, ../results/TK12_R1_2.fastq
    output: ../results/TK12_R1.fq.trim_1.fastq.gz, ../results/TK12_R1.fq.trim_2.fastq.gz, ../results/TK12_R1.fq.trim.html, ../results/TK12_R1.fq.trim.json
    log: ../results/TK12_R1.fq.trim.log
    jobid: 25
    wildcards: dir=../results, fq=TK12_R1
    threads: 6

Activating conda environment: /app/spritz/workflow/.snakemake/conda/e361d902
[Thu Aug 31 10:50:24 2023]
Finished job 25.
21 of 64 steps (33%) done

[Thu Aug 31 10:50:24 2023]
rule hisat2_align_bam_fq:
    input: ../resources/ensembl/done_building_hisat_genomeHomo_sapiens.GRCh38.txt, ../results/TK12_R1.fq.trim_1.fastq.gz, ../results/TK12_R1.fq.trim_2.fastq.gz, ../resources/ensembl/Homo_sapiens.GRCh38.97.splicesites.txt
    output: ../results/align/TK12_R1.fq.sorted.bam
    log: ../results/align/TK12_R1.fq.hisat2.log
    jobid: 23
    wildcards: dir=../results, fq=TK12_R1
    threads: 6

Activating conda environment: /app/spritz/workflow/.snakemake/conda/e361d902
[Thu Aug 31 11:25:14 2023]
Finished job 23.
22 of 64 steps (34%) done

[Thu Aug 31 11:25:14 2023]
rule hisat2_align_bam_fq:
    input: ../resources/ensembl/done_building_hisat_genomeHomo_sapiens.GRCh38.txt, ../results/TK12_R2.fq.trim_1.fastq.gz, ../results/TK12_R2.fq.trim_2.fastq.gz, ../resources/ensembl/Homo_sapiens.GRCh38.97.splicesites.txt
    output: ../results/align/TK12_R2.fq.sorted.bam
    log: ../results/align/TK12_R2.fq.hisat2.log
    jobid: 27
    wildcards: dir=../results, fq=TK12_R2
    threads: 6

Activating conda environment: /app/spritz/workflow/.snakemake/conda/e361d902
[Thu Aug 31 11:55:17 2023]
Finished job 27.
23 of 64 steps (36%) done

[Thu Aug 31 11:55:17 2023]
rule hisat2_merge_bams:
    input: ../results/align/TK12_R1.fq.sorted.bam, ../results/align/TK12_R2.fq.sorted.bam, ../results/align/TK12_R3.fq.sorted.bam
    output: ../results/align/combined.sorted.bam, ../results/align/combined.sorted.stats
    log: ../results/align/combined.sorted.log
    jobid: 22
    wildcards: dir=../results
    threads: 6
    resources: mem_mb=16000

Activating conda environment: /app/spritz/workflow/.snakemake/conda/e361d902
[Thu Aug 31 12:19:09 2023]
Finished job 22.
24 of 64 steps (38%) done

[Thu Aug 31 12:19:09 2023]
rule index_ensembl_vcf:
    input: ../resources/ensembl/Homo_sapiens.ensembl.vcf
    output: ../resources/ensembl/Homo_sapiens.ensembl.vcf.idx
    log: ../resources/ensembl/Homo_sapiens.ensembl.vcf.idx.log
    jobid: 17
    benchmark: ../resources/ensembl/Homo_sapiens.ensembl.vcf.idx.benchmark


[Thu Aug 31 12:19:09 2023]
rule assemble_transcripts_fq:
    input: ../results/align/TK12_R3.fq.sorted.bam, ../resources/ensembl/Homo_sapiens.GRCh38.97.gff3
    output: ../results/isoforms/TK12_R3.fq.sorted.gtf, ../results/isoforms/TK12_R3.fq.sorted.gtf.gz
    log: ../results/isoforms/TK12_R3.fq.sorted.gtf.log
    jobid: 45
    benchmark: ../results/isoforms/TK12_R3.fq.sorted.gtf.benchmark
    wildcards: dir=../results, fq=TK12_R3
    threads: 4


[Thu Aug 31 12:19:09 2023]
rule hisat2_group:
    input: ../results/align/combined.sorted.bam, ../resources/tmp
    output: ../results/variants/combined.sorted.grouped.bam, ../results/variants/combined.sorted.grouped.bam.bai
    log: ../results/variants/combined.sorted.grouped.log
    jobid: 21
    benchmark: ../results/variants/combined.sorted.grouped.benchmark
    wildcards: dir=../results
    resources: mem_mb=24000

Activating conda environment: /app/spritz/workflow/.snakemake/conda/180357af
Activating conda environment: /app/spritz/workflow/.snakemake/conda/45a9cd02
Activating conda environment: /app/spritz/workflow/.snakemake/conda/180357af
[Thu Aug 31 12:23:28 2023]
Finished job 45.
25 of 64 steps (39%) done
Tool returned:
/app/spritz/workflow/../resources/ensembl/Homo_sapiens.ensembl.vcf.idx
[Thu Aug 31 12:25:06 2023]
Finished job 17.
26 of 64 steps (41%) done
Removing temporary output file ../results/variants/combined.sorted.grouped.bam.bai.
[Thu Aug 31 14:08:08 2023]
Finished job 21.
27 of 64 steps (42%) done

[Thu Aug 31 14:08:09 2023]
rule quantify_ref_transcripts_fq:
    input: ../results/align/TK12_R3.fq.sorted.bam, ../resources/ensembl/Homo_sapiens.GRCh38.97.gff3
    output: ../results/isoforms/TK12_R3_quant_ref/t_data.ctab, ../results/isoforms/TK12_R3_quant_ref/e_data.ctab, ../results/isoforms/TK12_R3_quant_ref/i_data.ctab, ../results/isoforms/TK12_R3_quant_ref/e2t.ctab, ../results/isoforms/TK12_R3_quant_ref/i2t.ctab, ../results/isoforms/TK12_R3_quant_ref/TK12_R3.fq.gene.quant_ref.tab, ../results/isoforms/TK12_R3_quant_ref/TK12_R3.fq.transcript.quant_ref.gtf, ../results/isoforms/TK12_R3_quant_ref/TK12_R3.fq.transcript.quant_ref.gtf.gz
    log: ../results/isoforms/TK12_R3_quant_ref/TK12_R3.fq.quant_ref.gtf.log
    jobid: 54
    benchmark: ../results/isoforms/TK12_R3_quant_ref/TK12_R3.fq.quant_ref.gtf.benchmark
    wildcards: dir=../results, fq=TK12_R3
    threads: 4


[Thu Aug 31 14:08:09 2023]
rule hisat2_mark:
    input: ../results/variants/combined.sorted.grouped.bam, ../resources/tmp
    output: ../results/variants/combined.sorted.grouped.marked.bam, ../results/variants/combined.sorted.grouped.marked.bam.bai, ../results/variants/combined.sorted.grouped.marked.metrics
    log: ../results/variants/combined.sorted.grouped.marked.log
    jobid: 20
    benchmark: ../results/variants/combined.sorted.grouped.marked.benchmark
    wildcards: dir=../results
    resources: mem_mb=24000

Activating conda environment: /app/spritz/workflow/.snakemake/conda/a2cf4925
Activating conda environment: /app/spritz/workflow/.snakemake/conda/180357af
[Thu Aug 31 14:11:41 2023]
Finished job 54.
28 of 64 steps (44%) done
Removing temporary output file ../results/variants/combined.sorted.grouped.bam.
[Thu Aug 31 16:07:35 2023]
Finished job 20.
29 of 64 steps (45%) done

[Thu Aug 31 16:07:35 2023]
rule quantify_ref_transcripts_fq:
    input: ../results/align/TK12_R1.fq.sorted.bam, ../resources/ensembl/Homo_sapiens.GRCh38.97.gff3
    output: ../results/isoforms/TK12_R1_quant_ref/t_data.ctab, ../results/isoforms/TK12_R1_quant_ref/e_data.ctab, ../results/isoforms/TK12_R1_quant_ref/i_data.ctab, ../results/isoforms/TK12_R1_quant_ref/e2t.ctab, ../results/isoforms/TK12_R1_quant_ref/i2t.ctab, ../results/isoforms/TK12_R1_quant_ref/TK12_R1.fq.gene.quant_ref.tab, ../results/isoforms/TK12_R1_quant_ref/TK12_R1.fq.transcript.quant_ref.gtf, ../results/isoforms/TK12_R1_quant_ref/TK12_R1.fq.transcript.quant_ref.gtf.gz
    log: ../results/isoforms/TK12_R1_quant_ref/TK12_R1.fq.quant_ref.gtf.log
    jobid: 52
    benchmark: ../results/isoforms/TK12_R1_quant_ref/TK12_R1.fq.quant_ref.gtf.benchmark
    wildcards: dir=../results, fq=TK12_R1
    threads: 4


[Thu Aug 31 16:07:35 2023]
rule split_n_cigar_reads:
    input: ../results/variants/combined.sorted.grouped.marked.bam, ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa, ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa.fai, ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.dict, ../resources/tmp
    output: ../results/variants/combined.fixedQuals.bam, ../results/variants/combined.sorted.grouped.marked.split.bam, ../results/variants/combined.sorted.grouped.marked.split.bam.bai
    log: ../results/variants/combined.sorted.grouped.marked.split.log
    jobid: 19
    benchmark: ../results/variants/combined.sorted.grouped.marked.split.benchmark
    wildcards: dir=../results
    resources: mem_mb=24000

Activating conda environment: /app/spritz/workflow/.snakemake/conda/180357af
Activating conda environment: /app/spritz/workflow/.snakemake/conda/a2cf4925
[Thu Aug 31 16:11:36 2023]
Finished job 52.
30 of 64 steps (47%) done
Removing temporary output file ../results/variants/combined.fixedQuals.bam.
Removing temporary output file ../results/variants/combined.sorted.grouped.marked.split.bam.bai.
[Fri Sep  1 00:12:51 2023]
Finished job 19.
31 of 64 steps (48%) done

[Fri Sep  1 00:12:51 2023]
rule assemble_transcripts_fq:
    input: ../results/align/TK12_R2.fq.sorted.bam, ../resources/ensembl/Homo_sapiens.GRCh38.97.gff3
    output: ../results/isoforms/TK12_R2.fq.sorted.gtf, ../results/isoforms/TK12_R2.fq.sorted.gtf.gz
    log: ../results/isoforms/TK12_R2.fq.sorted.gtf.log
    jobid: 44
    benchmark: ../results/isoforms/TK12_R2.fq.sorted.gtf.benchmark
    wildcards: dir=../results, fq=TK12_R2
    threads: 4


[Fri Sep  1 00:12:52 2023]
rule base_recalibration:
    input: ../resources/ensembl/Homo_sapiens.ensembl.vcf, ../resources/ensembl/Homo_sapiens.ensembl.vcf.idx, ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa, ../results/variants/combined.sorted.grouped.marked.split.bam, ../resources/tmp
    output: ../results/variants/combined.sorted.grouped.marked.split.recaltable, ../results/variants/combined.sorted.grouped.marked.split.recal.bam
    log: ../results/variants/combined.sorted.grouped.marked.split.recal.log
    jobid: 18
    benchmark: ../results/variants/combined.sorted.grouped.marked.split.recal.benchmark
    wildcards: dir=../results
    resources: mem_mb=24000

Activating conda environment: /app/spritz/workflow/.snakemake/conda/180357af
Activating conda environment: /app/spritz/workflow/.snakemake/conda/45a9cd02
[Fri Sep  1 00:16:56 2023]
Finished job 44.
32 of 64 steps (50%) done
Removing temporary output file ../results/variants/combined.sorted.grouped.marked.split.bam.
Removing temporary output file ../results/variants/combined.sorted.grouped.marked.split.recaltable.
[Fri Sep  1 01:51:39 2023]
Finished job 18.
33 of 64 steps (52%) done

[Fri Sep  1 01:51:39 2023]
rule call_gvcf_varaints:
    input: ../resources/ensembl/Homo_sapiens.ensembl.vcf, ../resources/ensembl/Homo_sapiens.ensembl.vcf.idx, ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa, ../results/variants/combined.sorted.grouped.marked.split.recal.bam, ../resources/tmp
    output: ../results/variants/combined.sorted.grouped.marked.split.recal.g.vcf.gz
    log: ../results/variants/combined.sorted.grouped.marked.split.recal.g.log
    jobid: 14
    benchmark: ../results/variants/combined.sorted.grouped.marked.split.recal.g.benchmark
    wildcards: dir=../results
    threads: 6
    resources: mem_mb=24000

Activating conda environment: /app/spritz/workflow/.snakemake/conda/180357af
Removing temporary output file ../results/variants/combined.sorted.grouped.marked.split.recal.bam.
[Fri Sep  1 06:24:38 2023]
Finished job 14.
34 of 64 steps (53%) done

[Fri Sep  1 06:24:38 2023]
rule call_vcf_variants:
    input: ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa, ../results/variants/combined.sorted.grouped.marked.split.recal.g.vcf.gz, ../resources/tmp
    output: ../results/variants/combined.sorted.grouped.marked.split.recal.g.gt.vcf
    log: ../results/variants/combined.sorted.grouped.marked.split.recal.g.gt.log
    jobid: 13
    benchmark: ../results/variants/combined.sorted.grouped.marked.split.recal.g.gt.benchmark
    wildcards: dir=../results
    resources: mem_mb=24000


[Fri Sep  1 06:24:38 2023]
rule quantify_ref_transcripts_fq:
    input: ../results/align/TK12_R2.fq.sorted.bam, ../resources/ensembl/Homo_sapiens.GRCh38.97.gff3
    output: ../results/isoforms/TK12_R2_quant_ref/t_data.ctab, ../results/isoforms/TK12_R2_quant_ref/e_data.ctab, ../results/isoforms/TK12_R2_quant_ref/i_data.ctab, ../results/isoforms/TK12_R2_quant_ref/e2t.ctab, ../results/isoforms/TK12_R2_quant_ref/i2t.ctab, ../results/isoforms/TK12_R2_quant_ref/TK12_R2.fq.gene.quant_ref.tab, ../results/isoforms/TK12_R2_quant_ref/TK12_R2.fq.transcript.quant_ref.gtf, ../results/isoforms/TK12_R2_quant_ref/TK12_R2.fq.transcript.quant_ref.gtf.gz
    log: ../results/isoforms/TK12_R2_quant_ref/TK12_R2.fq.quant_ref.gtf.log
    jobid: 53
    benchmark: ../results/isoforms/TK12_R2_quant_ref/TK12_R2.fq.quant_ref.gtf.benchmark
    wildcards: dir=../results, fq=TK12_R2
    threads: 4

Activating conda environment: /app/spritz/workflow/.snakemake/conda/180357af
Activating conda environment: /app/spritz/workflow/.snakemake/conda/a2cf4925
Removing temporary output file ../results/variants/combined.sorted.grouped.marked.split.recal.g.vcf.gz.
Removing temporary output file ../resources/tmp.
[Fri Sep  1 06:27:30 2023]
Finished job 13.
35 of 64 steps (55%) done

[Fri Sep  1 06:27:30 2023]
rule final_vcf_naming:
    input: ../results/variants/combined.sorted.grouped.marked.split.recal.g.gt.vcf
    output: ../results/variants/combined.spritz.vcf
    log: ../results/variants/final_vcf_naming.log
    jobid: 12
    wildcards: dir=../results

Activating conda environment: /app/spritz/workflow/.snakemake/conda/180357af
[Fri Sep  1 06:27:31 2023]
Finished job 12.
36 of 64 steps (56%) done

[Fri Sep  1 06:27:31 2023]
rule variant_annotation_ref:
    input: ../resources/SnpEff/data/Homo_sapiens.GRCh38/doneHomo_sapiens.GRCh38.txt, ../resources/SnpEff/snpEff.jar, ../resources/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.karyotypic.fa, ../results/variants/combined.spritz.vcf
    output: ../results/variants/combined.spritz.snpeff.vcf, ../results/variants/combined.spritz.snpeff.html, ../results/variants/combined.spritz.snpeff.genes.txt, ../results/variants/combined.spritz.snpeff.protein.fasta, ../results/variants/combined.spritz.snpeff.protein.xml
    log: ../results/variants/combined.spritz.snpeff.log
    jobid: 11
    benchmark: ../results/variants/combined.spritz.snpeff.benchmark
    wildcards: dir=../results
    resources: mem_mb=16000

Activating conda environment: /app/spritz/workflow/.snakemake/conda/47a7eecc
[Fri Sep  1 06:27:59 2023]
Finished job 53.
37 of 64 steps (58%) done

[Fri Sep  1 06:27:59 2023]
rule assemble_transcripts_fq:
    input: ../results/align/TK12_R1.fq.sorted.bam, ../resources/ensembl/Homo_sapiens.GRCh38.97.gff3
    output: ../results/isoforms/TK12_R1.fq.sorted.gtf, ../results/isoforms/TK12_R1.fq.sorted.gtf.gz
    log: ../results/isoforms/TK12_R1.fq.sorted.gtf.log
    jobid: 43
    benchmark: ../results/isoforms/TK12_R1.fq.sorted.gtf.benchmark
    wildcards: dir=../results, fq=TK12_R1
    threads: 4


[Fri Sep  1 06:27:59 2023]
rule make_isoform_quant_dataframe_ref:
    input: ../results/isoforms/TK12_R1_quant_ref/TK12_R1.fq.transcript.quant_ref.gtf, ../results/isoforms/TK12_R2_quant_ref/TK12_R2.fq.transcript.quant_ref.gtf, ../results/isoforms/TK12_R3_quant_ref/TK12_R3.fq.transcript.quant_ref.gtf
    output: ../results/final/transcript_reference_quant.tpms.csv
    log: ../results/final/transcript_reference_quant.tpms.log
    jobid: 51
    benchmark: ../results/final/transcript_reference_quant.tpms.benchmark
    wildcards: dir=../results

Activating conda environment: /app/spritz/workflow/.snakemake/conda/45a9cd02
Activating conda environment: /app/spritz/workflow/.snakemake/conda/a2cf4925
Removing temporary output file ../results/isoforms/TK12_R1_quant_ref/TK12_R1.fq.transcript.quant_ref.gtf.
Removing temporary output file ../results/isoforms/TK12_R2_quant_ref/TK12_R2.fq.transcript.quant_ref.gtf.
Removing temporary output file ../results/isoforms/TK12_R3_quant_ref/TK12_R3.fq.transcript.quant_ref.gtf.
[Fri Sep  1 06:28:13 2023]
Finished job 51.
38 of 64 steps (59%) done
[Fri Sep  1 06:31:38 2023]
Finished job 43.
39 of 64 steps (61%) done

[Fri Sep  1 06:31:38 2023]
rule make_gene_quant_dataframe_ref:
    input: ../results/isoforms/TK12_R1_quant_ref/TK12_R1.fq.gene.quant_ref.tab, ../results/isoforms/TK12_R2_quant_ref/TK12_R2.fq.gene.quant_ref.tab, ../results/isoforms/TK12_R3_quant_ref/TK12_R3.fq.gene.quant_ref.tab
    output: ../results/final/gene_reference_quant.tpms.csv
    log: ../results/final/gene_reference_quant.tpms.log
    jobid: 55
    benchmark: ../results/final/gene_reference_quant.tpms.benchmark
    wildcards: dir=../results

Activating conda environment: /app/spritz/workflow/.snakemake/conda/a2cf4925
[Fri Sep  1 06:31:39 2023]
Error in rule make_gene_quant_dataframe_ref:
    jobid: 55
    output: ../results/final/gene_reference_quant.tpms.csv
    log: ../results/final/gene_reference_quant.tpms.log (check log file(s) for error message)
    conda-env: /app/spritz/workflow/.snakemake/conda/a2cf4925
    shell:
        python scripts/SummarizeQuantTab.py ../results/final/gene_reference_quant.tpms.csv ../results/isoforms/TK12_R1_quant_ref/TK12_R1.fq.gene.quant_ref.tab ../results/isoforms/TK12_R2_quant_ref/TK12_R2.fq.gene.quant_ref.tab ../results/isoforms/TK12_R3_quant_ref/TK12_R3.fq.gene.quant_ref.tab &> ../results/final/gene_reference_quant.tpms.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Fri Sep  1 06:43:56 2023]
Finished job 11.
40 of 64 steps (62%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /app/spritz/workflow/.snakemake/log/2023-08-31T080720.442039.snakemake.log
ERROR conda.cli.main_run:execute(49): `conda run dotnet SpritzCMD.dll --threads 6 --analysisDirectory=/app/spritz/results/ --reference=release-97,homo_sapiens,human,GRCh38 --analyzeVariants --analyzeIsoforms --doQuantification --fastq1=TK12_R1,TK12_R2,TK12_R3 --fastq2=TK12_R1,TK12_R2,TK12_R3` failed. (See above for error)
Error response from daemon: No such container: spritz1364950398
Done!

acesnik commented 10 months ago

The rule that failed involves a custom script to gather the results of quantification.

Could you please attach ../results/final/gene_reference_quant.tpms.log?

animesh commented 10 months ago

Yes, of course. The content is below:

scripts/SummarizeQuantTab.py:15: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
  elif all(np.array(currIds) != ids): print("error with ids")
reading TK12_R1.fq.gene.quant_ref.tab
reading TK12_R2.fq.gene.quant_ref.tab
Traceback (most recent call last):
  File "scripts/SummarizeQuantTab.py", line 15, in <module>
    elif all(np.array(currIds) != ids): print("error with ids")
TypeError: 'bool' object is not iterable
./seqRNA/final/gene_reference_quant.tpms.log (END)

acesnik commented 10 months ago

Do you still have the files TK12_R1.fq.gene.quant_ref.tab and TK12_R2.fq.gene.quant_ref.tab? I wonder if they're empty.

I think this may fix the error, but if the files are empty, I'm not sure that pipeline will yield much, and there may be more issues downstream. https://github.com/smith-chem-wisc/Spritz/pull/238
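For context on the traceback: on older NumPy versions, comparing two arrays of different lengths with `!=` emits exactly this DeprecationWarning and returns a single `bool` instead of an array, and calling `all()` on that bool raises `TypeError: 'bool' object is not iterable`. So the error is consistent with the ID columns of two input files having different lengths, not necessarily with empty files. Note also that `all(a != b)` is only true when every ID differs, so even with equal lengths it would miss a partial mismatch. Below is a minimal, dependency-free sketch of a length-safe check. The function name `check_ids` is hypothetical, and this is not necessarily what the linked pull request does.

```python
def check_ids(curr_ids, ids):
    """Return None when the two ID lists agree, else a short description.

    Hypothetical helper: compares lengths first so that a size mismatch is
    reported explicitly instead of surfacing as a TypeError.
    """
    curr_ids, ids = list(curr_ids), list(ids)
    if len(curr_ids) != len(ids):
        return f"length mismatch: {len(curr_ids)} vs {len(ids)} IDs"
    # Same length: report the first position where the IDs disagree.
    for row, (current, expected) in enumerate(zip(curr_ids, ids)):
        if current != expected:
            return f"first difference at row {row}: {current!r} vs {expected!r}"
    return None


if __name__ == "__main__":
    # Made-up IDs for illustration only.
    print(check_ids(["g1", "g2"], ["g1", "g2", "g3"]))
```

A check like this would report the problem in the log rather than crash, though the script would still need to decide how to merge files whose IDs disagree.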

animesh commented 10 months ago

They do have about 60K lines

(base) ash022@DMED7596:~/f/TK$ wc ./seqRNA/isoforms/TK12_R1_quant_ref/TK12_R1.fq.gene.quant_ref.tab
  60633  545699 4172198 ./seqRNA/isoforms/TK12_R1_quant_ref/TK12_R1.fq.gene.quant_ref.tab
(base) ash022@DMED7596:~/f/TK$ wc ./seqRNA/isoforms/TK12_R2_quant_ref/TK12_R2.fq.gene.quant_ref.tab
  60635  545717 4160498 ./seqRNA/isoforms/TK12_R2_quant_ref/TK12_R2.fq.gene.quant_ref.tab

I'm not sure what those 2 extra lines in R2 are, though.
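One way to track down the extra lines is to compare the ID columns of the two files directly. The sketch below assumes the `.tab` files are tab-separated with one header line and the gene ID in the first column; that layout is an assumption and should be checked against the actual files. The helper names `read_ids` and `compare_ids` are hypothetical.

```python
import csv
from collections import Counter


def read_ids(lines, id_col=0, has_header=True):
    """Return the ID column from tab-separated lines (assumed layout)."""
    rows = csv.reader(lines, delimiter="\t")
    if has_header:
        next(rows, None)  # skip the header row if present
    return [row[id_col] for row in rows if row]


def compare_ids(ids_a, ids_b):
    """Report IDs unique to each list and IDs repeated within a list."""
    count_a, count_b = Counter(ids_a), Counter(ids_b)
    return {
        "only_in_a": sorted(set(count_a) - set(count_b)),
        "only_in_b": sorted(set(count_b) - set(count_a)),
        "duplicated_in_a": sorted(k for k, n in count_a.items() if n > 1),
        "duplicated_in_b": sorted(k for k, n in count_b.items() if n > 1),
    }


if __name__ == "__main__":
    # Made-up rows for illustration; real input would be the two .tab files.
    r1 = ["Gene ID\tTPM\n", "g1\t1.0\n", "g2\t2.0\n"]
    r2 = ["Gene ID\tTPM\n", "g1\t1.0\n", "g2\t2.0\n", "g2\t0.5\n", "g3\t3.0\n"]
    print(compare_ids(read_ids(r1), read_ids(r2)))
```

For the real files, pass an open file handle instead of a list, for example `with open(path, newline="") as fh: ids = read_ids(fh)`. Extra lines would then show up either as IDs present in only one file or as IDs duplicated within a file.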

acesnik commented 10 months ago

That is strange! They seem pretty similar, though, and might be usable. I'd suggest letting the other workflows (variant & isoform) run to completion, and then working on the analysis using those two quantification files.

Please let me know if you figure out what the difference is. I have very limited time for this project, so I'll leave it there for now. If it makes your downstream analysis impossible, I'd be happy to try to help some more.

animesh commented 10 months ago

Is there some documentation I can follow to run all of this on a Linux cluster with Singularity instead of Docker?

acesnik commented 10 months ago

Hi @animesh, Spritz doesn't support Singularity yet. That'd be a welcome pull request if you know how to do it.

The way I've run it on the cluster is to use Snakemake without containers. You can find instructions for that here: https://github.com/smith-chem-wisc/Spritz/wiki/Spritz-commandline-usage#spritz-commandline-usage.

KurlMurx commented 4 weeks ago

Hi, I am having a similar issue when running Spritz from the command line. I see that there is a branch and pull request with a fix at #238, and I was wondering whether it can be used as is, since it has not yet been merged into main.

acesnik commented 4 weeks ago

I merged it and made a new release. Let's see if that fixes the issue. 👍🏼

acesnik commented 4 weeks ago

Well, that didn't work. I'll get the release working by the end of the day and update you.

acesnik commented 4 weeks ago

The new release is available now. (The issues were just version checking.)

KurlMurx commented 4 weeks ago

Thank you so much! I will try this tomorrow morning and let you know :) I also didn't mean to make you do the merge; I was just wondering whether I could check out that branch and use the fix, or whether something on that branch was still missing. But this is even better.

acesnik commented 2 weeks ago

Did the fix work for you?