nf-core / rnasplice

rnasplice is a bioinformatics pipeline for RNA-seq alternative splicing analysis
https://nf-co.re/rnasplice
MIT License

Miso error #134

Closed BulutHamali closed 5 months ago

BulutHamali commented 5 months ago

Description of the bug

I keep getting this error. It has already been mentioned in other issues, but I was not able to fix it. Any help would be appreciated. Thanks.

"The exit status of the task that caused the workflow execution to fail was: 1

Error executing process > 'NFCORE_RNASPLICE:RNASPLICE:VISUALISE_MISO:MISO_SASHIMI (2)'

Caused by: Process NFCORE_RNASPLICE:RNASPLICE:VISUALISE_MISO:MISO_SASHIMI (2) terminated with an error exit status (1)

Command executed:

sashimi_plot --plot-event ENSG00000005302.19 index miso_settings.txt --output-dir sashimi

cat <<-END_VERSIONS > versions.yml
"NFCORE_RNASPLICE:RNASPLICE:VISUALISE_MISO:MISO_SASHIMI":
    python: $(python --version | sed "s/Python //g")
    misopy: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('misopy').version)")
END_VERSIONS

Command exit status: 1

Command output: (empty)

Command error:
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
/usr/local/lib/python2.7/site-packages/matplotlib/cbook/deprecation.py:107: MatplotlibDeprecationWarning: The mpl_toolkits.axes_grid module was deprecated in version 2.1. Use mpl_toolkits.axes_grid1 and mpl_toolkits.axisartist provies the same functionality instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)
Traceback (most recent call last):
  File "/usr/local/bin/sashimi_plot", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/misopy/sashimi_plot/sashimi_plot.py", line 276, in main
    plot_label=plot_label)
  File "/usr/local/lib/python2.7/site-packages/misopy/sashimi_plot/sashimi_plot.py", line 142, in plot_event
    %(event_name, pickle_dir)
Exception: Event ENSG00000005302.19 not found in pickled directory index. Are you sure this is the right directory for the event?

Work dir: /home/hamalibt/Splicesome_Project/RNA_SEQ_ARGLU1KO_VS_MCF7WT/work/6a/df958b424331c70ec5aaa5eeb10caf

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh"

Command used and terminal output

nextflow run nf-core/rnasplice \
    -profile singularity \
    --input /path/to/samplesheet.csv \
    --contrasts /path/to/contrastsheet.csv \
    --outdir /path/to/output_directory \
    --genome GRCh38 \
    --aligner star_salmon \
    --save_reference \
    --dexseq_dtu \
    --min_samps_gene_expr 6 \
    --min_gene_expr 10 \
    --min_samps_feature_expr 3 \
    --min_feature_expr 10 \
    --min_samps_feature_prop 3 \
    --min_feature_prop 0.1

Relevant files

nextflow.log

System information

N E X T F L O W ~ version 23.10.1, local, Apptainer

jma1991 commented 5 months ago

The error you're encountering likely stems from using a gene identifier that isn't included in your annotation. You've chosen GRCh38 as your genome parameter, which directs the workflow to pull the necessary annotation files from the iGenomes repository. You can verify this by downloading the GTF file linked in the configuration and checking for your gene identifier—you'll find it's missing. In fact, the GRCh38 annotation from iGenomes doesn't include Ensembl identifiers, a known issue across all nf-core workflows that has been discussed before. If you need to use the GRCh38 genome, I recommend providing your own FASTA and GTF files for the annotation.
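One way to run that check without reading the whole file by eye is to summarise the gene_id values in the GTF. The sketch below is only an illustration: the function name gene_id_summary is hypothetical, and it assumes an uncompressed, tab-separated GTF whose ninth column carries a gene_id "..." attribute, and that Ensembl-style identifiers start with ENS.

```shell
# Hypothetical helper: count distinct gene_id values in a GTF and report
# how many look like Ensembl identifiers (assumed here to start with "ENS").
gene_id_summary() {
    local gtf="$1"
    awk -F'\t' '
        /^#/ { next }                          # skip header/comment lines
        match($9, /gene_id "[^"]+"/) {
            # strip the leading `gene_id "` (9 chars) and the closing quote
            id = substr($9, RSTART + 9, RLENGTH - 10)
            if (!(id in seen)) {
                seen[id] = 1
                if (id ~ /^ENS/) ensembl++; else other++
            }
        }
        END { printf "ensembl=%d other=%d\n", ensembl, other }
    ' "$gtf"
}
```

Calling `gene_id_summary genes.gtf` on an annotation keyed by gene symbols would be expected to print ensembl=0, in which case an Ensembl gene ID passed to the sashimi plotting step cannot be found.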

BulutHamali commented 5 months ago

Thank you very much. I later used this command:

nextflow run nf-core/rnasplice \
    -profile singularity \
    --input /path/to/samplesheet.csv \
    --contrasts /path/to/contrastsheet.csv \
    --outdir /path/to/output_directory \
    --genome GRCh38 \
    --aligner star_salmon \
    --save_reference \
    --dexseq_dtu \
    --min_samps_gene_expr 6 \
    --min_gene_expr 10 \
    --min_samps_feature_expr 3 \
    --min_feature_expr 10 \
    --min_samps_feature_prop 3 \
    --min_feature_prop 0.1 \
    --fasta /path/to/reference_genome.fa \
    --gtf /path/to/annotation_file.gtf.gz \
    -resume

and then I got this error:

"The exit status of the task that caused the workflow execution to fail was: 105

Error executing process > 'NFCORE_RNASPLICE:RNASPLICE:ALIGN_STAR:STAR_ALIGN (LC6_S6)'

Caused by: Process NFCORE_RNASPLICE:RNASPLICE:ALIGN_STAR:STAR_ALIGN (LC6_S6) terminated with an error exit status (105)

Command executed:

STAR \
    --genomeDir STARIndex \
    --readFilesIn input1/LC6_S6_trimmed.fq.gz \
    --runThreadN 12 \
    --outFileNamePrefix LC6_S6. \
    \
    --sjdbGTFfile Homo_sapiens.GRCh38.104.gtf \
    --outSAMattrRGline 'ID:LC6_S6' 'SM:LC6_S6' \
    --quantMode TranscriptomeSAM --twopassMode Basic --outSAMtype BAM Unsorted --readFilesCommand gunzip -c --runRNGseed 0 --outFilterMultimapNmax 20 --alignSJDBoverhangMin 1 --outSAMattributes NH HI AS NM MD --quantTranscriptomeBan Singleend

if [ -f LC6_S6.Unmapped.out.mate1 ]; then
    mv LC6_S6.Unmapped.out.mate1 LC6_S6.unmapped_1.fastq
    gzip LC6_S6.unmapped_1.fastq
fi
if [ -f LC6_S6.Unmapped.out.mate2 ]; then
    mv LC6_S6.Unmapped.out.mate2 LC6_S6.unmapped_2.fastq
    gzip LC6_S6.unmapped_2.fastq
fi

cat <<-END_VERSIONS > versions.yml
"NFCORE_RNASPLICE:RNASPLICE:ALIGN_STAR:STAR_ALIGN":
    star: $(STAR --version | sed -e "s/STAR_//g")
    samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
    gawk: $(echo $(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*$//')
END_VERSIONS

Command exit status: 105

Command output:
STAR --genomeDir STARIndex --readFilesIn input1/LC6_S6_trimmed.fq.gz --runThreadN 12 --outFileNamePrefix LC6_S6. --sjdbGTFfile Homo_sapiens.GRCh38.104.gtf --outSAMattrRGline ID:LC6_S6 SM:LC6_S6 --quantMode TranscriptomeSAM --twopassMode Basic --outSAMtype BAM Unsorted --readFilesCommand gunzip -c --runRNGseed 0 --outFilterMultimapNmax 20 --alignSJDBoverhangMin 1 --outSAMattributes NH HI AS NM MD --quantTranscriptomeBan Singleend
STAR version: 2.7.9a compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
May 07 16:08:24 ..... started STAR run
May 07 16:08:24 ..... loading genome

Command error:
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
STAR --genomeDir STARIndex --readFilesIn input1/LC6_S6_trimmed.fq.gz --runThreadN 12 --outFileNamePrefix LC6_S6. --sjdbGTFfile Homo_sapiens.GRCh38.104.gtf --outSAMattrRGline ID:LC6_S6 SM:LC6_S6 --quantMode TranscriptomeSAM --twopassMode Basic --outSAMtype BAM Unsorted --readFilesCommand gunzip -c --runRNGseed 0 --outFilterMultimapNmax 20 --alignSJDBoverhangMin 1 --outSAMattributes NH HI AS NM MD --quantTranscriptomeBan Singleend
STAR version: 2.7.9a compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
May 07 16:08:24 ..... started STAR run
May 07 16:08:24 ..... loading genome

EXITING because of FATAL ERROR: Genome version: 20201 is INCOMPATIBLE with running STAR version: 2.7.9a
SOLUTION: please re-generate genome from scratch with running version of STAR, or with version: 2.7.4a

May 07 16:08:24 ...... FATAL ERROR, exiting

Work dir: /home/hamalibt/Splicesome_Project/RNA_SEQ_ARGLU1KO_VS_MCF7WT/work/2e/8d479b2a30ee2730d2b11db1c60837

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh"

jma1991 commented 5 months ago

This is another widespread issue. Basically, the index stored on iGenomes is incompatible with the version of the STAR aligner used in the nf-core modules. See the somewhat short discussion here. The solution, as mentioned earlier, is to provide your own FASTA and GTF files so that the workflow generates a compatible index.
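To see which STAR release built an existing index before launching the workflow, the index metadata can be inspected. This is a rough sketch rather than part of the pipeline: it assumes the index directory contains a genomeParameters.txt file with a versionGenome line (which STAR normally writes at index generation), and the function name star_index_version is hypothetical.

```shell
# Hypothetical helper: print the genome version recorded in a STAR index.
# Assumes <index_dir>/genomeParameters.txt has a line whose first field is
# "versionGenome" and whose second field is the version value.
star_index_version() {
    local index_dir="$1"
    awk '$1 == "versionGenome" { print $2 }' "$index_dir/genomeParameters.txt"
}
```

The printed value can then be compared by eye with the output of `STAR --version` inside the container. In the log above the index reports 20201 while the running STAR is 2.7.9a, which is the mismatch the FATAL ERROR complains about.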

BulutHamali commented 5 months ago

So I downloaded the human genome from a custom repository (https://github.com/ewels/AWS-iGenomes) and used the following Nextflow command with nf-core's rnasplice pipeline:

nextflow run nf-core/rnasplice \
    -profile singularity \
    --input /samplesheet.csv \
    --contrasts /contrastsheet.csv \
    --outdir /Master_Directory/MCF7/05062024 \
    --genome GRCh37 \
    --aligner star_salmon \
    --save_reference \
    --dexseq_dtu \
    --min_samps_gene_expr 6 \
    --min_gene_expr 10 \
    --min_samps_feature_expr 3 \
    --min_feature_expr 10 \
    --min_samps_feature_prop 3 \
    --min_feature_prop 0.1 \
    --fasta /genome.fa \
    --gtf /genes.gtf

I even configured nextflow.config for MISO gene extensions, but encountered this error:

"Workflow execution completed unsuccessfully! The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'NFCORE_RNASPLICE:RNASPLICE:VISUALISE_MISO:MISO_SASHIMI (2)'

Caused by: Process NFCORE_RNASPLICE:RNASPLICE:VISUALISE_MISO:MISO_SASHIMI (2) terminated with an error exit status (1)

Command executed:

sashimi_plot --plot-event ENSG00000005302.19 index miso_settings.txt --output-dir sashimi

cat <<-END_VERSIONS > versions.yml
"NFCORE_RNASPLICE:RNASPLICE:VISUALISE_MISO:MISO_SASHIMI":
    python: $(python --version | sed "s/Python //g")
    misopy: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('misopy').version)")
END_VERSIONS

Command exit status: 1

Command output: (empty)

Command error:
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
/usr/local/lib/python2.7/site-packages/matplotlib/cbook/deprecation.py:107: MatplotlibDeprecationWarning: The mpl_toolkits.axes_grid module was deprecated in version 2.1. Use mpl_toolkits.axes_grid1 and mpl_toolkits.axisartist provies the same functionality instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)
Traceback (most recent call last):
  File "/usr/local/bin/sashimi_plot", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/misopy/sashimi_plot/sashimi_plot.py", line 276, in main
    plot_label=plot_label)
  File "/usr/local/lib/python2.7/site-packages/misopy/sashimi_plot/sashimi_plot.py", line 142, in plot_event
    %(event_name, pickle_dir)
Exception: Event ENSG00000005302.19 not found in pickled directory index. Are you sure this is the right directory for the event?

Work dir: /home/hamalibt/my_refs/work/90/89430a1a23400ef603a22d59c6a7b6

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh"

jma1991 commented 5 months ago

Can you confirm the gene identifier you’re using is present in the annotation file? Just the output of grep would be sufficient.

BulutHamali commented 5 months ago

I've tried various commands like 'grep', 'grep -m 1', 'grep -F', 'LC_ALL=C grep', './search', and 'awk' to search for 'ENSG00000005302.19' in 'genes.gtf', but after about half an hour, there's still no output. Is this normal?

jma1991 commented 5 months ago

Please try the following command:

grep "ENSG00000005302" genes.gtf

If there is no output, it means your gene identifier is not in the annotation file.
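If a plain grep over a large annotation is slow, a fixed-string search that stops at the first hit may help. The following is a minimal sketch under a few assumptions: the function name find_gene_id is hypothetical, the GTF stores identifiers as gene_id "..." attributes, and anything after the first dot in the query is a version suffix that can be dropped (GENCODE-style files put the version inside gene_id, Ensembl-style files do not).

```shell
# Hypothetical helper: report the first GTF line mentioning a gene ID.
# Exit status is 0 when the ID is found and non-zero otherwise.
find_gene_id() {
    local gene_id="${1%%.*}"                 # drop a trailing ".NN" version
    local gtf="$2"
    local pattern="gene_id \"${gene_id}"     # literal prefix of the attribute
    case "$gtf" in
        # -F: fixed string, -m 1: stop after first match, LC_ALL=C: byte-wise
        *.gz) gzip -dc "$gtf" | LC_ALL=C grep -m 1 -F -- "$pattern" ;;
        *)    LC_ALL=C grep -m 1 -F -- "$pattern" "$gtf" ;;
    esac
}
```

For example, `find_gene_id ENSG00000005302.19 genes.gtf` prints one matching line if the gene is annotated and prints nothing (with a non-zero exit status) if it is not.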

BulutHamali commented 5 months ago

It worked out afterwards. It seems that the issue with the 'grep' command was likely related to an HPC problem. Thank you.

torres-HI commented 2 months ago

Hi,

I have the same error with MISO plotting.

I read that the issue is the annotation of the reference genome. My question is: what would be the best reference genome for working with mouse? I used ENSEMBL 38 because I assume it has all the annotated transcripts, unlike UCSC, which is better for epigenomic work (ATAC-seq, ChIP-seq, etc.) but does not have the transcript annotation.

I need to match the splicing and RNA expression results with ATAC-seq.

I'm open to suggestions, thank you.

torres-HI commented 2 months ago

Hi again, where can I find the list of genes with alternative splicing, so that I can add it to the process? Thank you.