snakemake-workflows / rna-seq-star-deseq2

RNA-seq workflow using STAR and DESeq2
MIT License

`Align` step fails with `could not open input file /geneInfo.tab` #83

Open benostendorf opened 3 months ago

benostendorf commented 3 months ago

I have deployed the latest version of this workflow (v2.1.2) and only adjusted the config files. I use the ENSEMBL GRCm38 mouse reference genome (release 102).
Unfortunately, the mapping step fails with the following error:

Transcriptome.cpp:18:Transcriptome: exiting because of *INPUT FILE* error: could not open input file /geneInfo.tab
Solution: check that the file exists and you have read permission for this file
          SOLUTION: utilize --sjdbGTFfile /path/to/annotations.gtf option at the genome generation step or mapping step

Jun 06 19:42:37 ...... FATAL ERROR, exiting

This seems to be the same error as reported here. Thank you for looking into this!
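To rule out an incomplete index before blaming the STAR bug, one can check whether the annotation-derived files are present in the STAR genome directory. The following is a minimal sketch, not part of the workflow. The directory `resources/star_genome` is a hypothetical path to adjust to the local setup, and the file list is an assumption based on what STAR typically writes when a GTF is supplied at genome generation. Note that the path in the error message, `/geneInfo.tab`, has no directory in front of it, which may hint that STAR is looking in an empty path rather than at a missing file.

```python
from pathlib import Path

# Assumption: files STAR typically writes to the genome directory when a GTF
# is supplied at the genome generation step.
ANNOTATION_FILES = (
    "geneInfo.tab",
    "exonInfo.tab",
    "transcriptInfo.tab",
    "exonGeTrInfo.tab",
    "sjdbList.fromGTF.out.tab",
)


def missing_annotation_files(index_dir):
    """Return the annotation-derived files that are absent from index_dir."""
    index = Path(index_dir)
    return [name for name in ANNOTATION_FILES if not (index / name).is_file()]


if __name__ == "__main__":
    # Hypothetical location of the STAR index; adjust as needed.
    missing = missing_annotation_files("resources/star_genome")
    if missing:
        print("Missing from index:", ", ".join(missing))
    else:
        print("All annotation-derived files are present.")
```

If `geneInfo.tab` is reported as present, the index itself is probably fine and the failure is more likely to come from how STAR resolves the path at the mapping step.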

dlaehnemann commented 3 months ago

From what I understand from the issue you linked, this is a bug in STAR, and we will have to wait until the fix that the STAR maintainer has already created (and links to in the bug report) is properly released (in the main STAR repository, not the separate pre-release repository). You could ask the maintainer in the issue to create a new release, noting that you also need this. Once he does, I am glad to help with the necessary follow-up, which will be:

  1. Release the new version on bioconda.
  2. Update the STAR wrapper(s) in the snakemake wrappers repository.
  3. Switch to the newest STAR wrapper versions in the workflow.

The only other possibility I see, if the maintainer doesn't create a release soon, would be to patch the bioconda recipe with the bugfix. This could be very straightforward, but I don't currently have the bandwidth to investigate it further. Feel free to ping me with questions if you want to pursue that solution.

benostendorf commented 3 months ago

Thanks for the prompt reply! I asked for a new release to be created; hopefully this will happen soon.

dlaehnemann commented 3 months ago

Maybe, just to be sure, two more questions:

  1. Did you just use the automatic GTF download that the workflow does, or did you manually put a GTF file in place? I guess that mismatches due to a manually added GTF might lead to issues.
  2. And in either case, could you check that the GTF file contains the gene_id tag? The maintainer mentioned this as a requirement in the issue you linked to: https://github.com/alexdobin/STAR/issues/1953#issuecomment-1727749459
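The gene_id check in question 2 can be scripted. This is a minimal sketch under stated assumptions: the function name `lines_missing_gene_id` and the path `resources/genome.gtf` are hypothetical, and the check only looks for a `gene_id "..."` entry in the ninth (attributes) column of each feature line. It does not validate anything else about the GTF.

```python
import gzip
import re
from pathlib import Path

# Matches a GTF attribute of the form: gene_id "ENSMUSG00000102693"
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def lines_missing_gene_id(gtf_path, limit=None):
    """Return 1-based line numbers of feature lines without a gene_id attribute."""
    path = Path(gtf_path)
    # Support both plain and gzip-compressed GTF files.
    opener = gzip.open if path.suffix == ".gz" else open
    missing = []
    with opener(path, "rt") as handle:
        for number, line in enumerate(handle, start=1):
            # Skip blank lines and header/comment lines such as "#!genome-build".
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            attributes = fields[8] if len(fields) >= 9 else ""
            if not GENE_ID.search(attributes):
                missing.append(number)
                if limit is not None and len(missing) >= limit:
                    break
    return missing


if __name__ == "__main__":
    # Hypothetical path to the downloaded annotation; adjust as needed.
    bad = lines_missing_gene_id("resources/genome.gtf", limit=10)
    print("Lines without gene_id:", bad if bad else "none found")
```

An empty result means every feature line carries a gene_id, so the annotation can be ruled out as the trigger.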

Because you never know, the error could be triggered by something different than the bug reported there. And also, the GitHub Actions continuous integration tests here just ran through these past days with a little fix to the STAR rules. So it's a bit funny that this fails on your end.

benostendorf commented 3 months ago

Yes, I used the automatic GTF download, and the attributes column of each entry starts with the gene_id tag. These are the first few lines of the GTF file:

#!genome-build GRCm38.p6
#!genome-version GRCm38
#!genome-date 2012-01
#!genome-build-accession NCBI:GCA_000001635.8
#!genebuild-last-updated 2020-02
1   havana  gene    3073253 3074322 .   +   .   gene_id "ENSMUSG00000102693"; gene_version "1"; gene_name "4933401J01Rik"; gene_source "havana"; gene_biotype "TEC"; havana_gene "OTTMUSG00000049935"; havana_gene_version "1";


dlaehnemann commented 3 months ago

OK, at least this is not broken, then... :sweat_smile:

So it is probably some weird interaction of this bug with the software stack... :shrug:

Fingers crossed that the release of the bug fix makes this go away.

Otherwise, we use the rna-seq-kallisto-sleuth workflow more regularly ourselves, and we maintain and improve it constantly. So this might be an alternative to try for your RNA-seq analysis: https://snakemake.github.io/snakemake-workflow-catalog/?repo=snakemake-workflows/rna-seq-kallisto-sleuth