langmead-lab / monorail-external

examples to run monorail externally
MIT License

hg38 reference genome input errors #8

Closed paulranum11 closed 2 years ago

paulranum11 commented 2 years ago

Dear Langmead-lab,

I am attempting to run monorail as described in the README.md but have run into errors related to the hg38 reference genome.

I started by running the following command:

    /bin/bash \
    /home/paulranum11/Desktop/monorail-external/singularity/run_recount_pump.sh /home/paulranum11/Desktop/monorail-external/recount-rs5_1.0.8.sif SRR390728 \
    local \
    hg38 \
    2 \
    /home/paulranum11/Desktop/monorail-external/ \
    /home/paulranum11/Desktop/monorail-external/SRR390728_1.fastq.gz \
    /home/paulranum11/Desktop/monorail-external/SRR390728_2.fastq.gz \
    SRP020237

This run terminates with the following error:

    Building DAG of jobs...
    MissingInputException in line 837 of /Snakefile:
    Missing input files for rule align:
    /container-mounts/recount/ref/hg38/star_idx/SAindex
    ++ fgrep 'steps (100%) done' /container-mounts/recount/output/std.out
    + done=

I have attempted several variations on this run, for example changing the "hg38" argument to "./hg38" and replacing the STAR genome in the supplied hg38 directory with a newly built STAR genome from the GRCh38 genome build. Each time the error changes slightly, which suggests that the issue is indeed with the reference, but I don't seem to be able to resolve it on my own. Hopefully you will recognize what is going wrong.

The following are the contents of the STAR index downloaded as described in the README.md:

    (base) paulranum11@paulranum11-G707:~/Desktop/monorail-external$ ls -tslh hg38/star_idx
    total 25G
    22G  -rw-r--r-- 1 paulranum11 paulranum11 22G  Nov 29 09:29 SA
    3.0G -rw-r--r-- 1 paulranum11 paulranum11 3.0G Oct  8  2019 Genome
    12K  -rw-r--r-- 1 paulranum11 paulranum11 623  Oct  8  2019 genomeParameters.txt
    12K  -rw-r--r-- 1 paulranum11 paulranum11 1.7K Oct  8  2019 chrLength.txt
    16K  -rw-r--r-- 1 paulranum11 paulranum11 5.9K Oct  8  2019 chrNameLength.txt
    16K  -rw-r--r-- 1 paulranum11 paulranum11 4.3K Oct  8  2019 chrName.txt
    12K  -rw-r--r-- 1 paulranum11 paulranum11 3.2K Oct  8  2019 chrStart.txt

Happy to provide more details as needed, thanks for any help!

paulranum11 commented 2 years ago

It appears that the genome provided through the get_human_ref_indexes.sh script is missing several expected STAR genome components, such as the SAindex file. Is this by design? When I create a STAR genome from a downloaded GRCh38.fa file on my system using STAR --runMode genomeGenerate, the following files are generated:

    chrLength.txt
    chrNameLength.txt
    chrName.txt
    chrStart.txt
    exonGeTrInfo.tab
    exonInfo.tab
    geneInfo.tab
    Genome
    genomeParameters.txt
    Log.out
    SA
    SAindex
    sjdbInfo.txt
    sjdbList.fromGTF.out.tab
    sjdbList.out.tab
    transcriptInfo.tab

Is the genome accessible from get_human_ref_indexes.sh missing required files?

paulranum11 commented 2 years ago

After re-downloading the provided index files with the get_human_ref_indexes.sh script, I now see the SAindex file, and the subsequent STAR alignment is working. This issue appears to be resolved. My apologies if this was all due to user error or an incomplete initial download.
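
For anyone hitting the same MissingInputException, an incomplete index download can be caught before launching a run. The following is only a minimal sketch: `check_star_idx` is a hypothetical helper, not part of monorail, and the list of files it checks is an assumption drawn from the files named in this thread (the Snakefile may require a different set).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of monorail): report which expected STAR
# index files are absent or empty in the given directory.
# ASSUMPTION: the file list below is taken from the files mentioned in this
# thread; the pipeline's actual requirements may differ.
check_star_idx() {
    local idx_dir="$1"
    local missing=0
    local f
    for f in Genome SA SAindex genomeParameters.txt \
             chrLength.txt chrName.txt chrNameLength.txt chrStart.txt; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "${idx_dir}/${f}" ]; then
            echo "missing or empty: ${idx_dir}/${f}" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed (exit status 0) only when nothing was missing
    [ "$missing" -eq 0 ]
}
```

After sourcing the function, it could be called as `check_star_idx hg38/star_idx && echo "index looks complete"`. Note that this only checks that files exist and are non-empty; it would not detect a file that was truncated partway through the download.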