Closed · GuidoBarzaghi closed this issue 4 years ago
Hi Guido,
Something appears to be incorrect in rule STAR, as I don't see the read files listed in the STAR command. Were your read files detected correctly? The genome directory and a couple of other parameters are also missing.
Full STAR mapping command should look something like this:
STAR --runThreadN {threads} \
{params.alignerOptions} \
--sjdbOverhang 100 \
--outSAMunmapped Within \
--outSAMtype BAM Unsorted \
--outStd BAM_Unsorted \
--sjdbGTFfile {params.gtf} \
--genomeDir {params.index} \
--readFilesIn <(gunzip -c {input.r1}) <(gunzip -c {input.r2}) \
--outFileNamePrefix {params.prefix} \
| samtools sort -m {params.samsort_memory} -T $MYTEMP/{wildcards.sample} -@ {params.samtools_threads} -O bam -o {output.bam} - 2> {log}
rm -rf $MYTEMP
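For context, the $MYTEMP used by the rule is created with mktemp earlier in the jobscript (the snakepipes.XXXXXXXXXX template appears later in this thread); a minimal sketch of that temp-directory lifecycle, with the STAR/samtools step elided:

```shell
#!/usr/bin/env bash
# Sketch of the temp-directory handling the rule relies on.
set -euo pipefail

# Create a unique scratch dir under $TMPDIR (falling back to /tmp).
MYTEMP=$(mktemp -d "${TMPDIR:-/tmp}/snakepipes.XXXXXXXXXX")
echo "using scratch dir: $MYTEMP"

# ... STAR | samtools sort -T "$MYTEMP/<sample>" ... would run here ...

# Remove the scratch dir once sorting is done.
rm -rf "$MYTEMP"
```

If $TMPDIR does not exist on the execute node, the mktemp call above is the first thing that fails, which becomes relevant later in this thread.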
My apologies, the rule got cut while pasting. Here is the full one:
STAR --runThreadN 20 \
    --sjdbOverhang 100 \
    --outSAMunmapped Within \
    --outSAMtype BAM Unsorted \
    --outStd BAM_Unsorted \
    --sjdbGTFfile /.../snakePipes_configs/organisms/GRCm38_gencode_release19/annotation/genes.gtf \
    --genomeDir /.../snakePipes_configs/organisms/GRCm38_gencode_release19/STARIndex/SAindex \
    --readFilesIn <(gunzip -c FASTQ/SRR2096209_R1.fastq.gz) <(gunzip -c FASTQ/SRR2096209_R2.fastq.gz) \
    --outFileNamePrefix STAR/SRR2096209/SRR2096209. \
    | samtools sort -m 2G -T $MYTEMP/SRR2096209 -@ 5 -O bam -o STAR/SRR2096209.sorted.bam - 2> STAR/logs/SRR2096209.sort.log
rm -rf $MYTEMP
I believe the input files are detected correctly as they are also listed as input in the upstream rules.
However there seem to be a couple of errors popping up, e.g.:
localrules directive specifies rules that are not present in the Snakefile: sleuth_Salmon Salmon_wasabi
Job counts:
	count	jobs
	1	FASTQ2
	1
Or CLIs that I'm not sure are executing correctly, e.g.:
command line - /home/barzaghi/miniconda3/envs/snakePipes/lib/python3.8/site-packages/pulp/apis/../solverdir/cbc/linux/64/cbc 50970fce538948959dc5f5631e48e0fe-pulp.mps max ratio None allow None threads None presolve on strong None gomory on knapsack on probing on branch printingOptions all solution 50970fce538948959dc5f5631e48e0fe-pulp.sol (default strategy 1)
At line 2 NAME MODEL
At line 3 ROWS
At line 7 COLUMNS
At line 13 RHS
At line 16 BOUNDS
At line 18 ENDATA
Problem MODEL has 2 rows, 1 columns and 2 elements
Perhaps I could send you the .log file for the whole pipeline to give you a global picture.
Just to be sure, this is how I'm launching the pipeline (from the file command.sh that comes with the test data)
mRNA-seq -c /.../miniconda3/envs/snakePipes/lib/python3.8/site-packages/snakePipes/workflows/mRNA-seq/defaults_CAST.yaml --sampleSheet sampleSheet.csv -m alignment --DAG -i . -o ./mRNAseq_test_out -j 6 mm10
If it helps I can also provide the .yaml files.
Many thanks for your support
Hi again,
the "localrules directive specifies rules that are not present in the Snakefile: sleuth_Salmon Salmon_wasabi" message is just a warning; you can ignore it.
That verbose command line message you're getting is coming from the snakemake solver trying to optimize the scheduling of jobs on your cluster. Perhaps there is an issue with your cluster config? Are you using slurm, which is our default, or did you update the cluster config (snakemake_cluster_cmd) to your local architecture?
Your mRNA-seq command looks fine.
Best,
Katarzyna
Ah I see. I am using SLURM as well with the following command:
snakemake_cluster_cmd: sbatch --ntasks-per-node=1 -c 3 -J {rule}.snakemake --mem-per-cpu={cluster.memory} -p htc -o {snakePipes_cluster_logDir}/{rule}.%j.out -e {snakePipes_cluster_logDir}/{rule}.%j.err
I didn't change anything else in cluster.yaml
Ok, I see you are forcing 3 threads on all the rules instead of using the rule-specific {threads} directive. Is passing -c {threads} to snakemake_cluster_cmd an option for you?
If you would like to limit core usage by snakemake jobs, you could do it on the top-level command, e.g. by passing --snakemakeOptions ' --cores 3 ' to mRNA-seq.
You can also set rule-specific core limits in snakemake, see https://snakemake.readthedocs.io/en/stable/executing/cli.html .
Thanks for the suggestions :)
Unfortunately setting -c {threads} resulted in the same error.
Are you sure that your index files are correct and your fastq files are not empty or broken? Have you tried running the failing shell script separately? I wonder if there is something wrong with one of the inputs of that rule.
Index and fastq files seem healthy. I'm happy to try running the rule that fails (RNA_mapping.snakefile) in isolation. What would be the best way to do that?
This will run the rule locally on your machine; you may have to adapt the thread number under --runThreadN.
You can also run the whole workflow locally by passing --local to mRNA-seq.
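For instance, a local run could look like this (options taken from the command.sh invocation earlier in this thread; the config path is shortened here, so adjust it to your setup):

```shell
# Hypothetical local run of the test data, bypassing the cluster entirely:
mRNA-seq -c defaults_CAST.yaml --sampleSheet sampleSheet.csv \
    -m alignment --local -j 6 -i . -o ./mRNAseq_test_out mm10
```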
Running STAR locally was successful. Is it then a problem with the cluster.yaml?
It still looks like a problem with the cluster (configuration).
When you supplied -c {threads} to the snakemake_cluster_cmd in your cluster.yaml, did you run snakePipes config with --clusterConfig your_updated_cluster_config.yaml, or provide it directly to mRNA-seq via --clusterConfigFile your_updated_cluster_config.yaml?
Does your $TMPDIR exist on the cluster nodes, and do you have enough space there?
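One way to probe this from a submitted job (the partition name -p htc is taken from the sbatch line earlier in this thread; adjust as needed):

```shell
# Submit a tiny job that reports whether $TMPDIR exists on the node
# and how much space is available there.
sbatch -p htc --wrap 'echo "TMPDIR=$TMPDIR"; ls -ld "${TMPDIR:-/tmp}"; df -h "${TMPDIR:-/tmp}"'
```

The output lands in the usual slurm-<jobid>.out file in the submission directory.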
I ran the pipeline with the updated cluster.yaml. Also, the $TMPDIR exists and is spacious enough for the test data.
Then try increasing the STAR memory in cluster.yaml.
The __default__ one?
What I meant was to use your own cluster.yaml; I believe you can pass it with --clusterConfigFile. You can make a copy of the one you have under your output folder, rename it, and update the value for the STAR rule.
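That could look roughly like this (file names are hypothetical; the copy in the output folder is the one snakePipes wrote there):

```shell
# Copy the cluster config that snakePipes placed in the output folder,
# raise the STAR memory, and point the workflow at the edited copy.
cp mRNAseq_test_out/mRNA-seq.cluster_config.yaml my_cluster_config.yaml
# edit my_cluster_config.yaml so that the STAR entry reads, e.g.:
#   STAR:
#     memory: 30G
mRNA-seq --clusterConfigFile my_cluster_config.yaml [other options] mm10
```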
Oh I see. Unfortunately, increasing the STAR memory to 30G did not solve the issue. This is what the updated cluster_config.yaml looks like:
DESeq2:
memory: 5G
DESeq2_Salmon:
memory: 3G
FASTQdownsample:
memory: 3G
HISAT2:
memory: 6G
STAR:
memory: 30G
STAR_allele:
memory: 30G
SalmonIndex:
memory: 2G
SalmonQuant:
memory: 2G
Salmon_TPM:
memory: 5G
Salmon_counts:
memory: 5G
__default__:
memory: 10G
annotation_bed2fasta:
memory: 4G
bamCoverage:
memory: 5G
bamCoverage_RPKM:
memory: 5G
bamCoverage_coverage:
memory: 5G
bamCoverage_raw:
memory: 5G
bamCoverage_unique_mappings:
memory: 5G
bamPE_fragment_size:
memory: 10G
create_annotation_bed:
memory: 4G
create_snpgenome:
memory: 30G
filter_reads_umi:
memory: 10G
plotCorrelation_pearson:
memory: 3G
plotCorrelation_pearson_allelic:
memory: 5G
plotCorrelation_spearman:
memory: 3G
plotCorrelation_spearman_allelic:
memory: 2G
plotCoverage:
memory: 1G
plotEnrichment:
memory: 1G
plotFingerprint:
memory: 1G
plotPCA:
memory: 4G
plotPCA_allelic:
memory: 4G
plot_heatmap_CSAW_up:
memory: 10G
sleuth_Salmon:
memory: 4G
snakePipes_cluster_logDir: /g/krebs/barzaghi/bash_scripts/slurm_err_out
snakemake_cluster_cmd: sbatch --ntasks-per-node=1 -c {threads} -J {rule}.snakemake
--mem-per-cpu={cluster.memory} -p htc -o /g/krebs/barzaghi/bash_scripts/slurm_err_out/{rule}.%j.out
-e /g/krebs/barzaghi/bash_scripts/slurm_err_out/{rule}.%j.err
snakemake_latency_wait: 300
snp_split:
memory: 10G
star_index:
memory: 15G
The format of your yaml file looks a bit odd; maybe GitHub changed the formatting when you pasted it, but if the run used this file then it worked regardless. Apart from that, you said that with --local the workflow completed all the way to the end, right? So it is definitely a cluster issue. And the error is still the same? Is there nothing under the cluster_log folder?
Exactly right: --local allows a successful completion of the whole pipeline. I have logs under my snakePipes_cluster_logDir, but they just contain chunks of the whole workflow log. The ones under outDir/STAR/logs are empty though. The error message is the same:
Error executing rule STAR on cluster (jobid: 50, external: Submitted batch job 2693966, jobscript: /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/.snakemake/tmp.lpotnxy5/snakejob.STAR.50.sh). For error details see the cluster log and the log files of the involved rule(s).
Job failed, going on with independent jobs.
[Wed Sep 16 14:45:46 2020] Finished job 37.
26 of 80 steps (32%) done
Exiting because a job execution failed. Look above for error message
Complete log: /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/.snakemake/log/2020-09-16T143725.087438.snakemake.log
What a mysterious issue ;)
Did the rules "upstream" of STAR mapping work, i.e. did you get symlinks of your original files to a "FASTQ" or "originalFASTQ" subfolder in your output folder? These are also sent to the cluster in the current version of snakePipes. Also, did you get filtered gtf and other files in the "Annotation" subfolder?
This is the structure of my output folder:
-rw-r--r-- 1 24227 718 123K Sep 16 14:45 mRNAseq_test_out/mRNA-seq_run-1.log
-rw-r--r-- 1 24227 718  610 Sep 16 14:37 mRNAseq_test_out/mRNA-seq_tools.txt
-rw-r--r-- 1 24227 718 2.3K Sep 16 14:37 mRNAseq_test_out/mRNA-seq_organism.yaml
-rw-r--r-- 1 24227 718 1.4K Sep 16 14:37 mRNAseq_test_out/mRNA-seq.cluster_config.yaml
-rw-r--r-- 1 24227 718 1.7K Sep 16 14:37 mRNAseq_test_out/mRNA-seq.config.yaml

mRNAseq_test_out/Annotation: total 26M
-rw-r--r-- 1 24227 718  17M Sep 16 14:45 genes.filtered.bed
-rw-r--r-- 1 24227 718 1.6M Sep 16 14:45 genes.filtered.symbol
-rw-r--r-- 1 24227 718 6.6M Sep 16 14:45 genes.filtered.t2g
lrwxrwxrwx 1 24227 718  113 Sep 16 14:40 genes.filtered.gtf -> /g/krebs/barzaghi/bash_scripts/utilies/snakePipes_configs/organisms/GRCm38_gencode_release19/annotation/genes.gtf

mRNAseq_test_out/FASTQ: total 48K
lrwxrwxrwx 1 24227 718 92 Sep 16 14:41 SRR2096206_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096206_R2.fastq.gz
lrwxrwxrwx 1 24227 718 92 Sep 16 14:41 SRR2096209_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096209_R2.fastq.gz
lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096207_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096207_R2.fastq.gz
lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096208_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096208_R2.fastq.gz
lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096211_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096211_R2.fastq.gz
lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096207_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096207_R1.fastq.gz
lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096210_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096210_R2.fastq.gz
lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096206_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096206_R1.fastq.gz
lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096211_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096211_R1.fastq.gz
lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096209_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096209_R1.fastq.gz
lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096208_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096208_R1.fastq.gz
lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096210_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096210_R1.fastq.gz

mRNAseq_test_out/originalFASTQ: total 0
lrwxrwxrwx 1 24227 718 61 Sep 16 14:41 SRR2096206_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096206_R2.fastq.gz
lrwxrwxrwx 1 24227 718 61 Sep 16 14:41 SRR2096209_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096209_R2.fastq.gz
lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096211_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096211_R2.fastq.gz
lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096208_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096208_R2.fastq.gz
lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096207_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096207_R1.fastq.gz
lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096207_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096207_R2.fastq.gz
lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096210_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096210_R2.fastq.gz
lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096208_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096208_R1.fastq.gz
lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096211_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096211_R1.fastq.gz
lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096210_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096210_R1.fastq.gz
lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096209_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096209_R1.fastq.gz
lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096206_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096206_R1.fastq.gz

mRNAseq_test_out/STAR: total 4.0K
drwxr-sr-x 2 24227 718 4.0K Sep 16 14:41 logs
I agree, this is quite puzzling
I'm not sure why mRNAseq_test_out/originalFASTQ has zero size? Can you zcat mRNAseq_test_out/originalFASTQ/SRR2096206_R2.fastq.gz | head?
How about the conda env STAR and samtools are installed into?
Is this path available on your cluster?
Can you source activate /home/barzaghi/miniconda3/envs/3adcc849 in some shell script you send to your cluster via sbatch?
And perhaps get the output of STAR -h and samtools --help?
You can also send the full STAR/samtools command to your cluster , and see what you get in the logs.
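A minimal probe script along those lines could look like this (the env path comes from earlier in this thread; the script name and everything else is illustrative):

```shell
#!/bin/bash
# probe.sh -- submit with: sbatch -p htc probe.sh
# Checks that the rule-specific conda env and its tools are reachable on the node.
source activate /home/barzaghi/miniconda3/envs/3adcc849
which STAR samtools
STAR --version
samtools --version | head -n 1
```

If the env path is not mounted on the node, the source activate line fails first, which narrows the problem down considerably.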
That size is peculiar, I agree, but the fastq file seems populated:
@SRR2096206.1 39V34V1:192:H2TMTBCXX:1:1101:1750:2088 length=50
ATTCGAGCAGAATTAGGTCAACCAGGTGCACTTTTAGGAGATGACCAAAT
+SRR2096206.1 39V34V1:192:H2TMTBCXX:1:1101:1750:2088 length=50
GGGGGIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIGIIGIGGIIIIIIIG
@SRR2096206.2 39V34V1:192:H2TMTBCXX:1:1101:1617:2153 length=50
CCCATGCAGCCAACACAGTCGTTTACAGCTCTAACAAAATAGACGATACT
+SRR2096206.2 39V34V1:192:H2TMTBCXX:1:1101:1617:2153 length=50
GGGGGIIIIIIIIIIIIGIIIIIIGGGIIIGGGGGGIIIIIIGIIIIIII
@SRR2096206.3 39V34V1:192:H2TMTBCXX:1:1101:2041:2229 length=50
GCATGACTAAACTACAGCTCATATTCCACAAATTTGAGATTTGTCTTGCC
From STAR -h and samtools --help I'm getting the expected usage messages for STAR version 2.6.0c and samtools version 1.9.
Launching the rule on the cluster, however, returns the following:
/var/spool/slurm/job2837334/slurm_script: line 12: activate: No such file or directory
mktemp: failed to create directory via template ‘/tmp/barzaghi/snakepipes.XXXXXXXXXX’: No such file or directory
gzip: FASTQ/SRR2096206_R2.fastq.gz: No such file or directory
gzip: FASTQ/SRR2096206_R1.fastq.gz: No such file or directory
/var/spool/slurm/job2837334/slurm_script: line 17: STAR/logs/SRR2096206.sort.log: No such file or directory
activate: No such file or directory suggests that the anaconda directory is missing on the node you submit your job to. This you most probably need to sort out with your IT people.
Like @LeilyR said, it looks like the path to your conda installation is not available/mounted on your cluster nodes. Perhaps your IT can help you with that, or you can install conda for your user into a path that is available there.
The first rules of the workflow that deal with fastq files and annotation do not call for rule-specific conda environments, but use bash and python. The mapping rule that calls STAR requires activating a dedicated conda env.
I see.
Actually I managed to successfully launch STAR on the cluster by source activating /.../miniconda3/envs/snakePipes rather than /.../miniconda3/envs/3adcc849.
Now, evidently there is something I do not understand here: how come the pipeline uses what seems to me as temporary envs (e.g. 3adcc849) rather than the generic snakePipe env? Is this behaviour expected?
Hi Guido,
The generic snakePipes env has only python, snakemake, and python modules required by the snakePipes python wrappers in it.
Any software packages for NGS data processing, such as STAR, are installed into a limited number of conda environments, which you either create with snakePipes createEnvs, or which are created automatically the first time you run a workflow.
These rule/workflow-specific conda environments are automatically named by snakemake using a hash of the environment yaml file, hence the not so user-friendly names.
You can get the paths to these envs and what they correspond to by running snakePipes envInfo.
In any case, you need to source activate the path to your generic snakePipes env before running any full workflow, to make the paths to python wrappers, snakemake etc. available. The rule-specific envs are automatically source activated by snakemake when executing a rule that requires them. I thought the first time you ran your full workflow on the cluster, you source activated the generic env beforehand, is that correct?
To test the code for one specific rule that requires a specific conda env, you don't need to source activate the generic snakePipes env, but rather the rule-specific environment, as you don't need snakemake or python wrappers to execute one isolated shell command.
Hope this helps,
Best,
Katarzyna
Thanks a lot for the explanation, it helps a lot :)
Actually, that might be the problem, as the mRNA-seq-specific env (3adcc849) is not listed by snakePipes envInfo:
envs/shared.yaml is in: /home/barzaghi/miniconda3/envs/42723b8521a92932b1b8b0035b45c8c2
envs/createIndices.yaml is in: /home/barzaghi/miniconda3/envs/881bf80b50ffdee8cd04c2b4c79f4be2
envs/rna_seq.yaml is in: /home/barzaghi/miniconda3/envs/2e4ce781da5529cfc5da85847031ab60
envs/sc_rna_seq.yaml is in: /home/barzaghi/miniconda3/envs/c05670c9a3b222949368cc85ea77aa12
envs/sc_rna_seq_seurat3.yaml is in: /home/barzaghi/miniconda3/envs/4d22286fbdfcd6dcae66f8af5f55dd7c
envs/sc_rna_seq_loompy.yaml is in: /home/barzaghi/miniconda3/envs/8117fc9e197a9ded763e01ce6b2ee47e
envs/dna_mapping.yaml is in: /home/barzaghi/miniconda3/envs/0e10073fc47605bf3465f4d53bf0c81b
envs/chip_seq.yaml is in: /home/barzaghi/miniconda3/envs/448dc2f247fbc9275f3a9700ac7ff1a0
envs/histone_hmm.yaml is in: /home/barzaghi/miniconda3/envs/dd7efb1338a71eb4543b0411e2ed5f2d
envs/atac_seq.yaml is in: /home/barzaghi/miniconda3/envs/3b3bc60670b29e65e68025a552b92002
envs/hic.yaml is in: /home/barzaghi/miniconda3/envs/7a30c13519bd32ad9a202991d4b6c84b
envs/wgbs.yaml is in: /home/barzaghi/miniconda3/envs/321b94c5a4aa2974fd8511a1ea4a0c47
envs/rmarkdown.yaml is in: /home/barzaghi/miniconda3/envs/24a49c7d518641700962e3eb041c3ffe
envs/preprocessing.yaml is in: /home/barzaghi/miniconda3/envs/716ca2a4d885c02ad64286112987e524
envs/noncoding.yaml is in: /home/barzaghi/miniconda3/envs/2e5ae8bdd9833fe90c63eec535502a60
envs/sambamba.yaml is in: /home/barzaghi/miniconda3/envs/8792566abd75d113cc3e56ce290737cc
but it is present in /home/barzaghi/miniconda3/envs/:
drwxr-xr-x 21 24227 718 4.0K Sep  8 18:31 4c0f0ebc
-rw-rw-r--  1 24227 718  298 Sep  8 18:27 4c0f0ebc.yaml
drwxr-xr-x 17 24227 718 4.0K Sep  8 18:27 3adcc849
-rw-rw-r--  1 24227 718  494 Sep  8 18:23 3adcc849.yaml
drwxr-xr-x  4 24227 718 4.0K Sep  8 18:23 66a495d8
-rw-rw-r--  1 24227 718  102 Sep  8 18:22 66a495d8.yaml
drwxr-xr-x  4 24227 718 4.0K Aug 26 17:37 8792566abd75d113cc3e56ce290737cc
drwxr-xr-x 16 24227 718 4.0K Aug 26 17:37 2e5ae8bdd9833fe90c63eec535502a60
drwxr-xr-x 18 24227 718 4.0K Aug 26 17:33 716ca2a4d885c02ad64286112987e524
drwxr-xr-x 16 24227 718 4.0K Aug 26 17:32 24a49c7d518641700962e3eb041c3ffe
drwxr-xr-x 16 24227 718 4.0K Aug 26 17:29 321b94c5a4aa2974fd8511a1ea4a0c47
drwxr-xr-x 23 24227 718 4.0K Aug 26 17:25 7a30c13519bd32ad9a202991d4b6c84b
drwxr-xr-x 19 24227 718 4.0K Aug 26 17:20 3b3bc60670b29e65e68025a552b92002
drwxr-xr-x 16 24227 718 4.0K Aug 26 17:17 dd7efb1338a71eb4543b0411e2ed5f2d
drwxr-xr-x 16 24227 718 4.0K Aug 26 17:15 448dc2f247fbc9275f3a9700ac7ff1a0
drwxr-xr-x 20 24227 718 4.0K Aug 26 17:13 0e10073fc47605bf3465f4d53bf0c81b
drwxr-xr-x 11 24227 718 4.0K Aug 26 17:08 8117fc9e197a9ded763e01ce6b2ee47e
drwxr-xr-x 16 24227 718 4.0K Aug 26 17:07 4d22286fbdfcd6dcae66f8af5f55dd7c
drwxr-xr-x 16 24227 718 4.0K Aug 26 17:03 c05670c9a3b222949368cc85ea77aa12
drwxr-xr-x 17 24227 718 4.0K Aug 26 16:55 2e4ce781da5529cfc5da85847031ab60
drwxr-xr-x  9 24227 718 4.0K Aug 26 16:49 881bf80b50ffdee8cd04c2b4c79f4be2
drwxr-xr-x 21 24227 718 4.0K Aug 26 16:48 42723b8521a92932b1b8b0035b45c8c2
drwxr-xr-x 16 24227 718 4.0K Aug 26 15:57 snakePipes
drwxr-xr-x 11 24227 718 4.0K Jul 29 14:50 py2
I guess I must have done something wrong the first time I ran the pipeline. Would deleting these environments and launching the pipeline again solve the issue?
Very strange; there is a difference in hash between the environment name listed by snakePipes envInfo (2e4ce781da5529cfc5da85847031ab60) and the one that snakemake is activating for your mRNA-seq workflow (3adcc849). A different hash suggests a change in the environment yaml file content after installation and snakePipes config, or a change in its path. Is that possible?
Can it be that there is some mixup in snakePipes versions or installation paths on your system?
The following should return consistent results:
source activate your_generic_snakePipes_env
snakePipes config --snakemakeOptions ' --use-conda --conda-prefix /home/barzaghi/miniconda3/envs '
snakePipes createEnvs
snakePipes envInfo
On our side, this is working consistently.
This doesn't necessarily explain the issue with STAR on the cluster, as long as 3adcc849 is built correctly.
You can try to clean up both the snakePipes installation and the environments; perhaps this helps.
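A possible cleanup, assuming the hash-named envs live under ~/miniconda3/envs as shown earlier in this thread (double-check the paths before deleting anything):

```shell
# Remove a stale rule-specific env and let snakePipes recreate all of them.
conda env remove -p /home/barzaghi/miniconda3/envs/3adcc849
snakePipes createEnvs
snakePipes envInfo   # verify the listed envs now match what exists on disk
```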
Sorry for the wait.
So I saw there was an updated version of snakePipes and basically reinstalled the whole thing. Now the environments I find under miniconda correspond to the ones returned by snakePipes envInfo. Unfortunately, I still got the same cryptic error while launching the pipeline on the test data. Upon instead launching the STAR rule only, as follows:
source /g/krebs/barzaghi/utilies/miniconda3/etc/profile.d/conda.sh
conda activate /g/krebs/barzaghi/utilies/miniconda3/envs/203a38e6911ca794da2f33c64c7233e5
cd /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out
TMPDIR=/tmp/barzaghi
MYTEMP=$(mktemp -d ${TMPDIR:-/tmp}/snakepipes.XXXXXXXXXX)
( [ -d STAR/SRR2096211 ] || mkdir -p STAR/SRR2096211 )
STAR --runThreadN 20 \
    --sjdbOverhang 100 \
    --outSAMunmapped Within \
    --outSAMtype BAM Unsorted \
    --outStd BAM_Unsorted \
    --sjdbGTFfile /g/krebs/barzaghi/DB/genomes/snakePipes_indexes/GRCm38_gencode_release19/annotation/genes.gtf \
    --genomeDir /g/krebs/barzaghi/DB/genomes/snakePipes_indexes/GRCm38_gencode_release19/STARIndex/ \
    --readFilesIn <(gunzip -c FASTQ/SRR2096211_R1.fastq.gz) <(gunzip -c FASTQ/SRR2096211_R2.fastq.gz) \
    --outFileNamePrefix STAR/SRR2096211/SRR2096211. \
    | samtools sort -m 2G -T $MYTEMP/SRR2096211 -@ 5 -O bam -o STAR/SRR2096211.sorted.bam - 2> STAR/logs/SRR2096211.sort.log
rm -rf $MYTEMP
I got the following error:
EXITING because of FATAL ERROR: Genome version: 20201 is INCOMPATIBLE with running STAR version: 2.7.4a
SOLUTION: please re-generate genome from scratch with running version of STAR, or with version: 2.7.4a
Sep 24 11:30:46 ...... FATAL ERROR, exiting
Could it be something like the issue reported here
Hi again,
so yes, indeed, the STAR devs have changed the genome index format a couple of times already, such that a given format only works with a couple of minor/bugfix versions. 2.7.4a requires a new genome build compared to the previous version.
From release 2.2.0, snakePipes uses STAR version 2.7.4a, as it brings in new functionality for scRNA-seq analysis. That means rerunning createIndices with snakePipes version 2.2.0 or later (https://github.com/maxplanck-ie/snakepipes/releases/tag/2.2.0).
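As a quick compatibility check before re-indexing: STAR records version information for an index in the genomeParameters.txt file inside the index directory, so you can compare that against the installed STAR (the index path below is illustrative):

```shell
# Compare the index's recorded version info against the installed STAR binary.
grep -i version /path/to/GRCm38_gencode_release19/STARIndex/genomeParameters.txt
STAR --version
```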
I see, I will then run createIndices :)
Unfortunately, recreating the indices runs into the same kind of problems as the RNA-seq pipeline. I would like to try running either pipeline after loading the anaconda module on the cluster nodes (it has to be loaded separately for each rule, I assume). Could you guide me towards the right place to add the module load anaconda line in the code?
https://github.com/maxplanck-ie/snakepipes/blob/master/snakePipes/shared/rules/RNA_mapping.snakefile#L92
under the shell directive, if you want to add it to the STAR rule.
Perhaps loading the anaconda module within each rule is a bit of a last resort?
Did you try something more generic, e.g.:
module load anaconda
source activate snakePipes
mRNA-seq -i input -o output
or, say, modifying your snakemake cmd in cluster.yaml to look like:
module load anaconda; sbatch ...
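Concretely, the snakemake_cluster_cmd from earlier in this thread could become something like the following (one possible form; whether sbatch-submitted jobs inherit the loaded module depends on your site's module setup):

```yaml
snakemake_cluster_cmd: module load anaconda; sbatch --ntasks-per-node=1 -c {threads}
  -J {rule}.snakemake --mem-per-cpu={cluster.memory} -p htc
  -o {snakePipes_cluster_logDir}/{rule}.%j.out
  -e {snakePipes_cluster_logDir}/{rule}.%j.err
```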
I am very happy to report that adding module load anaconda to the cluster.yaml worked :) I managed to run both the createIndices and RNA-seq pipelines successfully.
I can never thank you enough for your exceptional support and great work on developing snakePipes.
Great, I'm glad to hear that! Thanks for your patience with the issue :)
Hello!
I am trying to run the mRNA-seq pipeline on the provided test data, but I can't seem to get past the following error. Unfortunately the STAR .log is empty.
Error in rule STAR:
    jobid: 52
    output: STAR/SRR2096208.sorted.bam
    log: STAR/logs/SRR2096208.sort.log (check log file(s) for error message)
    conda-env: /home/barzaghi/miniconda3/envs/3adcc849
    shell:
Do you have any suggestions?