maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io

Error in rule STAR with empty .log #691

Closed: GuidoBarzaghi closed this issue 4 years ago

GuidoBarzaghi commented 4 years ago

Hello!

I am trying to run the mRNAseq pipeline on the provided test data, but I can't seem to get past the following error. Unfortunately, the STAR .log is empty.

Error in rule STAR:
    jobid: 52
    output: STAR/SRR2096208.sorted.bam
    log: STAR/logs/SRR2096208.sort.log (check log file(s) for error message)
    conda-env: /home/barzaghi/miniconda3/envs/3adcc849
    shell:

            TMPDIR=/tmp/barzaghi
            MYTEMP=$(mktemp -d ${TMPDIR:-/tmp}/snakepipes.XXXXXXXXXX)
            ( [ -d STAR/SRR2096208 ] || mkdir -p STAR/SRR2096208 )
            STAR --runThreadN 20                                          --sjdbOverhang 100                     --outSAMunmapped Within                     --outSAMtype BAM Unsorted                     --outSt
            rm -rf $MYTEMP

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
cluster_jobid: Submitted batch job 1285986

Do you have any suggestions?

katsikora commented 4 years ago

Hi Guido,

Something appears to be incorrect in rule STAR, as I don't see the read files listed in the STAR command. Were your read files detected correctly? The genome index and a couple of other params are missing as well.

The full STAR mapping command should look something like this:

    STAR --runThreadN {threads} \
        {params.alignerOptions} \
        --sjdbOverhang 100 \
        --outSAMunmapped Within \
        --outSAMtype BAM Unsorted \
        --outStd BAM_Unsorted \
        --sjdbGTFfile {params.gtf} \
        --genomeDir {params.index} \
        --readFilesIn <(gunzip -c {input.r1}) <(gunzip -c {input.r2}) \
        --outFileNamePrefix {params.prefix} \
        | samtools sort -m {params.samsort_memory} -T $MYTEMP/{wildcards.sample} -@ {params.samtools_threads} -O bam -o {output.bam} - 2> {log}
    rm -rf $MYTEMP

GuidoBarzaghi commented 4 years ago

My apologies, the rule got cut while pasting. Here is the full one:

    STAR --runThreadN 20 \
        --sjdbOverhang 100 \
        --outSAMunmapped Within \
        --outSAMtype BAM Unsorted \
        --outStd BAM_Unsorted \
        --sjdbGTFfile /.../snakePipes_configs/organisms/GRCm38_gencode_release19/annotation/genes.gtf \
        --genomeDir /.../snakePipes_configs/organisms/GRCm38_gencode_release19/STARIndex/SAindex \
        --readFilesIn <(gunzip -c FASTQ/SRR2096209_R1.fastq.gz) <(gunzip -c FASTQ/SRR2096209_R2.fastq.gz) \
        --outFileNamePrefix STAR/SRR2096209/SRR2096209. \
        | samtools sort -m 2G -T $MYTEMP/SRR2096209 -@ 5 -O bam -o STAR/SRR2096209.sorted.bam - 2> STAR/logs/SRR2096209.sort.log
    rm -rf $MYTEMP

I believe the input files are detected correctly as they are also listed as input in the upstream rules.

However, there seem to be a couple of errors popping up, e.g.:

    localrules directive specifies rules that are not present in the Snakefile: sleuth_Salmon Salmon_wasabi
    Job counts:
        count   jobs
        1       FASTQ2
        1

Or CLIs that I'm not sure are executing correctly, e.g.:

    command line - /home/barzaghi/miniconda3/envs/snakePipes/lib/python3.8/site-packages/pulp/apis/../solverdir/cbc/linux/64/cbc 50970fce538948959dc5f5631e48e0fe-pulp.mps max ratio None allow None threads None presolve on strong None gomory on knapsack on probing on branch printingOptions all solution 50970fce538948959dc5f5631e48e0fe-pulp.sol (default strategy 1)
    At line 2 NAME MODEL
    At line 3 ROWS
    At line 7 COLUMNS
    At line 13 RHS
    At line 16 BOUNDS
    At line 18 ENDATA
    Problem MODEL has 2 rows, 1 columns and 2 elements

Perhaps I could send you the .log file for the whole pipeline to give you a global picture.

Just to be sure, this is how I'm launching the pipeline (from the file command.sh that comes with the test data)

mRNA-seq -c /.../miniconda3/envs/snakePipes/lib/python3.8/site-packages/snakePipes/workflows/mRNA-seq/defaults_CAST.yaml --sampleSheet sampleSheet.csv -m alignment --DAG -i . -o ./mRNAseq_test_out -j 6 mm10

If it helps I can also provide the .yaml files.

Many thanks for your support

katsikora commented 4 years ago

Hi again,

The localrules directive specifies rules that are not present in the Snakefile: sleuth_Salmon Salmon_wasabi message is just a warning; you can ignore it.

That verbose command line output is coming from the snakemake solver trying to optimize the scheduling of jobs on your cluster. Perhaps there is an issue with your cluster config? Are you using slurm, which is our default, or did you update the cluster config (snakemake_cluster_cmd) to your local architecture?

Your mRNA-seq command looks fine.

Best,

Katarzyna

GuidoBarzaghi commented 4 years ago

Ah I see. I am using SLURM as well with the following command:

snakemake_cluster_cmd: sbatch --ntasks-per-node=1 -c 3 -J {rule}.snakemake --mem-per-cpu={cluster.memory} -p htc -o {snakePipes_cluster_logDir}/{rule}.%j.out -e {snakePipes_cluster_logDir}/{rule}.%j.err

I didn't change anything else in cluster.yaml

katsikora commented 4 years ago

Ok, I see you are forcing 3 threads on all the rules instead of using the rule-specific {threads} directive.

Is passing -c {threads} to snakemake_cluster_cmd an option for you? If you would like to limit core usage by snakemake jobs, you could do it on the top level command e.g. by passing --snakemakeOptions ' --cores 3 ' to mRNA-seq. You can also set rule-specific core limits in snakemake, see https://snakemake.readthedocs.io/en/stable/executing/cli.html .
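
For example, a sketch based on the sbatch options from your cluster.yaml above (other mRNA-seq arguments elided):

    # in cluster.yaml, request the rule-specific thread count instead of a fixed 3:
    snakemake_cluster_cmd: sbatch --ntasks-per-node=1 -c {threads} -J {rule}.snakemake --mem-per-cpu={cluster.memory} -p htc -o {snakePipes_cluster_logDir}/{rule}.%j.out -e {snakePipes_cluster_logDir}/{rule}.%j.err

    # or cap total core usage at the top level:
    mRNA-seq --snakemakeOptions ' --cores 3 ' [other options] mm10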

GuidoBarzaghi commented 4 years ago

Thanks for the suggestions :)

Unfortunately setting -c {threads} resulted in the same error

LeilyR commented 4 years ago

Are you sure that your index files are correct and your fastq files are not empty or broken? Have you tried running the failing shell script separately? I wonder if something is wrong with one of the inputs of that rule.

GuidoBarzaghi commented 4 years ago

Index and fastq files seem healthy. I'm happy to try running the rule that fails (RNA_mapping.snakefile) in isolation. What would be the best way to do that?

katsikora commented 4 years ago
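
You can copy the shell code of the failing rule into a script and run it directly. A sketch based on the command you pasted above (conda env path taken from your error message; truncated paths left as in your paste):

    source activate /home/barzaghi/miniconda3/envs/3adcc849
    cd mRNAseq_test_out
    MYTEMP=$(mktemp -d ${TMPDIR:-/tmp}/snakepipes.XXXXXXXXXX)
    mkdir -p STAR/SRR2096209 STAR/logs
    STAR --runThreadN 20 \
        --sjdbOverhang 100 \
        --outSAMunmapped Within \
        --outSAMtype BAM Unsorted \
        --outStd BAM_Unsorted \
        --sjdbGTFfile /.../snakePipes_configs/organisms/GRCm38_gencode_release19/annotation/genes.gtf \
        --genomeDir /.../snakePipes_configs/organisms/GRCm38_gencode_release19/STARIndex/ \
        --readFilesIn <(gunzip -c FASTQ/SRR2096209_R1.fastq.gz) <(gunzip -c FASTQ/SRR2096209_R2.fastq.gz) \
        --outFileNamePrefix STAR/SRR2096209/SRR2096209. \
        | samtools sort -m 2G -T $MYTEMP/SRR2096209 -@ 5 -O bam -o STAR/SRR2096209.sorted.bam - 2> STAR/logs/SRR2096209.sort.log
    rm -rf $MYTEMP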

This will run the rule locally on your machine - you may have to adapt the thread number under --runThreadN.

katsikora commented 4 years ago

You can also run the whole workflow locally by passing --local to mRNA-seq.
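
E.g., based on your command above (a sketch; if I remember right, with --local the -j value then caps the local cores):

    mRNA-seq -c /.../miniconda3/envs/snakePipes/lib/python3.8/site-packages/snakePipes/workflows/mRNA-seq/defaults_CAST.yaml --sampleSheet sampleSheet.csv -m alignment --DAG --local -i . -o ./mRNAseq_test_out -j 6 mm10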

GuidoBarzaghi commented 4 years ago

Running STAR locally was successful; is it then a problem with the cluster.yaml?

katsikora commented 4 years ago

It still looks like a problem with the cluster (configuration).

When you supplied -c {threads} to the snakemake_cluster_cmd in your cluster.yaml, did you run snakePipes config with --clusterConfig your_updated_cluster_config.yaml, or provide it directly to mRNA-seq via --clusterConfigFile your_updated_cluster_config.yaml ?

Does your $TMPDIR exist on the cluster nodes and do you have enough space there?

GuidoBarzaghi commented 4 years ago

I ran the pipeline with the updated cluster.yaml. Also, the $TMPDIR exists and has enough space for the test data.

LeilyR commented 4 years ago

Then try increasing the STAR memory in cluster.yaml.

GuidoBarzaghi commented 4 years ago

the __default__ one?

LeilyR commented 4 years ago

What I meant was to use your own cluster.yaml; I believe you can pass it via --clusterConfigFile. You can make a copy of the one you have under your output folder, rename it, and update the value for the STAR rule, as sketched below.
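
For example (a sketch; the copy written into your output folder is named mRNA-seq.cluster_config.yaml):

    cp mRNAseq_test_out/mRNA-seq.cluster_config.yaml my_cluster_config.yaml
    # edit my_cluster_config.yaml so the STAR rule gets more memory:
    #   STAR:
    #     memory: 30G
    mRNA-seq --clusterConfigFile my_cluster_config.yaml [other options] mm10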

GuidoBarzaghi commented 4 years ago

Oh I see. Unfortunately, increasing the STAR memory to 30G did not solve the issue. This is how the updated cluster_config.yaml looks:

    DESeq2:
      memory: 5G
    DESeq2_Salmon:
      memory: 3G
    FASTQdownsample:
      memory: 3G
    HISAT2:
      memory: 6G
    STAR:
      memory: 30G
    STAR_allele:
      memory: 30G
    SalmonIndex:
      memory: 2G
    SalmonQuant:
      memory: 2G
    Salmon_TPM:
      memory: 5G
    Salmon_counts:
      memory: 5G
    __default__:
      memory: 10G
    annotation_bed2fasta:
      memory: 4G
    bamCoverage:
      memory: 5G
    bamCoverage_RPKM:
      memory: 5G
    bamCoverage_coverage:
      memory: 5G
    bamCoverage_raw:
      memory: 5G
    bamCoverage_unique_mappings:
      memory: 5G
    bamPE_fragment_size:
      memory: 10G
    create_annotation_bed:
      memory: 4G
    create_snpgenome:
      memory: 30G
    filter_reads_umi:
      memory: 10G
    plotCorrelation_pearson:
      memory: 3G
    plotCorrelation_pearson_allelic:
      memory: 5G
    plotCorrelation_spearman:
      memory: 3G
    plotCorrelation_spearman_allelic:
      memory: 2G
    plotCoverage:
      memory: 1G
    plotEnrichment:
      memory: 1G
    plotFingerprint:
      memory: 1G
    plotPCA:
      memory: 4G
    plotPCA_allelic:
      memory: 4G
    plot_heatmap_CSAW_up:
      memory: 10G
    sleuth_Salmon:
      memory: 4G
    snakePipes_cluster_logDir: /g/krebs/barzaghi/bash_scripts/slurm_err_out
    snakemake_cluster_cmd: sbatch --ntasks-per-node=1 -c {threads} -J {rule}.snakemake --mem-per-cpu={cluster.memory} -p htc -o /g/krebs/barzaghi/bash_scripts/slurm_err_out/{rule}.%j.out -e /g/krebs/barzaghi/bash_scripts/slurm_err_out/{rule}.%j.err
    snakemake_latency_wait: 300
    snp_split:
      memory: 10G
    star_index:
      memory: 15G

LeilyR commented 4 years ago

The format of your yaml file looks weird; maybe it is GitHub that changed the formatting, but if the run accepted this file, so be it. Apart from that, you said that with --local the workflow worked for you all the way to the end, right? So it is definitely a cluster issue? And the error is still the same? You have nothing under the cluster_log folder, have you?

GuidoBarzaghi commented 4 years ago

Exactly right: --local allows a successful completion of the whole pipeline. I have logs under my snakePipes_cluster_logDir, but they just contain chunks of the whole workflow log. The ones under outDir/STAR/logs are empty though. The error message is the same:

    Error executing rule STAR on cluster (jobid: 50, external: Submitted batch job 2693966, jobscript: /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/.snakemake/tmp.lpotnxy5/snakejob.STAR.50.sh). For error details see the cluster log and the log files of the involved rule(s).
    Job failed, going on with independent jobs.
    [Wed Sep 16 14:45:46 2020]
    Finished job 37.
    26 of 80 steps (32%) done
    Exiting because a job execution failed. Look above for error message
    Complete log: /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/.snakemake/log/2020-09-16T143725.087438.snakemake.log

katsikora commented 4 years ago

What a mysterious issue ;)

Did the rules "upstream" of STAR mapping work, i.e. did you get symlinks of your original files to a "FASTQ" or "originalFASTQ" subfolder in your output folder? These are also sent to the cluster in the current version of snakePipes. Also, did you get filtered gtf and other files in the "Annotation" subfolder?

GuidoBarzaghi commented 4 years ago

This is the structure of my output folder:

    -rw-r--r-- 1 24227 718 123K Sep 16 14:45 mRNAseq_test_out/mRNA-seq_run-1.log
    -rw-r--r-- 1 24227 718  610 Sep 16 14:37 mRNAseq_test_out/mRNA-seq_tools.txt
    -rw-r--r-- 1 24227 718 2.3K Sep 16 14:37 mRNAseq_test_out/mRNA-seq_organism.yaml
    -rw-r--r-- 1 24227 718 1.4K Sep 16 14:37 mRNAseq_test_out/mRNA-seq.cluster_config.yaml
    -rw-r--r-- 1 24227 718 1.7K Sep 16 14:37 mRNAseq_test_out/mRNA-seq.config.yaml

    mRNAseq_test_out/Annotation:
    total 26M
    -rw-r--r-- 1 24227 718  17M Sep 16 14:45 genes.filtered.bed
    -rw-r--r-- 1 24227 718 1.6M Sep 16 14:45 genes.filtered.symbol
    -rw-r--r-- 1 24227 718 6.6M Sep 16 14:45 genes.filtered.t2g
    lrwxrwxrwx 1 24227 718  113 Sep 16 14:40 genes.filtered.gtf -> /g/krebs/barzaghi/bash_scripts/utilies/snakePipes_configs/organisms/GRCm38_gencode_release19/annotation/genes.gtf

    mRNAseq_test_out/FASTQ:
    total 48K
    lrwxrwxrwx 1 24227 718 92 Sep 16 14:41 SRR2096206_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096206_R2.fastq.gz
    lrwxrwxrwx 1 24227 718 92 Sep 16 14:41 SRR2096209_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096209_R2.fastq.gz
    lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096207_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096207_R2.fastq.gz
    lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096208_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096208_R2.fastq.gz
    lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096211_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096211_R2.fastq.gz
    lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096207_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096207_R1.fastq.gz
    lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096210_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096210_R2.fastq.gz
    lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096206_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096206_R1.fastq.gz
    lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096211_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096211_R1.fastq.gz
    lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096209_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096209_R1.fastq.gz
    lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096208_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096208_R1.fastq.gz
    lrwxrwxrwx 1 24227 718 92 Sep 16 14:40 SRR2096210_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out/originalFASTQ/SRR2096210_R1.fastq.gz

    mRNAseq_test_out/originalFASTQ:
    total 0
    lrwxrwxrwx 1 24227 718 61 Sep 16 14:41 SRR2096206_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096206_R2.fastq.gz
    lrwxrwxrwx 1 24227 718 61 Sep 16 14:41 SRR2096209_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096209_R2.fastq.gz
    lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096211_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096211_R2.fastq.gz
    lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096208_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096208_R2.fastq.gz
    lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096207_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096207_R1.fastq.gz
    lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096207_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096207_R2.fastq.gz
    lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096210_R2.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096210_R2.fastq.gz
    lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096208_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096208_R1.fastq.gz
    lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096211_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096211_R1.fastq.gz
    lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096210_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096210_R1.fastq.gz
    lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096209_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096209_R1.fastq.gz
    lrwxrwxrwx 1 24227 718 61 Sep 16 14:40 SRR2096206_R1.fastq.gz -> /g/krebs/barzaghi/bash_scripts/utilies/SRR2096206_R1.fastq.gz

    mRNAseq_test_out/STAR:
    total 4.0K
    drwxr-sr-x 2 24227 718 4.0K Sep 16 14:41 logs

I agree, this is quite puzzling

katsikora commented 4 years ago

I'm not sure why mRNAseq_test_out/originalFASTQ shows total 0. Can you zcat mRNAseq_test_out/originalFASTQ/SRR2096206_R2.fastq.gz | head ?

How about the conda env that STAR and samtools are installed into? Is this path available on your cluster? Can you source activate /home/barzaghi/miniconda3/envs/3adcc849 in some shell script you send to your cluster via sbatch? And perhaps get the output of STAR -h and samtools --help ?

You can also send the full STAR/samtools command to your cluster and see what you get in the logs.
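
For instance, a minimal test script (a sketch; env path from your error message, partition and commands as discussed above):

    #!/bin/bash
    # conda_test.sh - check that the conda env is reachable from a compute node
    source activate /home/barzaghi/miniconda3/envs/3adcc849
    STAR -h
    samtools --help

submitted with something like:

    sbatch -p htc -o conda_test.%j.out -e conda_test.%j.err conda_test.sh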

GuidoBarzaghi commented 4 years ago

That size is peculiar, I agree, but the fastq file seems populated:

    @SRR2096206.1 39V34V1:192:H2TMTBCXX:1:1101:1750:2088 length=50
    ATTCGAGCAGAATTAGGTCAACCAGGTGCACTTTTAGGAGATGACCAAAT
    +SRR2096206.1 39V34V1:192:H2TMTBCXX:1:1101:1750:2088 length=50
    GGGGGIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIGIIGIGGIIIIIIIG
    @SRR2096206.2 39V34V1:192:H2TMTBCXX:1:1101:1617:2153 length=50
    CCCATGCAGCCAACACAGTCGTTTACAGCTCTAACAAAATAGACGATACT
    +SRR2096206.2 39V34V1:192:H2TMTBCXX:1:1101:1617:2153 length=50
    GGGGGIIIIIIIIIIIIGIIIIIIGGGIIIGGGGGGIIIIIIGIIIIIII
    @SRR2096206.3 39V34V1:192:H2TMTBCXX:1:1101:2041:2229 length=50
    GCATGACTAAACTACAGCTCATATTCCACAAATTTGAGATTTGTCTTGCC

From STAR -h and samtools --help I'm getting the expected usage messages for STAR version 2.6.0c and samtools version 1.9

Launching the rule on the cluster, however, returns the following:

    /var/spool/slurm/job2837334/slurm_script: line 12: activate: No such file or directory
    mktemp: failed to create directory via template ‘/tmp/barzaghi/snakepipes.XXXXXXXXXX’: No such file or directory
    gzip: FASTQ/SRR2096206_R2.fastq.gz: No such file or directory
    gzip: FASTQ/SRR2096206_R1.fastq.gz: No such file or directory
    /var/spool/slurm/job2837334/slurm_script: line 17: STAR/logs/SRR2096206.sort.log: No such file or directory

LeilyR commented 4 years ago

activate: No such file or directory suggests that the anaconda directory is missing on the node you submit your job to. You will most probably need to sort this out with your IT people.

katsikora commented 4 years ago

Like @LeilyR said, looks like the path to your conda installation is not available/mounted on your cluster nodes. Perhaps your IT can help you with that, or you can install conda for your user into a path that is available there.

The first rules of the workflow that deal with fastq files and annotation do not call for rule-specific conda environments, but use bash and python. The mapping rule that calls STAR requires activating a dedicated conda env.

GuidoBarzaghi commented 4 years ago

I see.

Actually I managed to successfully launch STAR on the cluster by source activating /.../miniconda3/envs/snakePipes rather than /.../miniconda3/envs/3adcc849.

Now, evidently there is something I do not understand here: how come the pipeline uses what seem to be temporary envs (e.g. 3adcc849) rather than the generic snakePipes env? Is this behaviour expected?

katsikora commented 4 years ago

Hi Guido,

The generic snakePipes env contains only python, snakemake, and the python modules required by the snakePipes python wrappers.

Any software packages for NGS data processing, such as STAR etc., are installed into a limited number of conda environments that you either create with snakePipes createEnvs, or that are automatically created the first time you run a workflow. These rule/workflow-specific conda environments are automatically named by snakemake using a hash of the environment yaml file, hence the not-so-user-friendly names. You can get the paths to these envs, and what they correspond to, by running snakePipes envInfo.

In any case, you need to source activate the path to your generic snakePipes env before running any full workflow, to make the paths to the python wrappers, snakemake etc. available. The rule-specific envs are automatically source activated by snakemake when executing a rule that requires them. I thought the first time you ran your full workflow on the cluster, you source activated the generic env beforehand, is that correct?

To test the code for one specific rule that requires a specific conda env, you don't need to source activate the generic snakePipes env, but rather the rule-specific environment, as you don't need snakemake or python wrappers to execute one isolated shell command.

Hope this helps,

Best,

Katarzyna

GuidoBarzaghi commented 4 years ago

Thanks a lot for the explanation, it helps a lot :)

Actually, that might be the problem, as the mRNA-seq specific env (3adcc849) is not listed by snakePipes envInfo:

envs/shared.yaml is in: /home/barzaghi/miniconda3/envs/42723b8521a92932b1b8b0035b45c8c2

envs/createIndices.yaml is in: /home/barzaghi/miniconda3/envs/881bf80b50ffdee8cd04c2b4c79f4be2

envs/rna_seq.yaml is in: /home/barzaghi/miniconda3/envs/2e4ce781da5529cfc5da85847031ab60

envs/sc_rna_seq.yaml is in: /home/barzaghi/miniconda3/envs/c05670c9a3b222949368cc85ea77aa12

envs/sc_rna_seq_seurat3.yaml is in: /home/barzaghi/miniconda3/envs/4d22286fbdfcd6dcae66f8af5f55dd7c

envs/sc_rna_seq_loompy.yaml is in: /home/barzaghi/miniconda3/envs/8117fc9e197a9ded763e01ce6b2ee47e

envs/dna_mapping.yaml is in: /home/barzaghi/miniconda3/envs/0e10073fc47605bf3465f4d53bf0c81b

envs/chip_seq.yaml is in: /home/barzaghi/miniconda3/envs/448dc2f247fbc9275f3a9700ac7ff1a0

envs/histone_hmm.yaml is in: /home/barzaghi/miniconda3/envs/dd7efb1338a71eb4543b0411e2ed5f2d

envs/atac_seq.yaml is in: /home/barzaghi/miniconda3/envs/3b3bc60670b29e65e68025a552b92002

envs/hic.yaml is in: /home/barzaghi/miniconda3/envs/7a30c13519bd32ad9a202991d4b6c84b

envs/wgbs.yaml is in: /home/barzaghi/miniconda3/envs/321b94c5a4aa2974fd8511a1ea4a0c47

envs/rmarkdown.yaml is in: /home/barzaghi/miniconda3/envs/24a49c7d518641700962e3eb041c3ffe

envs/preprocessing.yaml is in: /home/barzaghi/miniconda3/envs/716ca2a4d885c02ad64286112987e524

envs/noncoding.yaml is in: /home/barzaghi/miniconda3/envs/2e5ae8bdd9833fe90c63eec535502a60

envs/sambamba.yaml is in: /home/barzaghi/miniconda3/envs/8792566abd75d113cc3e56ce290737cc

but it is present in /home/barzaghi/miniconda3/envs/:

    drwxr-xr-x 21 24227 718 4.0K Sep  8 18:31 4c0f0ebc
    -rw-rw-r--  1 24227 718  298 Sep  8 18:27 4c0f0ebc.yaml
    drwxr-xr-x 17 24227 718 4.0K Sep  8 18:27 3adcc849
    -rw-rw-r--  1 24227 718  494 Sep  8 18:23 3adcc849.yaml
    drwxr-xr-x  4 24227 718 4.0K Sep  8 18:23 66a495d8
    -rw-rw-r--  1 24227 718  102 Sep  8 18:22 66a495d8.yaml
    drwxr-xr-x  4 24227 718 4.0K Aug 26 17:37 8792566abd75d113cc3e56ce290737cc
    drwxr-xr-x 16 24227 718 4.0K Aug 26 17:37 2e5ae8bdd9833fe90c63eec535502a60
    drwxr-xr-x 18 24227 718 4.0K Aug 26 17:33 716ca2a4d885c02ad64286112987e524
    drwxr-xr-x 16 24227 718 4.0K Aug 26 17:32 24a49c7d518641700962e3eb041c3ffe
    drwxr-xr-x 16 24227 718 4.0K Aug 26 17:29 321b94c5a4aa2974fd8511a1ea4a0c47
    drwxr-xr-x 23 24227 718 4.0K Aug 26 17:25 7a30c13519bd32ad9a202991d4b6c84b
    drwxr-xr-x 19 24227 718 4.0K Aug 26 17:20 3b3bc60670b29e65e68025a552b92002
    drwxr-xr-x 16 24227 718 4.0K Aug 26 17:17 dd7efb1338a71eb4543b0411e2ed5f2d
    drwxr-xr-x 16 24227 718 4.0K Aug 26 17:15 448dc2f247fbc9275f3a9700ac7ff1a0
    drwxr-xr-x 20 24227 718 4.0K Aug 26 17:13 0e10073fc47605bf3465f4d53bf0c81b
    drwxr-xr-x 11 24227 718 4.0K Aug 26 17:08 8117fc9e197a9ded763e01ce6b2ee47e
    drwxr-xr-x 16 24227 718 4.0K Aug 26 17:07 4d22286fbdfcd6dcae66f8af5f55dd7c
    drwxr-xr-x 16 24227 718 4.0K Aug 26 17:03 c05670c9a3b222949368cc85ea77aa12
    drwxr-xr-x 17 24227 718 4.0K Aug 26 16:55 2e4ce781da5529cfc5da85847031ab60
    drwxr-xr-x  9 24227 718 4.0K Aug 26 16:49 881bf80b50ffdee8cd04c2b4c79f4be2
    drwxr-xr-x 21 24227 718 4.0K Aug 26 16:48 42723b8521a92932b1b8b0035b45c8c2
    drwxr-xr-x 16 24227 718 4.0K Aug 26 15:57 snakePipes
    drwxr-xr-x 11 24227 718 4.0K Jul 29 14:50 py2

I guess I must have done something wrong the first time I ran the pipeline. Would deleting these environments and relaunching the pipeline solve the issue?

katsikora commented 4 years ago

Very strange: there is a difference in hash between the environment name listed by snakePipes envInfo (2e4ce781da5529cfc5da85847031ab60) and the one that snakemake is activating for your mRNA-seq workflow (3adcc849). A different hash suggests a change in the environment yaml file content after installation and snakePipes config, or a change in its path. Is that possible?

Can it be that there is some mixup in snakePipes versions or installation paths on your system?

The following should return consistent results:

    source activate your_generic_snakePipes_env
    snakePipes config --snakemakeOptions ' --use-conda --conda-prefix /home/barzaghi/miniconda3/envs '
    snakePipes createEnvs
    snakePipes envInfo

On our side, this is working consistently.

This doesn't necessarily explain the issue with STAR on the cluster, as long as 3adcc849 is built correctly. You can try to clean up both the snakePipes installation and the environments; perhaps this helps.

GuidoBarzaghi commented 4 years ago

Sorry for the wait.

So I saw there was an updated version of snakePipes, and I basically reinstalled the whole thing. Now the environments I find under miniconda correspond to the ones returned by snakePipes envInfo. Unfortunately, I still got the same cryptic error when launching the pipeline on the test data. Upon instead launching only the STAR rule, as follows:

    source /g/krebs/barzaghi/utilies/miniconda3/etc/profile.d/conda.sh
    conda activate /g/krebs/barzaghi/utilies/miniconda3/envs/203a38e6911ca794da2f33c64c7233e5

    cd /g/krebs/barzaghi/bash_scripts/utilies/mRNAseq_test_out

    TMPDIR=/tmp/barzaghi
    MYTEMP=$(mktemp -d ${TMPDIR:-/tmp}/snakepipes.XXXXXXXXXX)
    ( [ -d STAR/SRR2096211 ] || mkdir -p STAR/SRR2096211 )

    STAR --runThreadN 20 \
        --sjdbOverhang 100 \
        --outSAMunmapped Within \
        --outSAMtype BAM Unsorted \
        --outStd BAM_Unsorted \
        --sjdbGTFfile /g/krebs/barzaghi/DB/genomes/snakePipes_indexes/GRCm38_gencode_release19/annotation/genes.gtf \
        --genomeDir /g/krebs/barzaghi/DB/genomes/snakePipes_indexes/GRCm38_gencode_release19/STARIndex/ \
        --readFilesIn <(gunzip -c FASTQ/SRR2096211_R1.fastq.gz) <(gunzip -c FASTQ/SRR2096211_R2.fastq.gz) \
        --outFileNamePrefix STAR/SRR2096211/SRR2096211. \
        | samtools sort -m 2G -T $MYTEMP/SRR2096211 -@ 5 -O bam -o STAR/SRR2096211.sorted.bam - 2> STAR/logs/SRR2096211.sort.log

    rm -rf $MYTEMP

I got the following error:

    EXITING because of FATAL ERROR: Genome version: 20201 is INCOMPATIBLE with running STAR version: 2.7.4a
    SOLUTION: please re-generate genome from scratch with running version of STAR, or with version: 2.7.4a

Sep 24 11:30:46 ...... FATAL ERROR, exiting

Could it be something like the issue reported here?

katsikora commented 4 years ago

Hi again,

So yes, indeed, the STAR devs have changed the genome index format a couple of times already, such that a given format only works with a couple of minor/bugfix versions. 2.7.4a requires a new genome build compared to the previous versions.

From release 2.2.0, snakePipes uses STAR version 2.7.4a, as it brings in new functionality for scRNAseq analysis. That means rerunning createIndices with snakePipes version 2.2.0 or later (https://github.com/maxplanck-ie/snakepipes/releases/tag/2.2.0).
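
A sketch of the createIndices call (output path, URLs, and organism name are placeholders here; check createIndices -h for the exact option names):

    createIndices -o /path/to/snakePipes_indexes/GRCm38_gencode_release19 \
        --genomeURL <genome_fasta_url> \
        --gtfURL <annotation_gtf_url> \
        GRCm38_gencode_release19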

GuidoBarzaghi commented 4 years ago

I see, I will then run createIndices :)

GuidoBarzaghi commented 4 years ago

Unfortunately, recreating the indices runs into the same kind of problems as the RNAseq pipeline. I would like to try running either pipeline after loading the anaconda module on the cluster nodes (it has to be loaded separately for each rule, I assume). Could you guide me to the right place to add the module load anaconda line in the code?

LeilyR commented 4 years ago

https://github.com/maxplanck-ie/snakepipes/blob/master/snakePipes/shared/rules/RNA_mapping.snakefile#L92, under the shell directive, if you want to add it to the STAR rule.

katsikora commented 4 years ago

Perhaps loading the anaconda module with each rule is a bit of a last resort?

Did you try something more generic first, e.g.:

    module load anaconda
    source activate snakePipes
    mRNA-seq -i input -o output

or, say, modifying your snakemake_cluster_cmd in cluster.yaml to look like: module load anaconda; sbatch ...
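
With your current settings that would be something like:

    snakemake_cluster_cmd: module load anaconda; sbatch --ntasks-per-node=1 -c {threads} -J {rule}.snakemake --mem-per-cpu={cluster.memory} -p htc -o {snakePipes_cluster_logDir}/{rule}.%j.out -e {snakePipes_cluster_logDir}/{rule}.%j.err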

GuidoBarzaghi commented 4 years ago

I am very happy to report that adding module load anaconda to the cluster.yaml worked :) I managed to successfully run both the createIndices and RNAseq pipelines.

I can never thank you enough for your exceptional support and great work on developing snakePipes.

katsikora commented 4 years ago

Great, I'm glad to hear that! Thanks for your patience with the issue :)