Closed ivanferrreira closed 1 year ago
Hey thanks for reporting the issue. That's not supposed to happen!!!
Did you change anything in the config.yaml? To me it looks like the trimmer has not been set properly. Ideally it would give a nice human-readable error though. Could you post the content of your config.yaml here? :smile:
FWIW, I could run seq2science with sample GSM3580697 with assembly galGal6 with the trimmer set like this:
trimmer: fastp
If you want to change trimoptions you should do it like this:
trimmer:
  fastp:
    trimoptions: --option1 dothis --option2 dothat
This is unfortunately extremely poorly designed/documented on our side :smile:
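For illustration, the two notations above (the short `trimmer: fastp` form and the nested form with `trimoptions`) could be checked up front with something like the sketch below. This is a hypothetical helper, not seq2science's own code: the name `parse_trimmer`, the `KNOWN_TRIMMERS` set (limited to the two trimmers named in this thread), and the assumption that config.yaml has already been loaded into a plain dict are all made up for the example.

```python
# Hypothetical helper, NOT part of seq2science: normalise the `trimmer`
# entry of an already-parsed config.yaml (represented as a plain dict).
KNOWN_TRIMMERS = {"fastp", "trimgalore"}  # assumption: only the names used in this thread


def parse_trimmer(config):
    """Return (trimmer_name, trimoptions) or raise a readable ValueError."""
    entry = config.get("trimmer")
    if isinstance(entry, str):
        # Short form:  trimmer: fastp
        name, options = entry, ""
    elif isinstance(entry, dict) and len(entry) == 1:
        # Nested form: trimmer: {fastp: {trimoptions: "..."}}
        name, settings = next(iter(entry.items()))
        options = (settings or {}).get("trimoptions", "")
    else:
        raise ValueError(f"cannot interpret trimmer entry: {entry!r}")
    if name not in KNOWN_TRIMMERS:
        raise ValueError(
            f"unknown trimmer {name!r}; expected one of {sorted(KNOWN_TRIMMERS)}"
        )
    return name, options
```

With a check like this, a misspelled name such as `trim-galore` would fail early with a message naming the accepted values, instead of surfacing later as an unrelated-looking traceback.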
You are right - I changed the config.yaml to trimgalore but was using the wrong notation (trim-galore), fixing it worked out :D
Actually it gave another error now :/ I think it's a problem with the installation of deeptools, what do you think?
Activating conda environment: ../../../../../../../home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd
[Wed Oct 5 23:18:03 2022]
Error in rule plotHeatmap_peak:
jobid: 39
output: /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/qc/plotHeatmap_peaks/N20000-mm10-deepTools_macs2_heatmap_mqc.png
log: /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/log/plotHeatmap_peaks/mm10-macs2_N20000.log (check log file(s) for error message)
conda-env: /home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd
shell:
plotHeatmap --matrixFile /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/qc/computeMatrix_peak/mm10-macs2_N20000.mat.gz --outFileName /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/qc/plotHeatmap_peaks/N20000-mm10-deepTools_macs2_heatmap_mqc.png --kmeans 6 --xAxisLabel "Summit distance (bp)" --startLabel="-1000" --endLabel=1000 > /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/log/plotHeatmap_peaks/mm10-macs2_N20000.log 2>&1
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
[Wed Oct 5 23:18:18 2022]
Finished job 30.
50 of 66 steps (76%) done
[Wed Oct 5 23:18:22 2022]
Finished job 50.
51 of 66 steps (77%) done
Removing temporary output /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/macs2/mm10_combinedpeaks.bed.
Removing temporary output /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/final_bam/mm10-GSM4931882.samtools-coordinate.bam.bai.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
__ __ __ ____ ____ _
/ \ / \ / \( _ \/ ___)/ \
( O )( O )( O )) __/\___ \\_/
\__/ \__/ \__/(__) (____/(_)
One or more rules did not finish as expected!
Please take a look at the log files of the failed rule(s), and our Frequently Asked Questions:
https://vanheeringen-lab.github.io/seq2science/content/faq.html
If that does not help you, don't be afraid to reach out to us.
The easiest way would be to make an issue on our github page:
https://github.com/vanheeringen-lab/seq2science/issues
Complete log: seq2science.2022-10-05T204449.893990.log
(Almost) every piece of code gets executed in so-called rules, and these rules log their output to a log file. In this case it's the file: /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/log/plotHeatmap_peaks/mm10-macs2_N20000.log
Can you post the output of that?
Could indeed be related to a corrupt deeptools installation, see #898
You could try changing bioconda::deeptools=3.5.0 into bioconda::deeptools=3.5.1 in the environment of deeptools. You can change that in this file: /home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/envs/deeptools.yaml
This will be fixed in the next release :smile:
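Editing the file by hand is the simplest route. If you would rather script it, a minimal sketch of the same change could look like this. The function name `bump_pin` is hypothetical, and the example environment text is invented for the demonstration (the real deeptools.yaml may list other channels and packages); only the `bioconda::deeptools=3.5.0` pin comes from this thread.

```python
import re


def bump_pin(env_text, package, new_version):
    """Replace the version pinned for `package` in conda environment YAML text."""
    # Matches a dependency line such as "  - bioconda::deeptools=3.5.0",
    # with or without the "channel::" prefix.
    pattern = re.compile(
        rf"^(\s*-\s*(?:[\w-]+::)?{re.escape(package)}=)\S+", re.MULTILINE
    )
    updated, count = pattern.subn(lambda m: m.group(1) + new_version, env_text)
    if count == 0:
        raise ValueError(f"no pin for {package!r} found")
    return updated


# Invented file content, for illustration only.
example_env = "channels:\n  - bioconda\ndependencies:\n  - bioconda::deeptools=3.5.0\n"
patched_env = bump_pin(example_env, "deeptools", "3.5.1")
```

To apply it for real you would read the yaml file into a string, pass it through `bump_pin`, and write the result back. Since the environment is defined by that file, it will presumably have to be rebuilt on the next run.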
Thanks so much I will try updating the deeptools.yaml!
And </n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/log/plotHeatmap_peaks/mm10-macs2_N20000.log> is this:
Warning For clustering nan values have to be replaced by zeros
Traceback (most recent call last):
File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd/bin/plotHeatmap", line 12, in
The error you got should be solved by updating the deeptools yaml! :partying_face: Let me know if that's actually the case!
Sorry you got this issue. I'll make a new release after you confirm (0.9.6) so that others won't have this issue. Thanks for reporting your issue :smile:
Great, thanks so much!
Can you also help me with adjusting the config.yaml to use a specific genome.fasta and GTF?
Sure!
I have the files petMar3_augustus.gtf and petMar3.fa at /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/genomes/
But simply adding the path to custom_genome_extension: /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/genomes/ doesn't work, how should I do it?
Really appreciate it, thank you!
You should use genome_dir instead of custom_genome_extension!
Seq2science should only download a genome if it isn't already present in your genome_dir. I haven't used my own assembly for a while, but I think what you need is to specify the assembly in your samples.tsv. For this example we will use myassembly.
I think all you need for the chip/atac workflows is this specific file, so that seq2science can pick it up:
{genome_dir}/myassembly/myassembly.fa
In the config.yaml that would be
genome_dir: /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/genomes/
And then the name of the assembly (myassembly) you want to use added in the samples.tsv
For RNA-seq it is a bit more complicated. It would need these files as well:
{genome_dir}/{assembly}/{assembly}.annotation.gtf
{genome_dir}/{assembly}/{assembly}.annotation.bed
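As a quick sanity check before starting a run, the layout described above can be verified with a few lines of Python. This is a sketch under the assumption that the file names listed in this reply are the only ones needed (which was itself given as an "I think"); the function names are hypothetical and not part of seq2science.

```python
from pathlib import Path


def expected_genome_files(genome_dir, assembly, rna_seq=False):
    """List the files this thread says must exist for a custom assembly."""
    base = Path(genome_dir) / assembly
    files = [base / f"{assembly}.fa"]
    if rna_seq:
        # Extra annotation files mentioned for the RNA-seq workflow.
        files.append(base / f"{assembly}.annotation.gtf")
        files.append(base / f"{assembly}.annotation.bed")
    return files


def missing_genome_files(genome_dir, assembly, rna_seq=False):
    """Return the expected files that are not present on disk."""
    return [
        path
        for path in expected_genome_files(genome_dir, assembly, rna_seq)
        if not path.is_file()
    ]
```

For the files from the question, that would mean petMar3.fa has to live at {genome_dir}/petMar3/petMar3.fa with petMar3 as the assembly name in samples.tsv, and for RNA-seq the GTF would need to be available as petMar3.annotation.gtf rather than petMar3_augustus.gtf.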
Got it, thanks so much!
Everything worked? Shall we close the issue?
Yeap, thank you!
Was very close to getting to the end line but got another issue, do you have any idea how to fix this one?
rule upset_plot_peaks:
    input: /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/danRer11/results/counts/macs2/danRer11_onehotpeaks.tsv
    output: /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/danRer11/results/qc/upset/danRer11-macs2_upset_mqc.jpg
    jobid: 332
    wildcards: assembly=danRer11, peak_caller=macs2
    resources: tmpdir=/tmp
Activating conda environment: ../../../../../../../home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/a2ca95571a314d6fdcb38c97f2119cce
Activating conda environment: ../../../../../../../home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/a2ca95571a314d6fdcb38c97f2119cce
Traceback (most recent call last):
File "/n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/danRer11/.snakemake/scripts/tmp6on4m1zo.upset.py", line 8, in
RuleException:
CalledProcessError in line 126 of /home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/qc_peaks.smk:
Command 'source /home/ic2690/miniconda3/envs/seq2science/bin/activate '/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/a2ca95571a314d6fdcb38c97f2119cce'; set -euo pipefail; python /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/danRer11/.snakemake/scripts/tmp6on4m1zo.upset.py' returned non-zero exit status 1.
File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/qc_peaks.smk", line 126, in __rule_upset_plot_peaks
File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/concurrent/futures/thread.py", line 57, in run
[Thu Oct 13 08:13:54 2022]
Finished job 316.
478 of 503 steps (95%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Unlucky! Seems to be a similar issue to the deeptools one, in that the environment broke because of matplotlib changes (https://github.com/jnothman/UpSetPlot/issues/191). Could you edit the upset yaml file and set the matplotlib version in there?
In /home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/envs/upset.yaml add conda-forge::matplotlib=3.5.3
working great, thanks :D
Hopefully a last question here: I was running the merge with fisher, but now I only want to rerun the merging part with IDR and a q-value of 1 for macs2. Can I just update the config file and run it in the same folder, to avoid having to rerun the entire pipeline?
I found it a bit confusing to figure out how to run only a few rules instead of the whole thing.
Yes. Seq2science should only rerun what is needed. So you can just change stuff in the config, and then start a run in the same folder and it should only rerun the necessary rules
Hi,
Thanks for building this great tool! I'm trying to map some published data and I get the following error:
localrules directive specifies rules that are not present in the Snakefile: combine_qc_files
Building DAG of jobs...
InputFunctionException in line 158 of /home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/peak_calling.smk:
Error: UnboundLocalError: local variable 'kmer_size' referenced before assignment
Wildcards: assembly=galGal6 sample=GSM3580697
Traceback:
File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/peak_calling.smk", line 154, in get_genome_size
File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/../rules/configuration_workflows.smk", line 205, in get_read_length
Any guidance?