vanheeringen-lab / seq2science

Automated and customizable preprocessing of Next-Generation Sequencing data, including full (sc)ATAC-seq, ChIP-seq, and (sc)RNA-seq workflows. Works equally easily with public and local data.
https://vanheeringen-lab.github.io/seq2science
MIT License

UnboundLocalError: local variable 'kmer_size' referenced before assignment #900

Closed: ivanferrreira closed this issue 1 year ago

ivanferrreira commented 1 year ago

Hi,

Thanks for building this great tool! I'm trying to map some published data and I get the following error:

localrules directive specifies rules that are not present in the Snakefile: combine_qc_files

Building DAG of jobs...
InputFunctionException in line 158 of /home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/peak_calling.smk:
Error:
  UnboundLocalError: local variable 'kmer_size' referenced before assignment
Wildcards:
  assembly=galGal6
  sample=GSM3580697
Traceback:
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/peak_calling.smk", line 154, in get_genome_size
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/../rules/configuration_workflows.smk", line 205, in get_read_length

Any guidance?

Maarten-vd-Sande commented 1 year ago

Hey, thanks for reporting the issue. That's not supposed to happen!!!

Did you change anything in the config.yaml? To me it looks like the trimmer has not been set properly. Ideally seq2science would give a nice human-readable error here. Could you post the contents of your config.yaml? :smile:
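For readers hitting the same message: the traceback ends in get_read_length, and an UnboundLocalError there is what you would expect if kmer_size is only assigned inside branches for recognized trimmer names. The sketch below is a hypothetical, heavily simplified model of that pattern (the function names and the read_lengths mapping are illustrative assumptions, not seq2science's code), plus a variant that fails early with the kind of human-readable error mentioned above.

```python
VALID_TRIMMERS = ("fastp", "trimgalore")  # trimmer names used in this thread


def lookup_kmer_size(trimmer: str, read_lengths: dict) -> int:
    """Hypothetical stand-in, NOT seq2science's real get_read_length."""
    if trimmer == "fastp":
        kmer_size = read_lengths["fastp"]
    elif trimmer == "trimgalore":
        kmer_size = read_lengths["trimgalore"]
    # No else branch: an unrecognized name such as "trim-galore" skips both
    # assignments, so the return below raises UnboundLocalError.
    return kmer_size


def lookup_kmer_size_checked(trimmer: str, read_lengths: dict) -> int:
    """Same lookup, but rejects unknown trimmer names with a clear message."""
    if trimmer not in VALID_TRIMMERS:
        raise ValueError(
            f"Unknown trimmer {trimmer!r} in config.yaml; "
            f"expected one of {', '.join(VALID_TRIMMERS)}"
        )
    return lookup_kmer_size(trimmer, read_lengths)
```

The exact wording of the UnboundLocalError message differs between Python versions, which is why the checked variant raises its own ValueError instead of relying on it.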

Maarten-vd-Sande commented 1 year ago

FWIW, I was able to run seq2science on sample GSM3580697 with assembly galGal6 and the trimmer set like this:

trimmer: fastp

If you want to change the trimoptions, nest them under the trimmer name like this:

trimmer:
  fastp:
    trimoptions: --option1 dothis --option2 dothat

This is unfortunately extremely poorly designed/documented on our side :smile:
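Once parsed from YAML, the two trimmer forms above arrive as either a plain string or a one-key nested mapping. A minimal sketch of normalizing both, assuming the config has already been loaded into Python dicts; parse_trimmer is a hypothetical helper, and seq2science's own config handling may differ.

```python
from typing import Tuple, Union

VALID_TRIMMERS = ("fastp", "trimgalore")  # names used in this thread


def parse_trimmer(value: Union[str, dict]) -> Tuple[str, str]:
    """Return (trimmer name, trimoptions) for either config.yaml form.

    Hypothetical helper: accepts `trimmer: fastp` (loaded as a str) or
    `trimmer: {fastp: {trimoptions: ...}}` (loaded as a nested dict).
    """
    if isinstance(value, str):
        name, options = value, ""
    elif isinstance(value, dict) and len(value) == 1:
        name, settings = next(iter(value.items()))
        # A bare `fastp:` with nothing underneath loads as None.
        options = (settings or {}).get("trimoptions", "")
    else:
        raise ValueError(
            "trimmer must be a name or a single {name: {trimoptions: ...}} mapping"
        )
    if name not in VALID_TRIMMERS:
        raise ValueError(f"Unknown trimmer {name!r}; expected one of {VALID_TRIMMERS}")
    return name, options
```

With this, parse_trimmer({"fastp": {"trimoptions": "--option1 dothis --option2 dothat"}}) returns ("fastp", "--option1 dothis --option2 dothat"), while parse_trimmer("trim-galore") raises a ValueError up front.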

ivanferrreira commented 1 year ago

You are right: I had set the trimmer in config.yaml to trimgalore but used the wrong spelling (trim-galore). Fixing that worked :D

ivanferrreira commented 1 year ago

Actually, now it gives another error :/ I think it's a problem with the deeptools installation. What do you think?

Activating conda environment: ../../../../../../../home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd
[Wed Oct  5 23:18:03 2022]
Error in rule plotHeatmap_peak:
    jobid: 39
    output: /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/qc/plotHeatmap_peaks/N20000-mm10-deepTools_macs2_heatmap_mqc.png
    log: /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/log/plotHeatmap_peaks/mm10-macs2_N20000.log (check log file(s) for error message)
    conda-env: /home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd
    shell:

        plotHeatmap --matrixFile /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/qc/computeMatrix_peak/mm10-macs2_N20000.mat.gz --outFileName /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/qc/plotHeatmap_peaks/N20000-mm10-deepTools_macs2_heatmap_mqc.png --kmeans 6 --xAxisLabel "Summit distance (bp)" --startLabel="-1000" --endLabel=1000 > /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/log/plotHeatmap_peaks/mm10-macs2_N20000.log 2>&1

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Wed Oct  5 23:18:18 2022]
Finished job 30.
50 of 66 steps (76%) done
[Wed Oct  5 23:18:22 2022]
Finished job 50.
51 of 66 steps (77%) done
Removing temporary output /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/macs2/mm10_combinedpeaks.bed.
Removing temporary output /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/final_bam/mm10-GSM4931882.samtools-coordinate.bam.bai.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

      __    __    __  ____  ____  _   
     /  \  /  \  /  \(  _ \/ ___)/ \  
    (  O )(  O )(  O )) __/\___ \\_/  
     \__/  \__/  \__/(__)  (____/(_)  

    One or more rules did not finish as expected!

    Please take a look at the log files of the failed rule(s), and our Frequently Asked Questions: 
    https://vanheeringen-lab.github.io/seq2science/content/faq.html

    If that does not help you, don't be afraid to reach out to us. 
    The easiest way would be to make an issue on our github page: 
    https://github.com/vanheeringen-lab/seq2science/issues

Complete log: seq2science.2022-10-05T204449.893990.log

seq2science.2022-10-05T223814.968861.log

Maarten-vd-Sande commented 1 year ago

(Almost) every piece of code gets executed in so-called rules, and each rule logs its output to a logfile. In this case that's the file: /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/log/plotHeatmap_peaks/mm10-macs2_N20000.log

Can you post the contents of that file?
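Rule logs can get long, and the actual error is usually near the end. A small sketch for printing just the last lines of a log; tail is a hypothetical helper name, not part of seq2science.

```python
from collections import deque
from typing import List


def tail(log_path: str, n: int = 20) -> List[str]:
    """Return the last n lines of a rule's log file, without newlines."""
    # A deque with maxlen keeps only the most recent n lines while streaming,
    # so a large log is never held in memory all at once.
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]
```

For the log in this thread that would be print("\n".join(tail("/n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/log/plotHeatmap_peaks/mm10-macs2_N20000.log"))).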

Maarten-vd-Sande commented 1 year ago

This could indeed be related to a corrupt deeptools installation; see #898.

You could try changing bioconda::deeptools=3.5.0 to bioconda::deeptools=3.5.1 in the deeptools environment. You can change that in this file: /home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/envs/deeptools.yaml

This will be fixed in the next release :smile:
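The suggested fix is a one-line edit to the environment file. A minimal sketch of making it programmatically, assuming the pin appears in deeptools.yaml as a list entry of the form "- bioconda::deeptools=3.5.0"; bump_pin and bump_pin_in_file are hypothetical helpers. Since Snakemake identifies conda environments by a hash of the environment file, the edited file should lead to a freshly built environment on the next run rather than reuse of the broken one.

```python
import re
from pathlib import Path


def bump_pin(env_text: str, package: str, old: str, new: str) -> str:
    """Change `- <package>=<old>` to `- <package>=<new>` in conda env YAML text."""
    pattern = re.compile(
        rf"^([ \t]*-[ \t]*{re.escape(package)}=){re.escape(old)}[ \t]*$",
        re.MULTILINE,
    )
    # A function replacement avoids backreference pitfalls like "\13.5.1".
    updated, count = pattern.subn(lambda m: m.group(1) + new, env_text)
    if count == 0:
        raise ValueError(f"{package}={old} not found; is the file already updated?")
    return updated


def bump_pin_in_file(path: str, package: str, old: str, new: str) -> None:
    """Rewrite the environment file in place with the new pin."""
    env_file = Path(path)
    env_file.write_text(bump_pin(env_file.read_text(), package, old, new))
```

Usage for this thread: bump_pin_in_file("/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/envs/deeptools.yaml", "bioconda::deeptools", "3.5.0", "3.5.1").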

ivanferrreira commented 1 year ago

Thanks so much I will try updating the deeptools.yaml!

And the contents of /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/chicken_test/results/log/plotHeatmap_peaks/mm10-macs2_N20000.log are:

Warning For clustering nan values have to be replaced by zeros
Traceback (most recent call last):
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd/bin/plotHeatmap", line 12, in <module>
    main(args)
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd/lib/python3.10/site-packages/deeptools/plotHeatmap.py", line 874, in main
    plotMatrix(hm,
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd/lib/python3.10/site-packages/deeptools/plotHeatmap.py", line 770, in plotMatrix
    plt.savefig(outFileName, bbox_inches='tight', pdd_inches=0, dpi=dpi, format=image_format)
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd/lib/python3.10/site-packages/matplotlib/pyplot.py", line 942, in savefig
    res = fig.savefig(*args, **kwargs)
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd/lib/python3.10/site-packages/matplotlib/figure.py", line 3272, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd/lib/python3.10/site-packages/matplotlib/backend_bases.py", line 2338, in print_figure
    result = print_method(
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd/lib/python3.10/site-packages/matplotlib/backend_bases.py", line 2204, in <lambda>
    print_method = functools.wraps(meth)(lambda *args, **kwargs: meth(
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd/lib/python3.10/site-packages/matplotlib/_api/deprecation.py", line 385, in wrapper
    arguments = signature.bind(*inner_args, **inner_kwargs).arguments
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd/lib/python3.10/inspect.py", line 3179, in bind
    return self._bind(args, kwargs)
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ff3912e4b1a18086e20e2873fa6d2ebd/lib/python3.10/inspect.py", line 3168, in _bind
    raise TypeError(
TypeError: got an unexpected keyword argument 'pdd_inches'

Maarten-vd-Sande commented 1 year ago

The error you got should be solved by updating the deeptools yaml! :partying_face: Let me know if that's actually the case!
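The log pinpoints the deeptools failure: plotHeatmap passes pdd_inches=0 (apparently a misspelling of matplotlib's pad_inches) to plt.savefig, and in the matplotlib version inside this environment a wrapper calls signature.bind(...), which rejects keywords the target function does not declare. The sketch contrasts a lenient call that silently drops unknown keywords with a strict one that binds first. The stub and both helpers are hypothetical, and the lenient variant is only a simplified model of how, as we understand it, older matplotlib releases tolerated such extra keywords.

```python
import inspect


def print_png_stub(filename, *, metadata=None, pil_kwargs=None):
    """Hypothetical stand-in for a backend print method with fixed keywords."""
    return filename


def lenient_call(func, *args, **kwargs):
    # Simplified older behavior: drop keywords that func does not declare.
    # (Does not handle functions that themselves take **kwargs.)
    params = inspect.signature(func).parameters
    accepted = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **accepted)


def strict_call(func, *args, **kwargs):
    # Binding first, like the deprecation wrapper in the traceback, raises
    # TypeError("got an unexpected keyword argument ...") for unknown keywords.
    inspect.signature(func).bind(*args, **kwargs)
    return func(*args, **kwargs)
```

This is why the typo went unnoticed for a while and only broke once the environment resolved a newer matplotlib; deeptools 3.5.1 removes the bad keyword, which is why bumping the pin fixes it.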

Sorry you ran into this. Once you confirm, I'll make a new release (0.9.6) so that others won't hit this issue. Thanks for reporting it :smile:

ivanferrreira commented 1 year ago

Great, thanks so much!

Can you also help me with adjusting the config.yaml to use a specific genome.fasta and GTF?

Maarten-vd-Sande commented 1 year ago

Sure!

ivanferrreira commented 1 year ago

I have the files petMar3_augustus.gtf and petMar3.fa in /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/genomes/

But simply setting custom_genome_extension to /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/genomes/ doesn't work. How should I do it?

Really appreciate it, thank you!

Maarten-vd-Sande commented 1 year ago

You should use genome_dir instead of custom_genome_extension!

Seq2science should only download a genome if it isn't already present in your genome_dir. I haven't used my own assembly for a while, but I think what you need is to specify the assembly in your samples.tsv. For this example, we will call it myassembly.

I think all seq2science needs to pick up the assembly for the ChIP-seq/ATAC-seq workflows is this file:

{genome_dir}/myassembly/myassembly.fa

In the config.yaml that would be:

genome_dir: /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/genomes/

Then add the name of the assembly you want to use (myassembly) to your samples.tsv.

For RNA-seq it is a bit more complicated. It would need these files as well:

{genome_dir}/{assembly}/{assembly}.annotation.gtf
{genome_dir}/{assembly}/{assembly}.annotation.bed
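The layout above can be checked before starting a run. A minimal sketch that builds the expected paths from genome_dir and the assembly name and reports which are missing; expected_genome_files and missing_genome_files are hypothetical helpers that only encode the file names listed above, so seq2science may look for additional files.

```python
from pathlib import Path
from typing import Dict, List


def expected_genome_files(genome_dir: str, assembly: str, rna_seq: bool = False) -> Dict[str, Path]:
    """Paths listed above for a local assembly (hypothetical helper)."""
    base = Path(genome_dir) / assembly
    files = {"fasta": base / f"{assembly}.fa"}
    if rna_seq:
        # RNA-seq additionally needs the gene annotation as GTF and BED.
        files["gtf"] = base / f"{assembly}.annotation.gtf"
        files["bed"] = base / f"{assembly}.annotation.bed"
    return files


def missing_genome_files(genome_dir: str, assembly: str, rna_seq: bool = False) -> List[str]:
    """Return the expected files that do not exist yet."""
    expected = expected_genome_files(genome_dir, assembly, rna_seq)
    return [str(path) for path in expected.values() if not path.exists()]
```

If this layout is right, an assembly named petMar3 would be expected at /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/genomes/petMar3/petMar3.fa, and for RNA-seq the annotation at .../genomes/petMar3/petMar3.annotation.gtf rather than as petMar3_augustus.gtf in the top-level folder.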

ivanferrreira commented 1 year ago

Got it, thanks so much!

Maarten-vd-Sande commented 1 year ago

Everything worked? Shall we close the issue?

ivanferrreira commented 1 year ago

Yeap, thank you!

ivanferrreira commented 1 year ago

I was very close to the finish line but hit another issue. Do you have any idea how to fix this one?

rule upset_plot_peaks:
    input: /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/danRer11/results/counts/macs2/danRer11_onehotpeaks.tsv
    output: /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/danRer11/results/qc/upset/danRer11-macs2_upset_mqc.jpg
    jobid: 332
    wildcards: assembly=danRer11, peak_caller=macs2
    resources: tmpdir=/tmp

Activating conda environment: ../../../../../../../home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/a2ca95571a314d6fdcb38c97f2119cce
Activating conda environment: ../../../../../../../home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/a2ca95571a314d6fdcb38c97f2119cce
Traceback (most recent call last):
  File "/n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/danRer11/.snakemake/scripts/tmp6on4m1zo.upset.py", line 8, in <module>
    from upsetplot import from_contents, from_memberships, from_indicators, plot
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/a2ca95571a314d6fdcb38c97f2119cce/lib/python3.10/site-packages/upsetplot/__init__.py", line 6, in <module>
    from .plotting import UpSet, plot
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/a2ca95571a314d6fdcb38c97f2119cce/lib/python3.10/site-packages/upsetplot/plotting.py", line 14, in <module>
    from matplotlib.tight_layout import get_renderer
ImportError: cannot import name 'get_renderer' from 'matplotlib.tight_layout' (/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/a2ca95571a314d6fdcb38c97f2119cce/lib/python3.10/site-packages/matplotlib/tight_layout.py)
[Thu Oct 13 08:11:01 2022]
Error in rule upset_plot_peaks:
    jobid: 332
    output: /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/danRer11/results/qc/upset/danRer11-macs2_upset_mqc.jpg
    conda-env: /home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/a2ca95571a314d6fdcb38c97f2119cce

RuleException:
CalledProcessError in line 126 of /home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/qc_peaks.smk:
Command 'source /home/ic2690/miniconda3/envs/seq2science/bin/activate '/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/a2ca95571a314d6fdcb38c97f2119cce'; set -euo pipefail; python /n/sci/SCI-004277-IC2690/akt326/Kallisto/bulkATAC-seq/danRer11/.snakemake/scripts/tmp6on4m1zo.upset.py' returned non-zero exit status 1.
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/rules/qc_peaks.smk", line 126, in __rule_upset_plot_peaks
  File "/home/ic2690/miniconda3/envs/seq2science/lib/python3.8/concurrent/futures/thread.py", line 57, in run
[Thu Oct 13 08:13:54 2022]
Finished job 316.
478 of 503 steps (95%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

Maarten-vd-Sande commented 1 year ago

Unlucky! This seems to be a similar issue to the deeptools one: the environment broke because of changes in matplotlib (https://github.com/jnothman/UpSetPlot/issues/191). Could you edit the upset yaml file and pin the matplotlib version there?

In /home/ic2690/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/envs/upset.yaml, add conda-forge::matplotlib=3.5.3 to the dependencies.
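The upset fix is the same kind of edit, except that a new dependency line is added instead of an existing pin being changed; presumably matplotlib 3.5.3 still provides get_renderer in matplotlib.tight_layout, which upsetplot imports. A minimal sketch that inserts the spec directly under "dependencies:", assuming the standard conda environment file layout and that upset.yaml does not already pin matplotlib; add_dependency is a hypothetical helper.

```python
def add_dependency(env_text: str, spec: str) -> str:
    """Insert `- <spec>` as the first entry under `dependencies:`."""
    lines = env_text.splitlines()
    try:
        start = next(i for i, line in enumerate(lines) if line.strip() == "dependencies:")
    except StopIteration:
        raise ValueError("no 'dependencies:' section found") from None
    # Already present: leave the file untouched so reruns are harmless.
    if any(line.strip() == f"- {spec}" for line in lines[start + 1:]):
        return env_text
    # Reuse the indentation of the first existing list entry, else two spaces.
    indent = "  "
    for line in lines[start + 1:]:
        if line.lstrip().startswith("- "):
            indent = line[: len(line) - len(line.lstrip())]
            break
    lines.insert(start + 1, f"{indent}- {spec}")
    return "\n".join(lines) + ("\n" if env_text.endswith("\n") else "")
```

Applied to the text of upset.yaml, add_dependency(text, "conda-forge::matplotlib=3.5.3") returns the updated contents, which can then be written back to the file.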

ivanferrreira commented 1 year ago

Working great, thanks :D

Hopefully a last question: I ran the merging with fisher. If I now want to rerun only the merging step, this time with IDR and with a q-value of 1 for macs2, can I just update the config file and run seq2science again in the same folder, so that I don't have to rerun the entire pipeline?

I found it a bit confusing to figure out how to run only a few rules instead of the whole thing.

Maarten-vd-Sande commented 1 year ago

Yes. Seq2science should only rerun what is needed. So you can just change the config, then start a run in the same folder, and it should rerun only the necessary rules.
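"Only rerun what is needed" comes from Snakemake, which seq2science runs on. Its classic criterion is that a job runs when one of its outputs is missing or older than one of its inputs; newer Snakemake releases can additionally rerun jobs whose params, code, or software environment changed, which is how a config change can reach already-finished rules. The sketch below models only the timestamp part; needs_rerun is a hypothetical helper, not Snakemake's API.

```python
from pathlib import Path
from typing import Iterable


def needs_rerun(outputs: Iterable[str], inputs: Iterable[str]) -> bool:
    """Simplified, timestamp-only model of when a job is out of date."""
    out_paths = [Path(p) for p in outputs]
    # A job with a missing output (or no declared outputs) always runs.
    if not out_paths or any(not p.exists() for p in out_paths):
        return True
    oldest_output = min(p.stat().st_mtime for p in out_paths)
    # Any input modified after the oldest output makes the outputs stale.
    # Inputs are assumed to exist; a missing input raises FileNotFoundError.
    return any(Path(p).stat().st_mtime > oldest_output for p in inputs)
```

Upstream outputs such as the aligned BAM files stay newer than their inputs, so under this model they are skipped, while jobs downstream of anything that does rerun are picked up in turn.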