Missing input rule error

suvi93 commented 1 year ago

Hi Hanna,

I'm getting this error from my run:

Workflow defines that rule genome_faidx is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule bwa_index is eligible for caching between workflows (use the --cache argument to enable this).
Building DAG of jobs...
MissingInputException in line 167 of tools/findZX/workflow/rules/no_synteny_plotting.smk:
Missing input files for rule table_readme:
workflow/report/output_table_README.md

This is my command:

snakemake -s workflow/findZX --configfile config.yml -R all --use-conda -k

The files that the error reports as missing are present in the main directory. Any idea where I'm going wrong?

Thanks, Suvi

hsigeman commented 1 year ago

Hi Suvi,

It looks like snakemake cannot find one or more of the specified input files. Unfortunately, the error messages produced by snakemake are not very informative about which ones.

To solve the problem, I would need some additional information:

Did you run the pipeline with the test dataset first using this command, and did that work? snakemake -s workflow/findZX --configfile .test/config.yml --cores 1 -R all --use-conda -k
Could you make sure that the ref_genome variable in config.yml is correct, and that the paths to the fastq files in the unit file exist?
If you send me your config file and unit file I can also double check that there is nothing strange with the formatting.
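As a quick way to do the second check, here is a minimal Python sketch that lists every file named in a tab-separated unit file that does not exist on disk. The column names fq1 and fq2 and the helper name missing_paths are assumptions made for illustration; adjust them to match the header of your own unit file. Relative paths are checked against the directory you run the script from.

```python
import csv
from pathlib import Path


def missing_paths(units_file, path_columns=("fq1", "fq2")):
    """Return (line_number, column, path) for every listed file that is not on disk.

    The default column names are assumptions; pass the names used in your unit file.
    """
    missing = []
    with open(units_file, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Line 1 of the file is the header, so data rows start at line 2.
        for line_number, row in enumerate(reader, start=2):
            for column in path_columns:
                value = (row.get(column) or "").strip()
                if value and not Path(value).is_file():
                    missing.append((line_number, column, value))
    return missing
```

Calling missing_paths("units.tsv") returns an empty list when every listed file is found.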

Hanna

suvi93 commented 1 year ago

Hi Hanna,

No, that command did not work either: I got the same error when I ran it on the provided test dataset.
The paths given in config.yml and units.tsv are correct. I've attached my files below; please have a look. (They are uploaded as .txt because GitHub does not support uploading .yml files.)

Suvi

config.txt units.txt

hsigeman commented 1 year ago

Hi Suvi,

OK, I see. If the test dataset is not working either, it must be some other problem. Just to make sure, are you running this from the "findZX" directory?

Hanna

suvi93 commented 1 year ago

Hi Hanna,

No, I'm not. I made an output directory and I'm running it from there. Is there a way to run from the findZX directory and get the output in the desired directory?

hsigeman commented 1 year ago

Hi Suvi,

The pipeline is hardcoded to create the output directories within the findZX directory, so there is unfortunately no option to do that automatically.

The easiest way would be to just copy the output to another location once it has finished, but you could also edit the paths at the beginning of the snakefiles (workflow/findZX and workflow/findZX-synteny) to direct the output somewhere else.

Hope it works when you run the pipeline from the main directory, and let me know otherwise!
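For background, snakemake resolves relative input paths such as workflow/report/output_table_README.md against the directory it is started from, which is why the same command fails from a separate output directory. A small Python sketch of that check follows; the helper name unresolved_from is made up for illustration, and the two relative paths are the ones that appear earlier in this thread.

```python
from pathlib import Path

# Both relative paths come from this thread: the snakefile passed to -s and
# the input that rule table_readme reported as missing.
EXPECTED = ("workflow/findZX", "workflow/report/output_table_README.md")


def unresolved_from(directory, expected=EXPECTED):
    """Return the expected relative paths that do not exist under `directory`."""
    base = Path(directory)
    return [relative for relative in expected if not (base / relative).exists()]
```

Running unresolved_from(".") from the directory you plan to launch the pipeline in should give an empty list; anything it returns is a path that snakemake would also fail to find from there.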

Hanna

suvi93 commented 12 months ago

Hi Hanna,

I finally did manage to get the pipeline running but I have a different error now. My scaffolds are too big (>100Mb) and samtools index cannot index the bam files. We usually use bamtools index instead to bypass this error. I noticed in the environment.yml file that bamtools is being installed, so, is there a way to replace the samtools index with bamtools index in the pipeline?

Here is the error in the log file -

[E::hts_idx_check_range] Region 536870788..536870939 cannot be stored in a bai index. Try using a csi index
[E::sam_index] Read 'A00605:49:HLJC3DSXX:1:2657:9001:29324' with ref_name='scaffold_1', ref_length=591107699, flags=163, pos=536870789 cannot be indexed
samtools index: failed to create index for "results/ref/vv19_out_JBAT.FINAL.masked/dedup/Female5__homogametic.sorted.dedup.mismatch.0.0.bam": Numerical result out of range
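For context, the first line of this log reflects a limit of the BAI format itself: it cannot store coordinates beyond 2^29 (536,870,912) bp, and scaffold_1 is 591,107,699 bp long. Below is a minimal Python sketch for spotting affected sequences ahead of time. It assumes a .fai index of the reference (as written by samtools faidx) is available, and the function name too_long_for_bai is made up for illustration.

```python
BAI_MAX_LENGTH = 2**29  # 536,870,912 bp: the largest coordinate a BAI index can hold


def too_long_for_bai(fai_path):
    """Return (name, length) for each sequence in a .fai file longer than the BAI limit."""
    oversized = []
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # The first two .fai columns are the sequence name and its length.
            name, length = fields[0], int(fields[1])
            if length > BAI_MAX_LENGTH:
                oversized.append((name, length))
    return oversized
```

Any sequence this reports needs either a CSI index, as the log message suggests, or a reference split into shorter pieces.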

Suvi

hsigeman commented 12 months ago

Hi Suvi,

If you have bamtools installed, you should be able to edit the "samtools_index" rule so that it uses bamtools instead. The rule is in the following file: workflow/rules/mapping.smk.

Replace this text:

rule samtools_index:
    input:
        dedup_dir + "{sample}__{group}.sorted.dedup.mismatch.{ED}.bam", 
    output:
        dedup_dir + "{sample}__{group}.sorted.dedup.mismatch.{ED}.bam.bai", 
    log:
        logs_dir + "samtools/{sample}-{group}.{ED}.log",
    message:
        "Index BAM file: {wildcards.sample}__{wildcards.group}.sorted.dedup.mismatch.{wildcards.ED}.bam"
    wrapper:
        "0.74.0/bio/samtools/index"

with this text:

rule samtools_index:
    input:
        dedup_dir + "{sample}__{group}.sorted.dedup.mismatch.{ED}.bam",
    output:
        dedup_dir + "{sample}__{group}.sorted.dedup.mismatch.{ED}.bam.bai",
    log:
        logs_dir + "samtools/{sample}-{group}.{ED}.log",
    message:
        "Index BAM file: {wildcards.sample}__{wildcards.group}.sorted.dedup.mismatch.{wildcards.ED}.bam"
    shell:
        "bamtools index -in {input}"

I haven't tested this so I can't promise that it will work as intended, but I think it will.

If not, the README of this GitHub repository describes a workaround: split the large scaffolds into several smaller ones:

bedtools makewindows -g reference.fasta.fai -w <WINDOW_SIZE> > reference.<WINDOW_SIZE>.bed
bedtools getfasta -fi reference.fasta -bed reference.<WINDOW_SIZE>.bed | sed 's/:/_/g' | sed 's/-/_/g' > reference.split.fasta
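To make it clearer what these two commands produce, here is a rough Python sketch of the same windowing and renaming logic. It is only an illustration, not part of findZX, and the function names make_windows and split_names are made up. It assumes the usual bedtools behaviour: half-open windows of WINDOW_SIZE bp with a shorter final window, and getfasta headers of the form name:start-end, which the two sed calls turn into name_start_end.

```python
def make_windows(length, window_size):
    """Yield half-open (start, end) windows covering a sequence of `length` bp."""
    for start in range(0, length, window_size):
        # The last window is truncated at the end of the sequence.
        yield start, min(start + window_size, length)


def split_names(name, length, window_size):
    """Return the renamed headers for one scaffold after splitting."""
    names = []
    for start, end in make_windows(length, window_size):
        header = f"{name}:{start}-{end}"  # header style written by bedtools getfasta
        # Same effect as sed 's/:/_/g' | sed 's/-/_/g' on the header line.
        names.append(header.replace(":", "_").replace("-", "_"))
    return names
```

For example, split_names("scaffold_1", 250, 100) gives scaffold_1_0_100, scaffold_1_100_200 and scaffold_1_200_250. For the indexing error above, any WINDOW_SIZE below 536,870,912 keeps each piece within the BAI limit.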

Hanna

suvi93 commented 12 months ago

Hi Hanna,

I went ahead with splitting the scaffolds and it seems to be running fine for now.

Also, on an unrelated note, I had a couple of questions about your phasing pipeline. I emailed you about them but never heard back. Is it okay if I trouble you about that again, in case you missed the email?

Thanks, Suvi

hsigeman commented 12 months ago

Hi Suvi,

Yes, I found your email now, sorry about that! Since those questions are not related to findZX, I'll close this issue and reply to your email separately.

Hanna

syd-alm commented 11 months ago

Hi Hanna and Suvi,

I am having the same error as Suvi had when using the pipeline on my own data:

MissingInputException in line 167 of tools/findZX/workflow/rules/no_synteny_plotting.smk:

I saw that another user had the same issue (closed issue #1), but neither this thread nor the other one specifies how the error was resolved. Any input or suggestions would be most appreciated!

I have successfully run the test dataset, and have checked that my config file has a path to the reference genome. I've attached my config file for reference.

Thanks in advance! -Sydney

p_herring_config.txt

hsigeman / findZX

Missing input rule error #15