suvi93 closed this issue 12 months ago
Hi Suvi,
It looks like snakemake cannot find one or more of the specified input files. Unfortunately, the error messages produced by snakemake are not very informative about which ones.
To solve the problem, I would need some additional information:
Did you run the pipeline with the test dataset first using this command, and did that work?
snakemake -s workflow/findZX --configfile .test/config.yml --cores 1 -R all --use-conda -k
Could you make sure that the ref_genome variable in config.yml is correct, and that the paths to the fastq files in the unit file exist?
If you send me your config file and unit file I can also double check that there is nothing strange with the formatting.
Hanna
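Since snakemake does not say which input is missing, the check suggested above can be scripted. The following is only a minimal sketch: the function name `missing_inputs` is made up for illustration, and the column names `fq1` and `fq2` are an assumption about the unit file layout (a tab-separated file with a header row), so adjust them to match your own file.

```python
import csv
from pathlib import Path


def missing_inputs(units_file, ref_genome, path_columns=("fq1", "fq2")):
    """Return the input paths that do not exist on disk.

    Assumes a tab-separated unit file with a header row. The column
    names in `path_columns` are hypothetical; change them to match
    the header of your own unit file.
    """
    missing = []
    # The reference genome path is passed in directly (copy it from config.yml).
    if not Path(ref_genome).is_file():
        missing.append(str(ref_genome))
    with open(units_file, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            for column in path_columns:
                value = (row.get(column) or "").strip()
                # Empty cells (for example, no second read file) are skipped.
                if value and not Path(value).is_file():
                    missing.append(value)
    return missing
```

Calling `missing_inputs("units.tsv", "/path/to/reference.fasta")` returns a list of the paths that could not be found. Relative paths are resolved against the current working directory, so the check should be run from the same directory as snakemake.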
Hi Hanna,
Suvi config.txt units.txt
Hi Suvi,
OK, I see. If the test dataset is not working either, it must be some other problem. Just to make sure, are you running this from the "findZX" directory?
Hanna
Hi Hanna, no, I'm not. I made an output directory and I'm running it from there. Is there a way to run from the findZX directory and get the output in the desired directory?
Hi Suvi,
The pipeline is hardcoded to create the output directories within the findZX directory, so there is unfortunately no option to do that automatically.
The easiest way would be to just copy the output to another location once it has finished, but you could also edit the paths at the beginning of the snakefiles (workflow/findZX and workflow/findZX-synteny) to direct the output somewhere else.
Hope it works when you run the pipeline from the main directory, and let me know otherwise!
Hanna
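The copy step mentioned above could be scripted along these lines. This is a sketch only: the helper name `copy_results` is hypothetical, and the default subdirectory name `results` is an assumption based on the file paths that appear in the log output later in this thread. It needs Python 3.8 or newer for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path


def copy_results(findzx_dir, destination, subdir="results"):
    """Copy one output directory from the findZX directory elsewhere.

    `subdir` is an assumed name for the pipeline's output directory;
    change it if your run writes somewhere else.
    """
    src = Path(findzx_dir) / subdir
    dst = Path(destination) / subdir
    # dirs_exist_ok=True allows copying into a directory that already exists.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

For example, `copy_results(".", "/path/to/my_output")` run from the findZX directory would copy `results/` into the chosen output directory once the pipeline has finished.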
Hi Hanna,
I finally did manage to get the pipeline running but I have a different error now. My scaffolds are too big (>100Mb) and samtools index cannot index the bam files. We usually use bamtools index instead to bypass this error. I noticed in the environment.yml file that bamtools is being installed, so, is there a way to replace the samtools index with bamtools index in the pipeline?
Here is the error in the log file -
[E::hts_idx_check_range] Region 536870788..536870939 cannot be stored in a bai index. Try using a csi index
[E::sam_index] Read 'A00605:49:HLJC3DSXX:1:2657:9001:29324' with ref_name='scaffold_1', ref_length=591107699, flags=163, pos=536870789 cannot be indexed
samtools index: failed to create index for "results/ref/vv19_out_JBAT.FINAL.masked/dedup/Female5__homogametic.sorted.dedup.mismatch.0.0.bam": Numerical result out of range
Suvi
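For context, the BAI index format can only store coordinates up to 2^29 = 536,870,912 bp, and the region in the log above ends at 536,870,939, just past that limit. A quick way to see which scaffolds are affected is to scan the lengths in a samtools faidx (.fai) file. The sketch below assumes that file format (name and length in the first two tab-separated columns); the function name and the second example line are made up for illustration.

```python
BAI_MAX_LENGTH = 2**29  # 536,870,912 bp; positions past this do not fit in a BAI index


def sequences_over_bai_limit(fai_lines):
    """Return (name, length) pairs for sequences longer than the BAI limit.

    `fai_lines` is an iterable of lines in samtools faidx (.fai) format,
    where the first two tab-separated columns are name and length.
    """
    too_long = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, length = fields[0], int(fields[1])
        if length > BAI_MAX_LENGTH:
            too_long.append((name, length))
    return too_long


# scaffold_1 and its length come from the log above; scaffold_2 and all
# offset columns are illustrative values only.
example = [
    "scaffold_1\t591107699\t12\t60\t61\n",
    "scaffold_2\t1000000\t600959507\t60\t61\n",
]
print(sequences_over_bai_limit(example))  # [('scaffold_1', 591107699)]
```

With a real file, the same function can be called as `sequences_over_bai_limit(open("reference.fasta.fai"))`.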
Hi Suvi,
If you have bamtools installed, you should be able to edit the "samtools_index" rule so that it uses bamtools instead. The rule is in the following file: workflow/rules/mapping.smk.
Replace this text:
rule samtools_index:
    input:
        dedup_dir + "{sample}__{group}.sorted.dedup.mismatch.{ED}.bam",
    output:
        dedup_dir + "{sample}__{group}.sorted.dedup.mismatch.{ED}.bam.bai",
    log:
        logs_dir + "samtools/{sample}-{group}.{ED}.log",
    message:
        "Index BAM file: {wildcards.sample}__{wildcards.group}.sorted.dedup.mismatch.{wildcards.ED}.bam"
    wrapper:
        "0.74.0/bio/samtools/index"
with this text:
rule samtools_index:
    input:
        dedup_dir + "{sample}__{group}.sorted.dedup.mismatch.{ED}.bam",
    output:
        dedup_dir + "{sample}__{group}.sorted.dedup.mismatch.{ED}.bam.bai",
    log:
        logs_dir + "samtools/{sample}-{group}.{ED}.log",
    message:
        "Index BAM file: {wildcards.sample}__{wildcards.group}.sorted.dedup.mismatch.{wildcards.ED}.bam"
    shell:
        "bamtools index -in {input}"
I haven't tested this so I can't promise that it will work as intended, but I think it will.
If not, however, there is a solution for this problem in the README of this GitHub page where you can simply split large scaffolds into several smaller scaffolds:
samtools faidx reference.fasta
bedtools makewindows -g reference.fasta.fai -w <WINDOW_SIZE> > reference.<WINDOW_SIZE>.bed
bedtools getfasta -fi reference.fasta -bed reference.<WINDOW_SIZE>.bed | sed 's/:/_/g' | sed 's/-/_/g' > reference.split.fasta
Hanna
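To illustrate what the two bedtools commands above produce, here is a small sketch of the windowing and renaming logic. It is not the bedtools implementation; the function names are made up, and the 100 Mb window size is a hypothetical choice. Only the scaffold name and length are taken from the log earlier in this thread.

```python
def make_windows(name, length, window_size):
    """Return (name, start, end) windows in 0-based, half-open BED coordinates."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    # The last window is truncated at the end of the sequence.
    return [
        (name, start, min(start + window_size, length))
        for start in range(0, length, window_size)
    ]


def split_name(name, start, end):
    """Name a window the way the getfasta + sed commands would."""
    # getfasta writes "name:start-end"; the sed calls then replace ':' and '-' with '_'.
    return f"{name}:{start}-{end}".replace(":", "_").replace("-", "_")


# Hypothetical window size of 100 Mb, below the 2**29 bp BAI limit.
windows = make_windows("scaffold_1", 591107699, 100_000_000)
print(len(windows))              # 6
print(split_name(*windows[0]))   # scaffold_1_0_100000000
print(split_name(*windows[-1]))  # scaffold_1_500000000_591107699
```

Note that the sed calls replace every ':' and '-' in the header, so scaffold names that already contain those characters are altered as well.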
Hi Hanna,
I went ahead with splitting the scaffolds and it seems to be running fine for now.
Also, unrelated, but, I had a couple of questions regarding your phasing pipeline and I had emailed you about it but never heard back from you. Is it okay if I trouble you regarding that again in case you missed the email?
Thanks, Suvi
Hi Suvi,
Yes I found your email now, sorry about that! Since those questions are not related to findZX, I'll close this issue and will reply to your email separately.
Hanna
Hi Hanna and Suvi,
I am having the same error as Suvi had when using the pipeline on my own data:
MissingInputException in line 167 of tools/findZX/workflow/rules/no_synteny_plotting.smk:
I saw that another user had the same issue (closed issue #1), but neither this thread nor that one specifies how the error was resolved. Any input or suggestions would be most appreciated!
I have successfully run the test dataset, and have checked that my config file has a path to the reference genome. I've attached my config file for reference.
Thanks in advance! -Sydney
Hi Hanna,
I'm getting this error from my run -
This is my command -
snakemake -s workflow/findZX --configfile config.yml -R all --use-conda -k
The files it says are missing in the error file are present in the main directory. Any idea where I'm going wrong?
Thanks, Suvi