nf-core / hic

Analysis of Chromosome Conformation Capture data (Hi-C)
https://nf-co.re/hic
MIT License

Issues with bowtie2 index generation #66

Closed cmdoret closed 4 years ago

cmdoret commented 4 years ago

Hello,

Thanks for the nice pipeline and great documentation!

I am trying to run nf-core/hic with a custom genome, and bowtie2 seems to be using the wrong path when looking for the index. Here is the command line I used:

nextflow run nf-core/hic \
    -r 1.1.0 \
    -profile docker \
    --reads 'my_reads.end{1,2}.fq.gz' \
    --fasta my_genome.fa \
    --restriction_site "^GATC"

The pipeline finishes makeBowtie2Index, but crashes during bowtie2_end_to_end. The error indicates: (ERR): "bowtie2_index/my_genome.fa" does not exist or is not a Bowtie 2 index

Looking at the content of the bowtie2_index directory, I see files named my_genome.1.bt2, ... This suggests the .fa extension is removed when building the index, but not when calling bowtie2.
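
One way to confirm this kind of mismatch is to test which prefix actually has index files next to it. The sketch below is only an illustration: `has_bt2_index` is a hypothetical helper, not part of nf-core/hic, and it relies solely on the Bowtie 2 convention that index files are named `<prefix>.1.bt2`, `<prefix>.2.bt2`, ... (or `.bt2l` for large indexes).

```shell
# Hypothetical helper (not part of nf-core/hic): succeed if a Bowtie 2
# index exists for the given prefix, i.e. <prefix>.1.bt2 or <prefix>.1.bt2l.
has_bt2_index() {
    [ -e "$1.1.bt2" ] || [ -e "$1.1.bt2l" ]
}

# Check both candidate prefixes: the one from the error message and the
# one without the .fa extension.
for prefix in bowtie2_index/my_genome.fa bowtie2_index/my_genome; do
    if has_bt2_index "$prefix"; then
        echo "$prefix: index found"
    else
        echo "$prefix: no index"
    fi
done
```

If only the second prefix is reported as found, the index was built under the extension-less name while bowtie2 was called with the full file name.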

I also tried specifying a pre-built index:

nextflow run nf-core/hic \
    -r 1.1.0 \
    -profile docker \
    --reads 'my_reads.end{1,2}.fq.gz' \
    --fasta my_genome.fa \
    --bwt2_index my_index \
    --restriction_site "^GATC"

But the pipeline crashes instantly with the error: Missing `fromPath` parameter

Did I miss something? Any help would be greatly appreciated.

nservant commented 4 years ago

Hi, thanks for reporting this issue. Indeed, there is an issue with the --fasta parameter which needs more investigation. It seems to work with some extensions (like .fsa for the toy dataset), but not with others (like .fa)! I will investigate this point, but in the meantime, can you try to rename your fasta file?

Regarding your other point with --bwt2_index, I just ran a test and it should work:

nextflow run main.nf --reads './debug/*R{1,2}.fq.gz' --fasta './debug/W303_SGD_2015_JRIU00000000.fsa' --restriction_site '^GATC' --bwt2_index './bowtie2_index/W303_SGD_2015_JRIU00000000'

Did you pass the index prefix to --bwt2_index? Thanks

cmdoret commented 4 years ago

Thanks for the help! Indeed, changing the fasta extension to .fsa solves the problem. I suppose the bug is caused by this line: https://github.com/nf-core/hic/blob/b84069e1f2f1d51414341a992200c339cdce711b/main.nf#L364

I tried to fix that in PR #67, it seems to work locally.

I did pass the index prefix, but the index was in the current directory, and apparently it only works if the index is in its own directory. Your example works correctly, thanks!

nservant commented 4 years ago

Hi, thanks for the PR. However, I think all other nf-core pipelines use bwt2_base. To keep the pipelines compatible with each other (and for future DSL2 usage), I think it would be better to fix the bug while keeping the prefix. I'll have a look on my side too.

cmdoret commented 4 years ago

Thank you for all the help! I think the issue may be that the extension is not trimmed when bwt2_base is defined here: https://github.com/nf-core/hic/blob/b84069e1f2f1d51414341a992200c339cdce711b/main.nf#L183-L185

I am not comfortable with Groovy yet, so I am not quite sure how to do this. Perhaps just moving the truncation here (from makeBowtie2Index) would do the trick:

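else if ( params.fasta ) {
    // keep only the file name, dropping any leading directories
    lastPath = params.fasta.lastIndexOf(File.separator)
    bwt2_base = params.fasta.substring(lastPath+1)
    // strip a trailing FASTA extension (.fa, .fna, .fsa, .fasta, .fas) so the name matches the index prefix
    bwt2_base = bwt2_base.toString() - ~/(\.f[ans]?a)?(\.fasta)?(\.fas)?$/
}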
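
To see which file names this truncation handles without running the pipeline, the same regular expression can be tried from the shell. This is a rough equivalent under stated assumptions, not the pipeline's own code: the `bwt2_base` function below is a hypothetical stand-in that uses `basename` and `sed -E` in place of the Groovy string operations.

```shell
# Hypothetical shell stand-in for the Groovy truncation above:
# drop the directory part, then strip a trailing FASTA extension
# (.fa, .fna, .faa, .fsa, .fasta, .fas) if there is one.
bwt2_base() {
    basename "$1" | sed -E 's/(\.f[ans]?a)?(\.fasta)?(\.fas)?$//'
}

bwt2_base "genomes/my_genome.fa"                     # prints my_genome
bwt2_base "./debug/W303_SGD_2015_JRIU00000000.fsa"   # prints W303_SGD_2015_JRIU00000000
```

Names with an extension the pattern does not cover (for example .txt) pass through unchanged, so the index prefix and the name given to bowtie2 stay identical either way.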

nservant commented 4 years ago

Yes :) Absolutely. This is exactly what I did in the new PR. Thanks again.

cmdoret commented 4 years ago

Awesome, thank you!