nunofonseca / irap

integrated RNA-seq Analysis Pipeline
GNU General Public License v3.0

iRAP does not recognise stage0 completion for large genomes with HISAT2 #87

Closed: pinin4fjords closed this issue 5 years ago

pinin4fjords commented 6 years ago

I'm unsure of the most Make-idiomatic way to fix this, so I'm submitting an issue rather than a PR.

The relevant line of aux/mk/irap_map.mk causes iRAP to expect a .ht2 file to mark completion of HISAT2 indexing. Unfortunately, for large genomes (e.g. Hordeum vulgare) the extension is .ht2l, because HISAT2 creates a large index. As a result, stage0 is never recognised as complete. The solution is probably to use a dedicated marker file, such as .ht2indexed, to signal completion instead of relying on the index extension.
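To illustrate the marker-file idea outside of Make, here is a minimal Python sketch. It is not iRAP's code: the function names and the `index_prefix` argument are hypothetical, and the `.ht2indexed` suffix is just the name suggested above. The point is that the completion check no longer depends on whether the index files end in .ht2 or .ht2l.

```python
from pathlib import Path

# Marker suffix suggested in this issue; not an existing iRAP convention.
MARKER_SUFFIX = ".ht2indexed"


def marker_path(index_prefix):
    """Return the path of the completion marker for a given index prefix."""
    return Path(str(index_prefix) + MARKER_SUFFIX)


def mark_index_complete(index_prefix):
    """Create the marker file; call this only after indexing has succeeded."""
    marker = marker_path(index_prefix)
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return marker


def index_is_complete(index_prefix):
    """True if the marker exists, regardless of the index file extension."""
    return marker_path(index_prefix).exists()
```

In a Makefile the same pattern would amount to making the marker file the rule's target and touching it as the last step of the indexing recipe.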

nunofonseca commented 6 years ago

Hi, issues are also good :) Could you please provide iRAP's configuration file so that I can try to replicate the problem? Cheers.

pinin4fjords commented 6 years ago

Here you go:

atlas_run=y
data_dir=/path/to/irap_data
species=hordeum_vulgare
reference=Hordeum_vulgare.Hv_IBSC_PGSB_v2.dna.toplevel.fa.gz
cdna_file=Hordeum_vulgare.Hv_IBSC_PGSB_v2.cdna.all.fa.gz
sop2=atlas_te
gtf_file=Hordeum_vulgare.Hv_IBSC_PGSB_v2.38.gtf.gz
cont_index=/path/to/irap_data/contamination/fungi.16.1.microbial.111.2.genomic

But basically any sufficiently large genome will produce .ht2l indices rather than the expected .ht2.
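For anyone wanting to check which kind of index they ended up with, here is a rough Python sketch. It assumes the index files follow the `<prefix>.<N>.ht2` or `<prefix>.<N>.ht2l` naming pattern; the function name `find_hisat2_index` and the `index_prefix` argument are made up for illustration and are not part of iRAP or HISAT2.

```python
import glob
from pathlib import Path


def find_hisat2_index(index_prefix):
    """Return "ht2", "ht2l", or None depending on which index files exist.

    Assumes files are named <prefix>.<N>.ht2 (small index) or
    <prefix>.<N>.ht2l (large index).
    """
    prefix = Path(index_prefix)
    # Escape the file name so characters like '[' are matched literally.
    name = glob.escape(prefix.name)
    for ext in ("ht2", "ht2l"):
        if any(prefix.parent.glob(f"{name}.*.{ext}")):
            return ext
    return None
```

A completion check written this way accepts either extension, which is an alternative to the marker-file approach if changing the Make targets is undesirable.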

pinin4fjords commented 6 years ago

Hi Nuno. I needed a fix for this so I implemented one. My proposed solution is in https://github.com/nunofonseca/irap/pull/92.

pinin4fjords commented 6 years ago

Note that https://github.com/nunofonseca/irap/pull/93 is also required to actually run HISAT2 with large indices.

nunofonseca commented 5 years ago

Thanks for the PR!