pinin4fjords closed this issue 5 years ago
Hi, issues are also good :) Could you please provide iRAP's configuration file so that I can try to replicate the problem? Cheers.
Here you go:
atlas_run=y
data_dir=/path/to/irap_data
species=hordeum_vulgare
reference=Hordeum_vulgare.Hv_IBSC_PGSB_v2.dna.toplevel.fa.gz
cdna_file=Hordeum_vulgare.Hv_IBSC_PGSB_v2.cdna.all.fa.gz
sop2=atlas_te
gtf_file=Hordeum_vulgare.Hv_IBSC_PGSB_v2.38.gtf.gz
cont_index=/path/to/irap_data/contamination/fungi.16.1.microbial.111.2.genomic
But basically any sufficiently large genome will produce .ht2l indices rather than the expected .ht2.
Hi Nuno. I needed a fix for this so I implemented one. My proposed solution is in https://github.com/nunofonseca/irap/pull/92.
Note that https://github.com/nunofonseca/irap/pull/93 is also required to actually run HISAT2 with large indices.
Thanks for the PR!
I'm unsure of the best makey way to fix this, so I'm submitting an issue rather than a PR.
The relevant line of aux/mk/irap_map.mk causes iRAP to expect a .ht2 file to mark completion of HISAT2 indexing. Unfortunately, for large genomes (e.g. Hordeum vulgare), the extension is .ht2l because HISAT2 creates a large index. The solution is probably to use a completion file such as .ht2indexed as the completion marker instead?
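To make the completion-marker idea concrete, here is a minimal shell sketch. It is not the code in iRAP or in the linked PRs. The function names `hisat2_index_exists` and `mark_hisat2_indexed` are hypothetical, and the sketch assumes the first index file is named `<prefix>.1.ht2` for a small index or `<prefix>.1.ht2l` for a large one.

```shell
#!/bin/sh
# Hypothetical helper (not part of iRAP): succeed if a HISAT2 index exists
# for the given prefix, whether it is a small (.ht2) or large (.ht2l) index.
# Assumption: the first index file is <prefix>.1.ht2 or <prefix>.1.ht2l.
hisat2_index_exists() {
    prefix=$1
    [ -f "${prefix}.1.ht2" ] || [ -f "${prefix}.1.ht2l" ]
}

# Hypothetical helper: create an extension-independent completion marker
# (<prefix>.ht2indexed, the name suggested in this issue) only when an
# index of either kind is present; otherwise report failure.
mark_hisat2_indexed() {
    prefix=$1
    if hisat2_index_exists "$prefix"; then
        touch "${prefix}.ht2indexed"
    else
        echo "no HISAT2 index found for prefix: ${prefix}" >&2
        return 1
    fi
}
```

A Make rule could then target `<prefix>.ht2indexed` rather than a `.ht2` file, with a recipe that runs something like `hisat2-build ref.fa ref && mark_hisat2_indexed ref`, so the rule completes regardless of which index size HISAT2 chooses.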