williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

Problem with the mappability step of BuildRefProcess on hg38 #35

Closed: cbenoitp closed this issue 6 years ago

cbenoitp commented 6 years ago

Hi,

I'm currently trying to build an IRFinder reference for the human genome hg38 with EnsEMBL version 80. The first step of BuildRefProcess (building the STAR index) works well, but the process stalls at the mappability step: after 20 hours, only ~325,000 reads had been mapped, and the Log.progress.out file shows a STAR mapping speed of 0M/hr.

Here is my command line:

IRFinder-IRFinder-1.2.4/bin/IRFinder -m BuildRefProcess -t 4 -S STAR-2.5.0b/bin/Linux_x86_64/STAR -r IRFinder-IRFinder-1.2.4/Human_hg38_Ensembl80 -R IRFinder-IRFinder-1.2.4/REF/extra-input-files/Human_hg38_nonPolyA_ROI.bed

I tried running this job on several machines (always with 4 threads) and got the same result. Moreover, I built an IRFinder reference for hg19 with EnsEMBL 75 annotation without any problem (the mappability step took ~1 hr to run).

Any help would be greatly appreciated.

Clara

dg520 commented 6 years ago

Hi Clara,

Sorry for the late reply. Yes, we have noticed that STAR can run into problems when building a reference for hg38; there is a series of discussions about this issue on the STAR Google group. My suggestion is to trim your hg38 genome FASTA file to keep only the main chromosomes, without other contigs or scaffolds, and then build the STAR/IRFinder reference from that trimmed FASTA. You might also want to trim your GTF file accordingly.
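For anyone landing here later, the trimming step can be sketched in Python as shown here. This is a minimal, hypothetical sketch rather than an IRFinder or STAR tool: the helper names (`filter_fasta`, `filter_gtf`, `trim_file`) are made up for illustration, and it assumes uncompressed files with Ensembl-style sequence names (`1`..`22`, `X`, `Y`, `MT`, no `chr` prefix). UCSC-style files would need a different `PRIMARY_CHROMS` set. It keeps a FASTA record only if its name (the first token after `>`) is a main chromosome, and keeps GTF lines whose first column (seqname) is in the same set.

```python
from typing import AbstractSet, Callable, Iterable, Iterator

# Assumption: Ensembl-style primary chromosome names (no "chr" prefix).
# For UCSC-style files, use names like "chr1" ... "chr22", "chrX", "chrY", "chrM".
PRIMARY_CHROMS = frozenset({str(i) for i in range(1, 23)} | {"X", "Y", "MT"})


def filter_fasta(lines: Iterable[str], keep: AbstractSet[str]) -> Iterator[str]:
    """Yield header and sequence lines only for FASTA records named in `keep`."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token after ">".
            parts = line[1:].split(maxsplit=1)
            keeping = bool(parts) and parts[0] in keep
        if keeping:
            yield line


def filter_gtf(lines: Iterable[str], keep: AbstractSet[str]) -> Iterator[str]:
    """Yield '#' comment lines and GTF features whose seqname (column 1) is in `keep`."""
    for line in lines:
        if line.startswith("#") or line.split("\t", 1)[0] in keep:
            yield line


def trim_file(src: str, dst: str,
              filt: Callable[[Iterable[str], AbstractSet[str]], Iterator[str]],
              keep: AbstractSet[str] = PRIMARY_CHROMS) -> None:
    """Stream `src` through `filt` line by line and write the kept lines to `dst`."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(filt(fin, keep))


if __name__ == "__main__":
    # Tiny in-memory demo: the unplaced scaffold record is dropped.
    demo = [">1 dna:chromosome\n", "ACGT\n",
            ">KI270728.1 dna:scaffold\n", "NNNN\n",
            ">X dna:chromosome\n", "GGCC\n"]
    print("".join(filter_fasta(demo, PRIMARY_CHROMS)), end="")
```

Usage would look like `trim_file("genome.fa", "genome.primary.fa", filter_fasta)` and `trim_file("annotation.gtf", "annotation.primary.gtf", filter_gtf)` (file names hypothetical), after which the reference is rebuilt from the trimmed files. Streaming line by line avoids loading the ~3 GB genome into memory.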

Let me know if that helps.

Best, Dadi

cbenoitp commented 6 years ago

Hi Dadi,

Thank you for your answer. Trimming the hg38 fasta file to keep only the main chromosomes indeed solved the problem.

Thank you very much for your help.

Best, Clara