williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

Can I use my own STAR index? #58

Closed wangshun1121 closed 5 years ago

wangshun1121 commented 5 years ago

I already have my own STAR index and do not want to rebuild it. Could you make it possible to skip STAR index building in BuildRefProcess in a future release?

dg520 commented 5 years ago

Hi @wangshun1121 ,

The next release of IRFinder is not scheduled yet, and I doubt I will have time to do that soon. The current IRFinder seems stable, although custom requests still pop up. I have to admit we haven't given users much flexibility to tune parameters, especially during reference preparation. But there is always a workaround: tweaking the source code.

I'll assume the STAR reference folder you've already built is called STAR_ref. For your specific case, you can do the following:

  1. Make a folder where you want to save your IRFinder reference, say IRFinder_ref_test.
  2. cd into IRFinder_ref_test and make a symlink to your STAR reference by typing ln -s STAR_ref STAR. Make sure to use the full path to STAR_ref. This creates a symlink called STAR in your IRFinder reference folder. Please note that it has to be named STAR. Another warning if you are not familiar with symlinks: DO NOT modify any file through the symlinked folder, otherwise the corresponding file in the original folder will also be changed. A symlink is better than a copy because it takes almost no disk space.
  3. Copy or symlink your genome FASTA file into IRFinder_ref_test and name it genome.fa. It has to be named exactly that.
  4. Copy or symlink your transcriptome GTF file into IRFinder_ref_test and name it transcripts.gtf. It has to be named exactly that. (Please note that these FASTA and GTF files are needed to compute the other IRFinder reference components, in addition to the STAR reference.)
  5. cd into the bin/util folder of the IRFinder package. Make a backup copy of IRFinder-BuildRefFromEnsembl, then modify the original.
  6. Comment out lines 191 to 200. This skips the STAR reference build, as you want.
  7. Run BuildRefProcess mode as before. It will skip STAR reference building and go straight to the mappability calculation.
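
Steps 1 to 4 could be scripted roughly as follows. This is only a sketch: the function name prepare_irfinder_ref and its four arguments are made up for illustration, and only the names STAR, genome.fa and transcripts.gtf inside the reference folder come from the steps above. It resolves every input to an absolute path before linking, and refuses to run if the target names already exist, so that ln never writes into the original STAR folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of IRFinder): lay out an IRFinder reference
# folder around an existing STAR index using symlinks only.
# Usage: prepare_irfinder_ref STAR_REF_DIR GENOME_FASTA TRANSCRIPT_GTF OUT_DIR
prepare_irfinder_ref() {
    local star_ref="$1" fasta="$2" gtf="$3" out_dir="$4" name

    # ln -s stores the target exactly as given, so resolve absolute paths first.
    star_ref="$(cd "$star_ref" && pwd)" || return 1
    fasta="$(cd "$(dirname "$fasta")" && pwd)/$(basename "$fasta")"
    gtf="$(cd "$(dirname "$gtf")" && pwd)/$(basename "$gtf")"
    [ -f "$fasta" ] && [ -f "$gtf" ] || return 1

    mkdir -p "$out_dir" || return 1

    # Refuse to overwrite: linking onto an existing STAR symlink would
    # create a new entry inside the original STAR folder.
    for name in STAR genome.fa transcripts.gtf; do
        if [ -e "$out_dir/$name" ] || [ -L "$out_dir/$name" ]; then
            echo "refusing to overwrite $out_dir/$name" >&2
            return 1
        fi
    done

    ln -s "$star_ref" "$out_dir/STAR" || return 1
    ln -s "$fasta" "$out_dir/genome.fa" || return 1
    ln -s "$gtf" "$out_dir/transcripts.gtf" || return 1
}

# Example call (all paths are placeholders):
# prepare_irfinder_ref /data/STAR_ref /data/my_genome.fa /data/my_annotation.gtf IRFinder_ref_test
```

Steps 5 to 7 still have to be done by hand, since they depend on the exact contents of your copy of IRFinder-BuildRefFromEnsembl.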

Of course, instead of hard-commenting out the STAR execution part, you can also make it optional yourself. Feel free to tweak the source code.
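
One way to make the STAR step optional is to wrap it in a guard that checks whether an index is already in place. The sketch below is not taken from IRFinder-BuildRefFromEnsembl; the function names are hypothetical, and the check simply looks for a few of the files that STAR's genomeGenerate mode writes (Genome, SA, SAindex, genomeParameters.txt). The existing STAR command from the script would be passed in as the build command.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not taken from IRFinder's own scripts.
# Succeeds when DIR already holds non-empty core STAR index files.
star_index_present() {
    local dir="$1" f
    for f in Genome SA SAindex genomeParameters.txt; do
        [ -s "$dir/$f" ] || return 1
    done
}

# Run the given build command only when no index is found in DIR.
# Usage: build_star_unless_present DIR COMMAND [ARGS...]
build_star_unless_present() {
    local dir="$1"
    shift
    if star_index_present "$dir"; then
        echo "Found existing STAR index in $dir, skipping build" >&2
    else
        "$@"
    fi
}
```

With this in place, the lines you would otherwise comment out could be passed to build_star_unless_present together with the path of the STAR folder, so the same script works both with and without a prebuilt index.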

Best, Dadi

wangshun1121 commented 5 years ago

What a pity that only UNSORTED BAM files can be used! Such BAMs are much larger than sorted ones. Storage emergency!

dg520 commented 5 years ago

Hi @wangshun1121 ,

A very good point. There were a few reasons we chose unsorted BAM as input: 1) The core of IRFinder, which works directly on the BAM file, is written in C++. We wanted it to work out of the box, so that users don't have to install dependent libraries such as BAMTools. We thought this would help compatibility across various working environments and system setups. 2) This one is a true mistake due to my short-sightedness: I didn't foresee today's sequencing explosion. I didn't expect that many labs would deal with more than 10 RNA-Seq libraries at a time. 3) It was also more straightforward from a coding angle.

I fully realize it's not only unfriendly to storage but also hard to integrate with upstream tools that don't keep unsorted BAM. I hope I will have time to rewrite it.

Best, Dadi