Closed wangshun1121 closed 5 years ago
Hi @wangshun1121 ,
The next release of IRFinder is not scheduled yet, and I doubt I will have time for it soon. The current IRFinder seems stable, although customized requests still pop up. I have to admit we haven't given users much flexibility to tune parameters, especially during reference preparation. But there is always a way to work around this by tweaking the source code.
I'll assume the STAR reference folder you've already built is called STAR_ref. For your specific case, you can do the following:
1. Create a new folder for the IRFinder reference, for example IRFinder_ref_test.
2. cd into IRFinder_ref_test and make a symlink to your STAR reference by typing ln -s STAR_ref STAR. Make sure to use the full path to STAR_ref. This will create a symlink folder called STAR in your IRFinder reference folder. Please note that it has to be named STAR. A warning in case you are not familiar with symlinks: DO NOT modify any file in the symlink folder, otherwise the corresponding file in the original folder will also be changed. A symlink is better than a copy because it takes almost no disk space.
3. Copy or symlink your genome FASTA file into IRFinder_ref_test and name it genome.fa. It has to be named exactly that.
4. Copy or symlink your GTF annotation file into IRFinder_ref_test and name it transcripts.gtf. It has to be named exactly that. (Please note that these FASTA and GTF files are essential for calculating the IRFinder reference components in addition to the STAR reference.)
5. cd into the bin/util folder of the IRFinder package. Please make a copy of IRFinder-BuildRefFromEnsembl as a backup. Then modify IRFinder-BuildRefFromEnsembl by commenting out the part that executes STAR.
6. Run IRFinder in BuildRefProcess mode as before. It will skip STAR reference building and go straight to the mappability calculation.
Of course, instead of hard commenting out the STAR execution part, you can also make it optional on your own. Feel free to tweak the source code.
Best, Dadi
What a pity that only UNSORTED BAM files can be used! Such BAMs are much larger than sorted ones. Storage emergency!
Hi @wangshun1121 ,
A very good point. There are a few reasons we chose unsorted BAM as input: 1) The core of IRFinder, which works directly on the BAM file, is written in C++. We wanted it to work out of the box, so that users don't have to install dependent libraries such as BAMTools. We thought this would help compatibility across various working environments and system setups. 2) This one is a true mistake due to my lack of foresight: I didn't anticipate the sequencing explosion in today's world. I didn't expect many labs to deal with more than 10 RNA-Seq libraries at a time. 3) It was also the most straightforward approach from a coding perspective.
I fully realize it's not only unfriendly to storage but also difficult to integrate with upstream tools that don't keep unsorted BAM files. I hope I will have time to rewrite it.
Best, Dadi
I have my own STAR index, and I do not want to rebuild it. Can you make it possible to skip STAR index building in BuildRefProcess in a future release?