williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

intergenic.ROI.bed with size 0 #153

Closed aiyicen2 closed 3 years ago

aiyicen2 commented 3 years ago

Hi, I have been using IRFinder-1.3.1 to build an index. No errors are printed, but the size of intergenic.ROI.bed is 0. The full IRFinder command I ran:

/home/yanglab/ecliptools/IRFinder-1.3.1/bin/IRFinder -m BuildRefProcess -t 30 -r /home/yanglab/aiyicen/RNAseq/XJQ_9_26/00ref_genome/Human-hg38

genome.fa and trascript.gtf are from GENCODE human V27.

drwxrwxr-x 2 yanglab yanglab      4096 Oct 2 17:07 ./
drwxrwxr-x 6 yanglab yanglab      4096 Oct 2 17:06 ../
-rw-rw-r-- 1 yanglab yanglab  29048250 Oct 2 17:06 exclude.directional.bed
-rw-rw-r-- 1 yanglab yanglab  30476008 Oct 2 17:06 exclude.omnidirectional.bed
-rw-rw-r-- 1 yanglab yanglab         0 Oct 2 17:07 intergenic.ROI.bed
-rw-rw-r-- 1 yanglab yanglab  15497717 Oct 2 17:06 introns.unique.bed
-rw-rw-r-- 1 yanglab yanglab 106227174 Oct 2 17:07 ref-cover.bed
-rw-rw-r-- 1 yanglab yanglab   7013407 Oct 2 17:07 ref-read-continues.ref
-rw-rw-r-- 1 yanglab yanglab     58767 Oct 2 17:07 ref-ROI.bed
-rw-rw-r-- 1 yanglab yanglab   7270163 Oct 2 17:07 ref-sj.ref

dg520 commented 3 years ago

@aiyicen2 In GENCODE human V27, do the chromosome names start with "chr"? If so, intergenic.ROI.bed won't be generated correctly because of an incompatibility in the code. We used Ensembl annotation when developing IRFinder, where chromosome names have no "chr" prefix.
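One quick way to answer that question is to count the distinct sequence names in the GTF that begin with "chr". The following is only a sketch: the helper name `count_chr_prefixed` and the toy GTF are made up for illustration, and in practice you would pass the path to your own annotation file.

```shell
count_chr_prefixed() {
  # Print how many distinct sequence names (column 1) start with "chr".
  # $1 = path to a tab-separated GTF file; "#" header lines are skipped.
  awk -F '\t' '!/^#/ && $1 ~ /^chr/ {seen[$1] = 1}
               END {n = 0; for (k in seen) n++; print n}' "$1"
}

# Tiny made-up GTF, for illustration only (not real annotation).
demo=$(mktemp)
printf '##description: toy\nchr1\tHAVANA\tgene\t1\t10\t.\t+\t.\tgene_id "g1";\nchr2\tHAVANA\tgene\t5\t20\t.\t-\t.\tgene_id "g2";\nchr1\tHAVANA\texon\t1\t10\t.\t+\t.\tgene_id "g1";\n' > "$demo"

n_chr=$(count_chr_prefixed "$demo")
echo "chr-prefixed sequence names: $n_chr"
```

A result of 0 suggests Ensembl-style names; anything larger means the annotation uses the "chr" prefix.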

Please note that the rest of your IRFinder reference has been built successfully. Quantification of intron retention on annotated introns (i.e. those defined by the GTF) is not affected by the empty intergenic.ROI.bed. The only output affected is IRFinder-ROI.txt, where the counts for the intergenic regions will be missing. Again, IRFinder-ROI.txt is entirely independent of IR quantification. We generate it for future extensibility of the tool.

With that being said, if you do want to have your intergenic.ROI.bed generated correctly, you can modify Line 127 in bin/util/Build-BED-refs.sh from:

| awk 'BEGIN {FS="\t"; OFS="\t"} (length($1)<=2) {$4 = "Intergenic/" $1; print $1, $2, $3, $4}' > intergenic.ROI.bed

to

| awk 'BEGIN {FS="\t"; OFS="\t"} (length($1)<=5) {$4 = "Intergenic/" $1; print $1, $2, $3, $4}' > intergenic.ROI.bed

And then re-run the reference preparation steps.

I will update the source code in the next release.
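To see why the threshold matters, the same awk filter can be run on a few toy rows with both limits. This is a sketch on made-up coordinates; the helper name `filter_by_name_length` and the temporary file are illustrative and not part of IRFinder.

```shell
# Toy BED rows: one Ensembl-style name, two "chr"-prefixed names,
# and one long scaffold name (coordinates are made up).
toy=$(mktemp)
printf '1\t100\t200\nchr1\t100\t200\nchrX\t300\t400\nchr1_KI270706v1_random\t10\t20\n' > "$toy"

filter_by_name_length() {
  # $1 = maximum allowed chromosome-name length, $2 = BED file
  awk -v max="$1" 'BEGIN {FS="\t"; OFS="\t"} (length($1)<=max) {$4 = "Intergenic/" $1; print $1, $2, $3, $4}' "$2"
}

old=$(filter_by_name_length 2 "$toy")   # original threshold
new=$(filter_by_name_length 5 "$toy")   # suggested threshold

echo "length <= 2:"; echo "$old"
echo "length <= 5:"; echo "$new"
```

With a limit of 2, only the unprefixed name passes, so a file containing only "chr" names yields empty output. With a limit of 5, names such as chr1 and chrX pass, while the long scaffold name is still excluded.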

aiyicen2 commented 3 years ago

Thank you! The chromosome names do start with "chr". Can I use IRfinder-ir-nondir.txt for the analysisWithLowReplicates.pl analysis?

dg520 commented 3 years ago

@aiyicen2 IRFinder determines the directionality of an RNA-Seq library in an unsupervised way. It generates only IRfinder-ir-nondir.txt if it considers the library non-directional, and generates both IRfinder-ir-nondir.txt and IRfinder-ir-dir.txt if it considers the library directional.

You should always go by the ground truth of your library's directionality:

  1. If you know your library is directional, stick with IRfinder-ir-dir.txt. If there is no such file, IRFinder has not determined the directionality correctly. Take note of this and hold off on your downstream analysis.
  2. If you know your library is non-directional, stick with IRfinder-ir-nondir.txt. If IRfinder-ir-dir.txt is present as well, IRFinder has likewise not determined the directionality correctly. Take note of this and hold off on your downstream analysis.

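The two rules above can be sketched as a small shell check. The function name `check_directionality` is hypothetical, and the file names are written as they are spelled in this thread, so verify the exact casing in your own output directory before relying on it.

```shell
check_directionality() {
  # $1 = IRFinder output directory
  # $2 = known library type from the wet-lab protocol: "dir" or "nondir"
  out=$1
  known=$2
  # Per the thread, the directional file is written only when IRFinder
  # considers the library directional.
  if [ -e "$out/IRfinder-ir-dir.txt" ]; then
    detected=dir
  else
    detected=nondir
  fi
  if [ "$detected" = "$known" ]; then
    echo "OK: use $out/IRfinder-ir-$known.txt"
  else
    echo "MISMATCH: IRFinder detected '$detected' but the library is '$known'; hold off on downstream analysis"
    return 1
  fi
}

# Illustration with an empty placeholder file in a temporary directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/IRfinder-ir-nondir.txt"
check_directionality "$demo_dir" nondir
```

A non-zero return status signals the mismatch case, so the check can gate a downstream script.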
aiyicen2 commented 3 years ago

I found that my library is non-directional, so my problem is solved. Thank you for your help!