williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

Is IRFinder suitable for C.elegans IR analysis #139

Open Bigzhangwei opened 3 years ago

Bigzhangwei commented 3 years ago

Dear Sir, I want to assess IR events in C. elegans with IRFinder, but I have run into two problems and would like to ask for your help.

  1. Is IRFinder suitable for C. elegans IR analysis? I saw that the title of your article says it is used in mammals. In practice, in both FastQ Mode and BAM Mode, IRFinder always prints "WARN: Very low portion of reads have a splice junction. This may indicate the experiment is not an mRNA-Seq experiment." in all WARNINGS files.
  2. What does "coverage" mean? You wrote "We focused on introns that were retained in more than 10% of transcripts (IR ratio >0.1) with at least a coverage of three reads across the entire intron after excluding non-measurable intronic regions" and "We recommend filtering out IR candidates with coverage less than three reads across the entire measurable intron", but I am confused: does that refer to "IntronDepth", Column 9 in IRFinder-IR-dir.txt?

Best wishes and looking forward to your answer.

dg520 commented 3 years ago

@Bigzhangwei

  1. You can safely ignore the warning message. IRFinder performs this safety check by looking at the detected splice sites on main chromosomes whose names start with "chr" or are purely numeric (e.g. 1, 2, 3...). C. elegans does not use that chromosome nomenclature, if I'm not mistaken.
  2. Both passages you quoted mean the same thing: the total number of reads in an intron. That is very different from IntronDepth, which is the read depth stacked at base-wise resolution and summarized as a trimmed median across the entire intronic region. Please refer to the manual here for a more detailed explanation of Column 9.

Bigzhangwei commented 3 years ago

Thanks a lot! But where can I find the "coverage" number? In IRFinder-IR-dir.txt, "Coverage" in Column 8 reflects the ratio of mapped reads rather than the "total number of reads", and I could not find any other "coverage" result in the output files. Did I miss something?

dg520 commented 3 years ago

@Bigzhangwei The number of reads per intron is not recorded in the IRFinder output, as it is not part of the IR ratio calculation defined by IRFinder. Please also note that Column 8 is NOT the ratio of mapped reads. Instead, it indicates how many bases of the intron are covered by at least one RNA-Seq read.
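To make the distinction between "bases covered by at least one read" and "number of reads" concrete, here is a minimal Python sketch. It is an illustration only, not IRFinder's implementation: the function name `covered_bases`, the 0-based half-open coordinates, and the toy read intervals are all assumptions made for this example.

```python
def covered_bases(intron_start, intron_end, reads):
    """Count intron bases covered by at least one aligned read block.

    Coordinates are assumed to be 0-based and half-open: [start, end).
    `reads` is an iterable of (start, end) aligned blocks.
    """
    # Keep only blocks that overlap the intron, clipped to its boundaries.
    clipped = sorted(
        (max(start, intron_start), min(end, intron_end))
        for start, end in reads
        if start < intron_end and end > intron_start
    )

    total = 0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            # A gap: close the previous merged block and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Toy example: three reads overlap the intron, but they cover only 40 bases.
toy_reads = [(90, 120), (110, 130), (150, 160), (300, 400)]
print(covered_bases(100, 200, toy_reads))  # 40
```

In the toy example three reads touch the intron while 40 of its 100 bases are covered, so the two quantities answer different questions. Dividing the covered bases by the intron length would give a fraction; which form Column 8 actually reports should be checked against the IRFinder manual.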

That said, if you want to know the number of reads per intron, you can quickly run bedtools intersect with the -c and -split options. The input A file will be the intron coordinates provided in the IRFinder results (i.e. the first six columns) and the input B file will be your BAM file. Please consult the bedtools manual to see whether the BAM file has to be coordinate-sorted.
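The suggestion above could be scripted roughly as follows. This is a hedged sketch, not an official IRFinder workflow: the helper names (`irfinder_to_bed`, `intersect_command`, `count_reads_per_intron`) and all file paths are hypothetical, and it assumes the results file is tab-separated with any header line having a non-numeric second field. Only the bedtools options named in the comment above (`-a`, `-b`, `-c`, `-split`) are used.

```python
import csv
import subprocess


def irfinder_to_bed(rows):
    """Keep the first six columns of each data row.

    Rows with fewer than six fields, or whose second field is not an
    integer start coordinate (e.g. a header line), are skipped.
    """
    return [row[:6] for row in rows if len(row) >= 6 and row[1].isdigit()]


def intersect_command(bed_path, bam_path):
    """Build the bedtools call: intervals as A, BAM as B, counting overlaps."""
    return ["bedtools", "intersect", "-a", bed_path, "-b", bam_path,
            "-c", "-split"]


def count_reads_per_intron(results_path, bam_path, bed_path, out_path):
    """Write the intron BED file, then run bedtools and save its output."""
    with open(results_path, newline="") as handle:
        bed_rows = irfinder_to_bed(csv.reader(handle, delimiter="\t"))
    with open(bed_path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(bed_rows)
    with open(out_path, "w") as out:
        subprocess.run(intersect_command(bed_path, bam_path),
                       stdout=out, check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own paths.
    count_reads_per_intron("IRFinder-IR-dir.txt", "sample.bam",
                           "introns.bed", "intron_read_counts.txt")
```

With `-c`, bedtools appends the overlap count as an extra column to each interval from the A file. The bedtools manual also describes a `-sorted` option for coordinate-sorted inputs that is intended to lower memory use; it is left out here because whether it suits a given BAM file and chromosome ordering has not been checked.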

Bigzhangwei commented 3 years ago

Cool! It really worked!

Another strategic question: how should I handle IR events when comparing several samples, where in some samples the IRratio is >0.1 with at least three reads of coverage and in others the IRratio is >0.1 with fewer than three reads of coverage? Should they be thrown away or set to 0? I want to run an ANOVA among four samples with two replicates, but many IR events show an IRratio >0.1 with fewer than three reads of coverage. I guess you may have encountered the same question.

P.S. bedtools intersect is a real server memory terminator; I had to try many times for each sample. (;′⌒`)