williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

reading in technical repeats to IRFinder #98

Closed ojziff closed 4 years ago

ojziff commented 4 years ago

Hi,

I have 3 technical repeats (sequenced across 3 lanes, PE reads) for each biological replicate. I am trying to read the technical-repeat FASTQ files into IRFinder together. I have not merged the per-lane FASTQ files for each sample. How do I input technical (lane) replicates together? With STAR I do this with:

--readFilesIn Cond1_Lane1_R1.fastq.gz,Cond1_Lane2_R1.fastq.gz,Cond1_Lane3_R1.fastq.gz Cond1_Lane1_R2.fastq.gz,Cond1_Lane2_R2.fastq.gz,Cond1_Lane3_R2.fastq.gz

However, when I use this comma-separated (replicates) and space-separated (pairs) format with IRFinder, I get an error:

IRFinder -r $REF -a none -d $OUT Cond1_Lane1_R1.fastq.gz,Cond1_Lane2_R1.fastq.gz,Cond1_Lane3_R1.fastq.gz Cond1_Lane1_R2.fastq.gz,Cond1_Lane2_R2.fastq.gz,Cond1_Lane3_R2.fastq.gz

Argument error: in run mode FastQ, provide either one or two fastq files. 0 arguments found.

I also tried, and failed, to use the -y option to respecify --readFilesIn:

IRFinder -r $REF -d $OUT -a none -y "--readFilesIn Cond1_Lane1_R1.fastq.gz,Cond1_Lane2_R1.fastq.gz,Cond1_Lane3_R1.fastq.gz Cond1_Lane1_R2.fastq.gz,Cond1_Lane2_R2.fastq.gz,Cond1_Lane3_R2.fastq.gz"

Argument error: in run mode FastQ, provide either one or two fastq files. 0 arguments found.

I am trying to obtain one BAM for each biological replicate that contains read group information for the 3 lanes. Is there a way to read these in together in IRFinder, or should I (a) merge the technical-repeat IRFinder Unsorted.bam files with samtools (I think this only works with sorted BAM files) or (b) merge the technical-repeat FASTQ files before running IRFinder?

Thank you for your help, Oliver

dg520 commented 4 years ago

Hi @ojziff ,

Yes, IRFinder can only take 1 or 2 input read files. It's designed like that on purpose. In your case, merging the FASTQs of the technical replicates is the easiest way to go. Otherwise, you have to sort each BAM, merge them, and unsort the merged file.
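A minimal sketch of the FASTQ-merging route, assuming gzip-compressed per-lane files named as in the question. The helper name merge_lanes and the merged file names (Cond1_R1.fastq.gz, Cond1_R2.fastq.gz) are hypothetical. Gzip files can be concatenated directly with cat, so no decompression is needed. The lanes must be listed in the same order for R1 and R2 so that read pairs stay in sync.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of IRFinder): concatenate per-lane gzipped
# FASTQ files into a single file. Concatenated gzip members form a valid
# gzip file, so plain `cat` is sufficient.
# Usage: merge_lanes OUTPUT LANE1 [LANE2 ...]
merge_lanes() {
    local out="$1"
    shift
    cat "$@" > "$out"
}

# Example usage with the file names from the question (left commented out so
# the helper can be sourced without touching any files):
#
# merge_lanes Cond1_R1.fastq.gz \
#     Cond1_Lane1_R1.fastq.gz Cond1_Lane2_R1.fastq.gz Cond1_Lane3_R1.fastq.gz
# merge_lanes Cond1_R2.fastq.gz \
#     Cond1_Lane1_R2.fastq.gz Cond1_Lane2_R2.fastq.gz Cond1_Lane3_R2.fastq.gz
#
# Then pass exactly two files to IRFinder, as in the original command:
# IRFinder -r "$REF" -a none -d "$OUT" Cond1_R1.fastq.gz Cond1_R2.fastq.gz
```

After merging, IRFinder receives one R1 file and one R2 file per biological replicate, which satisfies its one-or-two-file requirement.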

P.S.: Technical replicates don't contribute to any downstream statistics; only biological replicates do. From a sequencing point of view, multiple lanes are usually used to increase depth, so it makes sense to pool lanes before mapping.

Best, Dadi