shalgilab / DoGFinder


Resampled alignment files #3

Open sameet opened 5 years ago

sameet commented 5 years ago

Hi, it seems that only 200000 reads are sampled from the .bam file. Is there a reason for this, apart from keeping compute time down? For human samples the number of reads can be huge, so does 200000 seem too small? Is there any way to increase the number? Wouldn't a number of, say, 2 million be better?

Sameet

Romicak commented 5 years ago

Hi Sameet, did you try running this pipeline with non-stranded RNA-seq data? I am having problems running the pipeline with non-stranded data: it fails with errors in step 2. I opened a new issue about this, but just wanted to check whether you also came across similar problems. Thanks

StellamarisSoares commented 5 years ago

Hi, Sameet. My data is from mice, not humans. When I ran Pre_Process, the number of reads was not limited to 200000 but to the size of the library with the fewest mapped reads. In my case, the largest library has more than 50 million mapped reads and the smallest has about 34.5 million. After downsampling, all libraries were left with about 34.5 million reads. I hope this helps you understand your output from this step.
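To make the arithmetic concrete, here is a minimal sketch of the behaviour I observed (every library reduced to the depth of the smallest one). This is not DoGFinder's actual code; the function name `downsample_fractions` and the sample names are made up for illustration, and the read counts are the rounded figures from my run.

```python
def downsample_fractions(mapped_reads):
    """Return, per library, the fraction of reads to keep so that every
    library ends up at the depth of the smallest one.

    `mapped_reads` maps a (hypothetical) library name to its mapped-read count.
    """
    if not mapped_reads:
        raise ValueError("need at least one library")
    target = min(mapped_reads.values())  # depth of the smallest library
    if target <= 0:
        raise ValueError("every library must have at least one mapped read")
    return {name: target / count for name, count in mapped_reads.items()}


# Rounded counts from my run; the sample names are placeholders.
libraries = {"sample_A": 50_000_000, "sample_B": 34_500_000}
fractions = downsample_fractions(libraries)

for name, fraction in fractions.items():
    kept = round(libraries[name] * fraction)
    print(f"{name}: keep {fraction:.2f} of reads (about {kept:,} reads)")
```

With these numbers the larger library keeps roughly 69% of its reads and the smaller one is left untouched, which matches all libraries ending up at about 34.5 million reads.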

StellamarisSoares commented 5 years ago

Hi again, Sameet. I'm rereading the DoGFinder paper and found some interesting information. In the "Results" section, the authors wrote "Interestingly, while the number of DoGs has not been saturated even at a library size of 200 M reads, DoG length has reached saturation already at 25 M read library depth, but only for the untreated cells". I think this may be why only 200000 reads are sampled from the bam file.