hadasvolk / CompLabNGS

Computational Lab in Next Generation Sequencing and Genomics Data Analysis - TAU 0411358701
MIT License

A Problem With Fastuniq #14

Closed by Ataliai 1 month ago

Ataliai commented 2 months ago

According to the files I received for the project (GSE229677), I should run fastuniq on the paired-end (PE) reads. Each file is huge (11 GB), and my computer does not seem to have enough RAM for this (the log reports an out-of-memory error). Is there an alternative?

hadasvolk commented 2 months ago

Why would you want to run fastuniq? What is the biological experiment you are analyzing, and what would you gain from deduplication?

tehilayehudai commented 2 months ago

Hi, I have the same question. We thought duplicates should be removed, based on this slide from presentation 8.

[Image: IMG-20240516-WA0053]

hadasvolk commented 2 months ago

Ok, I understand. It is a valid confusion probably caused by me; I'll make sure to remove it in future courses.

What I tried to say is that deduplication can be used in RNA-Seq if there is sufficient QC evidence to justify it. In RNA-Seq we ultimately aim to quantify the expression profiles of transcripts, so we expect some level of duplication in the data, since highly expressed transcripts naturally produce many identical reads. If our QC returns huge duplication values, then we should consider deduplicating. For the data you downloaded, there is no need to do that.
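As a rough illustration of the kind of QC check meant here, the following is a minimal Python sketch that estimates the fraction of duplicated sequences from the first reads of a FASTQ file, so memory use stays bounded even for an 11 GB input. The function names (`open_fastq`, `duplication_rate`) and the `max_reads` default are hypothetical and chosen for illustration. This is not how FastQC or fastuniq compute duplication, it looks at one mate file only, and reads at the start of a file may not represent the whole run. What counts as a "huge" value is a judgment call that the thread does not quantify.

```python
import gzip
import io
from collections import Counter
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def duplication_rate(handle, max_reads=200_000):
    """Fraction of the first `max_reads` reads whose sequence repeats an earlier one."""
    # A FASTQ record spans 4 lines; the sequence is the 2nd line of each record.
    seq_lines = islice(handle, 1, max_reads * 4, 4)
    counts = Counter(line.strip() for line in seq_lines)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Unique sequences / total reads is the non-duplicated fraction.
    return 1.0 - len(counts) / total


# Tiny in-memory example: 4 reads, one of which repeats an earlier sequence.
demo = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nTTTT\n+\nIIII\n"
    "@r4\nGGGG\n+\nIIII\n"
)
print(duplication_rate(demo))  # 0.25
```

On a real file this would be called as `duplication_rate(open_fastq(path))`, where `path` points to one of the downloaded FASTQ files. Only the sampled sequences are held in memory, which is why the sketch caps the number of reads instead of loading the whole file.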

tehilayehudai commented 2 months ago

I understand. We will repeat the analysis, because we have already removed the duplicates. Thanks for the quick reply!

KerenRozen commented 1 month ago

Is it possible to continue the analysis with the data after using fastuniq?

hadasvolk commented 1 month ago

It is possible, but you need to explain the rationale behind doing so. What part of the data, or of your understanding of the biological context of the experiment, led you to this rationale?

KerenRozen commented 1 month ago

OK. Thanks for your answer