junchaoshi / sports1.1

Small non-coding RNA annotation Pipeline Optimized for rRNA- and tRNA-Derived Small RNAs
GNU General Public License v3.0

differential analysis #28

Closed: sunhaifeng123 closed this issue 1 year ago

sunhaifeng123 commented 1 year ago

Hi,

Thank you for this GitHub project.

SPORTS1.1 is really nice for small RNA-seq data analysis, especially with the update from version 1.0 to 1.1.

I ran the pipeline successfully and the next step is to do differential analysis.

However, I am puzzled that the results are reported per sample: the quantification output for each sample contains Sequence and Reads columns. My design is 3 vs. 3, so how should I merge the samples into one combined table? Can I use Sequence as the key for matching rows between samples?

For the subsequent differential analysis, can I use the R package DESeq2, or are there other packages you would recommend?

Thanks again and looking forward to your reply!

Best,

Haifeng Sun, Nanjing Medical University, China, 2022-09-29

junchaoshi commented 1 year ago

Hi Haifeng,

Since SPORTS1.1 outputs the raw read count for each sequence, its output is compatible with DESeq2 or equivalent tools for downstream differential analysis. However, there are some caveats in sncRNA expression analysis, which are described in our latest perspective paper: https://www.nature.com/articles/s41556-022-00880-5. Please make sure to validate the output from DESeq2 or equivalents with an appropriate method.
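For readers looking for a starting point, here is a minimal sketch of one way to combine per-sample tables into a single count matrix keyed on Sequence. It is not part of SPORTS1.1. It assumes each per-sample file is tab-delimited with columns named `Sequence` and `Reads` holding integer counts, that rows sharing a sequence within one sample can be summed, and that a sequence absent from a sample has 0 reads there. The function names and the toy data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_counts(handle, key_col="Sequence", count_col="Reads"):
    """Return {key: raw read count} from one tab-delimited per-sample table."""
    counts = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        # Assumption: rows sharing a key within one sample are summed.
        counts[row[key_col]] += int(row[count_col])
    return dict(counts)


def merge_samples(per_sample):
    """Combine {sample: {key: count}} into (sample_names, {key: [counts]})."""
    samples = list(per_sample)
    keys = sorted({k for counts in per_sample.values() for k in counts})
    # Assumption: a key missing from a sample has 0 reads in that sample.
    matrix = {k: [per_sample[s].get(k, 0) for s in samples] for k in keys}
    return samples, matrix


# Toy tables standing in for two samples (not real SPORTS output).
ctrl_1 = io.StringIO("Sequence\tReads\nACGT\t10\nTTGA\t3\n")
treat_1 = io.StringIO("Sequence\tReads\nACGT\t7\nGGCC\t5\n")

samples, matrix = merge_samples({
    "ctrl_1": read_counts(ctrl_1),
    "treat_1": read_counts(treat_1),
})

# Print a tab-separated matrix: one row per sequence, one column per sample.
print("Sequence", *samples, sep="\t")
for key, row in matrix.items():
    print(key, *row, sep="\t")
```

With real data, each `io.StringIO` would be replaced by an opened per-sample file, and the printed matrix (rows as sequences, columns as samples) could be saved and loaded into R as the count input for DESeq2.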

Best, Junchao

sunhaifeng123 commented 1 year ago

Hi Junchao,

Thank you for your timely reply, and for the excellent work on tsRNA by you and Dr. Chen.

I will read your paper carefully.

Best wishes,

Haifeng

SergioRodLla commented 5 months ago

Hi @junchaoshi,

I have read the paper you mentioned and understand the sncRNA caveats you cover regarding validation of differential expression results. I want to conduct a differential expression analysis using, for example, DESeq2, and I wonder whether the *summary.txt or the *output.txt SPORTS output file is suitable for this. From the first one, can the "Sub_Class" and "Reads" fields be used directly? Or would it be better to use the "sequence" and "reads" fields from the *output.txt file? Should the "reads" number be normalized (for example, to RPM)?

Best, Sergio

junchaoshi commented 5 months ago

Hi Sergio,

As the column names indicate, "Class" and "Sub_Class" correspond to types of RNA, while "Sequence" denotes the specific RNA sequence. You may choose either level depending on the purpose of the analysis. The "Reads" column in either the summary.txt or the output.txt file gives the raw read count for the corresponding category. This is the recommended input for DESeq2.
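On the RPM question: DESeq2 expects unnormalized counts and estimates its own size factors, so the raw "Reads" values would go in unchanged. RPM can still be useful for plots or for comparing abundances by eye. A minimal sketch of that scaling follows. It assumes the denominator is the sum of the counts passed in; other denominators, such as total mapped reads, are also common. The function name and the toy counts are hypothetical.

```python
def rpm(counts):
    """Scale raw counts for one sample to reads per million (RPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads; RPM is undefined")
    # Assumption: the library size is the sum of the counts supplied here.
    return {key: value * 1_000_000 / total for key, value in counts.items()}


# Toy per-category counts for one sample (not real SPORTS output).
raw = {"tsRNA": 600, "rsRNA": 300, "miRNA": 100}
scaled = rpm(raw)

for key, value in scaled.items():
    print(key, raw[key], value, sep="\t")
```

Keeping `raw` and `scaled` as separate tables makes it harder to pass normalized values to DESeq2 by mistake.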

Best, Junchao

SergioRodLla commented 5 months ago

Thank you for the quick answer!

Sergio