mhammell-laboratory / TEsmall

A pipeline for profiling TE-derived small RNAs
GNU General Public License v3.0

Duplicate gene names #12

Closed sanatbhadsavle closed 2 years ago

sanatbhadsavle commented 2 years ago

Hi, After running TEsmall on mouse small RNA samples, I get a counts summary file that has a lot of repeated entries, especially for the tRNAs. This will hamper proper DESeq2 functioning. Is there a way around that when using the tool?

olivertam commented 2 years ago

Hi,

Thank you for your feedback. TEsmall generates count output for repetitive sequences (e.g. tRNA and transposable elements) in both sense and antisense orientation. Hence, each of those annotations could have two entries (one for sense, and one for antisense). You can either select just the sense counts for DESeq2, combine the sense and antisense counts, or (our recommendation) append column 2 to column 1 to get a "unique" key for DESeq2. In fact, since TEsmall is very granular in its count output (i.e. each exon and intron has its own count), you may want to do some post-processing to combine all exonic/intronic reads of a transcript/gene into a single value, in addition to handling the sense and antisense counts. Please let me know if you have further questions.

Thanks.
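A minimal sketch of two of the options described above, using only the Python standard library. The column layout here (name, orientation, then one count column per sample), the sample names, the feature names, and the `_` separator are all assumptions made for illustration; the real TEsmall count summary may be laid out differently, so check your file's header before adapting this.

```python
import csv
import io

# Hypothetical excerpt of a count summary: feature name, orientation, then one
# count column per sample. This layout is an assumption, not TEsmall's
# documented format.
EXAMPLE = """\
name\tstrand\tsampleA\tsampleB
tRNA-Ala-AGC-1\tsense\t10\t12
tRNA-Ala-AGC-1\tantisense\t1\t0
L1Md_A\tsense\t5\t7
L1Md_A\tantisense\t3\t2
"""


def read_rows(handle):
    """Read a tab-separated table into a header list and a list of rows."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]
    return header, rows


def unique_keys(header, rows, sep="_"):
    """Append column 2 to column 1 so each row gets its own identifier."""
    new_header = [header[0]] + header[2:]
    new_rows = [[row[0] + sep + row[1]] + row[2:] for row in rows]
    return new_header, new_rows


def combine_orientations(header, rows):
    """Sum the counts of all rows that share the same column-1 name."""
    totals = {}
    for row in rows:
        counts = [int(value) for value in row[2:]]
        if row[0] in totals:
            totals[row[0]] = [a + b for a, b in zip(totals[row[0]], counts)]
        else:
            totals[row[0]] = counts
    new_header = [header[0]] + header[2:]
    new_rows = [[name] + counts for name, counts in totals.items()]
    return new_header, new_rows


header, rows = read_rows(io.StringIO(EXAMPLE))
keyed_header, keyed_rows = unique_keys(header, rows)
summed_header, summed_rows = combine_orientations(header, rows)

# DESeq2 needs unique row identifiers, so verify that before exporting.
assert len({row[0] for row in keyed_rows}) == len(keyed_rows)
```

To run this on a real file, replace `io.StringIO(EXAMPLE)` with an opened file handle. If the uniqueness check still fails after appending column 2, the duplicates come from something other than orientation.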

sanatbhadsavle commented 2 years ago

Hi Oliver, Thank you so much for getting back so soon. I have attached a representative image of the repeated sequences in this email. The tRNAs that are annotated as structural RNA have the same gene names, including the 'copy number'. Is this how the output is supposed to be?


olivertam commented 2 years ago

Hi,

I think I understand the issue now. You are right that this is affecting only the tRNA portion. We suspect that this is occurring because reads from the tRNA fragment analysis are not being added properly. I'll patch it and let you know once it is fixed.

Thanks.

sanatbhadsavle commented 2 years ago

Thank you so much. I appreciate your efforts.


olivertam commented 2 years ago

Hi,

I have added a fix to TEsmall (version 2.0.1). Hopefully that will resolve the issue. Please let me know if you encounter more problems.

Thanks.

sanatbhadsavle commented 2 years ago

Thank you so much.

olivertam commented 2 years ago

Hi,

Getting the fix would require a reinstall, as we don't have an auto-update mechanism. Sorry for the inconvenience.

Thanks