mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

How to identify sequencing gaps in genes that contain transposon insertions #133

Closed. Wenwen012345 closed this issue 1 year ago

Wenwen012345 commented 1 year ago

Dear @olivertam

It's a great piece of software, and it let us confirm our hypotheses.

We now have a problem: the reviewers of our manuscript asked the following question: "L481-500 and Fig 6C, are there any verifications of these TE-containing long genes? Do these genes contain sequencing gaps? Gap-containing genes should be filtered out since their assembly is not complete."

The question mainly concerns the picture below. The reviewer felt that the genes we showed seemed "a bit long" and implausible, and suspected that they might contain a sequencing gap. We need to provide evidence either way.

image

For context, we used STAR for transcriptome assembly, Stringtie2 to measure gene expression, and TEtranscripts to measure transposon expression. All of these steps were based on GFF3 or GTF files (the genome annotation was downloaded from NCBI and the transposon annotation was generated by TEsorter). From a rough inspection, there seem to be no errors in the GFF3 files, although I'm not sure, since I haven't been working in bioinformatics for long. The genes did not appear to be drawn "longer" than annotated. However, I'm unsure how to proceed, and I have not come up with a good method to identify sequencing gaps. Do you have any suggestions?

olivertam commented 1 year ago

Hi,

Thanks for your interest in the software. As you noted, our software is designed to measure TE transcription from bulk RNA-seq, and not for assembly of unidentified transcripts. It appears that the questions relate to the assembly of the transcripts from long reads, i.e. they occupy a larger genomic region (including introns) than expected. This, I believe, is beyond the scope of the software and our expertise. Sorry that we could not be of more help.

Thanks.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.
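
The thread closed without a method for the gap check the reviewer asked about. It is not something TEtranscripts does, but one possible approach is sketched here for reference. Genome assemblies conventionally mark unresolved gaps as runs of `N` in the FASTA sequence, so gene coordinates from the GFF3 file can be compared against those runs. This is a minimal sketch rather than a validated pipeline. The function names `find_gaps` and `genes_with_gaps`, the `min_len` threshold of 10, and the choice to look only at features of type `gene` are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def find_gaps(fasta_lines, min_len=10):
    """Return {seq_id: [(start, end), ...]} for runs of N in a FASTA file.

    Coordinates are 1-based and inclusive, matching GFF3.
    min_len is an assumed threshold; adjust it to the assembly at hand.
    """
    gaps = defaultdict(list)
    seq_id, chunks = None, []
    pattern = re.compile(r"[Nn]{%d,}" % min_len)

    def flush():
        # Scan the sequence collected so far for N runs.
        if seq_id is not None:
            seq = "".join(chunks)
            for m in pattern.finditer(seq):
                gaps[seq_id].append((m.start() + 1, m.end()))

    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            flush()
            # Sequence ID is the header text up to the first whitespace.
            seq_id, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    flush()
    return gaps


def genes_with_gaps(gff_lines, gaps):
    """Return GFF3 'gene' features whose span overlaps at least one gap."""
    hits = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        seq_id, start, end = cols[0], int(cols[3]), int(cols[4])
        overlapping = [
            (g_start, g_end)
            for g_start, g_end in gaps.get(seq_id, [])
            if g_start <= end and g_end >= start
        ]
        if overlapping:
            hits.append((cols[8], seq_id, start, end, overlapping))
    return hits
```

To use it, open the genome FASTA and the GFF3 file and pass the file objects in, for example `genes_with_gaps(open("genes.gff3"), find_gaps(open("genome.fa")))`, where both file names are placeholders. Each sequence is held in memory in full, which is simple but may be slow for large genomes. An empty result for the genes in question would suggest that their annotated span does not cross an `N`-marked gap, though it would not by itself confirm that the assembly is correct.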