how to extract the TE abundance

mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.

http://hammelllab.labsites.cshl.edu/software/#TEtranscripts

GNU General Public License v3.0

206 stars, 29 forks

how to extract the TE abundance #44

Closed: xiaocong3333 closed this issue 4 years ago

xiaocong3333 commented 4 years ago

The final result is a combined gene and TE count table, but sometimes I need only the TE counts or only the gene counts. How should I do that?

olivertam commented 4 years ago

Hi, you should be able to extract the TE counts by looking for lines where the feature name (column 1) contains colons (e.g. L1HS:L1:LINE). In contrast, gene names should not contain colons (unless your model organism uses some unusual gene nomenclature). If you are still having trouble, please feel free to send me a copy of your counts file. Thanks
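Before splitting the file, the colon rule described above can be checked by counting how many rows fall into each class. This is a minimal sketch, assuming the counts table is tab-delimited with a single header line. The file names and values are toy data made up for illustration, not real TEtranscripts output.

```shell
# Toy tab-delimited counts table (hypothetical values) standing in for the real output.
printf 'gene/TE\tsample1\tsample2\n'  > toy_counts.txt
printf 'GAPDH\t1200\t1100\n'         >> toy_counts.txt
printf 'L1HS:L1:LINE\t35\t50\n'      >> toy_counts.txt
printf 'ACTB\t900\t950\n'            >> toy_counts.txt
printf 'AluYa5:Alu:SINE\t12\t8\n'    >> toy_counts.txt

# Skip the header (NR > 1) and classify each row by whether column 1 contains a colon.
awk -F'\t' '
    NR > 1 { if ($1 ~ /:/) te++; else gene++ }
    END    { printf "TE rows: %d\ngene rows: %d\n", te, gene }
' toy_counts.txt > toy_summary.txt

cat toy_summary.txt
```

If the gene row count looks too low for your annotation, some gene names may contain colons and the rule would need adjusting for that organism.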

xiaocong3333 commented 4 years ago

Got it, thank you! So if I need the TE abundance, do I have to copy the TE rows one by one? Is there another way to do that? I mean, are there any options for this?

olivertam commented 4 years ago

Hi, if you are comfortable with the Unix/Linux command line, you can use the standard grep command:

grep ":" gene_TE_counts.txt > TE_counts.txt
grep -v ":" gene_TE_counts.txt > gene_counts.txt
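Note that grep matches a colon anywhere on the line, and a header line without a colon ends up only in the gene file. As a possible variant, the sketch below uses awk to test column 1 only and to copy the header into both outputs. It assumes a tab-delimited table with exactly one header line. The toy_ file names and the counts are hypothetical and exist only to make the example runnable.

```shell
# Toy input (hypothetical values); with real data, point awk at your own counts file instead.
printf 'gene/TE\tsample1\tsample2\n'  > toy_gene_TE_counts.txt
printf 'GAPDH\t1200\t1100\n'         >> toy_gene_TE_counts.txt
printf 'L1HS:L1:LINE\t35\t50\n'      >> toy_gene_TE_counts.txt
printf 'ACTB\t900\t950\n'            >> toy_gene_TE_counts.txt
printf 'AluYa5:Alu:SINE\t12\t8\n'    >> toy_gene_TE_counts.txt

awk -F'\t' '
    # Header line: write it to both output files.
    NR == 1  { print > "toy_TE_counts.txt"; print > "toy_gene_counts.txt"; next }
    # Feature name (column 1) contains a colon: treat the row as a TE.
    $1 ~ /:/ { print > "toy_TE_counts.txt"; next }
    # Everything else is treated as a gene.
             { print > "toy_gene_counts.txt" }
' toy_gene_TE_counts.txt
```

Keeping the header in both files means each one can be loaded directly as a table with sample names in downstream tools.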

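If you are more comfortable with PowerShell (e.g. Windows), you can use Select-String. Piping through ForEach-Object { $_.Line } writes only the matched text (otherwise each output line is prefixed with the file name and line number), and -Encoding ascii avoids the UTF-16 default of Out-File in Windows PowerShell:

Select-String -Path gene_TE_counts.txt -Pattern ':' | ForEach-Object { $_.Line } | Out-File -FilePath TE_counts.txt -Encoding ascii
Select-String -Path gene_TE_counts.txt -Pattern ':' -NotMatch | ForEach-Object { $_.Line } | Out-File -FilePath gene_counts.txt -Encoding ascii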

Hope this is helpful. Thanks

xiaocong3333 commented 4 years ago

This is very helpful! Thank you!