Hi,
It really depends on what you're trying to analyze. If you are looking at overall TE expression (all copies aggregated), then TEcount works well. If you need to look at individual TE loci, you can try TElocal.
We typically use a downstream differential analysis algorithm, such as DESeq2 or edgeR, to perform the normalization (typically not TPM or VST for the purposes of differential expression), and then apply VST for visualization.
Thanks.
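To make the normalization step above more concrete, here is a minimal Python sketch of the median-of-ratios size factors that DESeq2 uses by default (`estimateSizeFactors`). It is an illustration, not DESeq2's code: it skips features with a zero count in any sample, and it uses a plain log2(count / size factor + 1) transform as a simple stand-in for plotting, which is not the same as DESeq2's VST. The function names and the toy counts are hypothetical.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per gene/TE), each a list of raw counts per sample.
    Rows with a zero in any sample are skipped because their log geometric
    mean is undefined; at least one row must have no zeros.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable row across samples
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Median of (count / geometric mean) for sample j, computed in log space
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def log_normalized(counts, factors):
    """log2(count / size factor + 1): a simple plotting stand-in, not DESeq2's VST."""
    return [[math.log2(c / s + 1) for c, s in zip(row, factors)] for row in counts]

counts = [[10, 20], [100, 200], [5, 10]]  # hypothetical: sample 2 sequenced twice as deep
factors = size_factors(counts)
print(factors)
print(log_normalized(counts, factors))
```

Unlike TPM, these size factors are estimated from the counts of all samples together and do not involve feature length; in DESeq2 the differential test itself is still run on the raw counts, with the size factors accounting for library depth.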
Thanks for your reply. A traditional "case-control" design does not fit our differential analysis, so we need to write our own script for this part. I don't understand the fundamental difference between TEcount and TElocal, since both can output raw counts for genes and TEs. Thanks
Hi,
TEcount aggregates transposable elements from the same subfamily (e.g. L1HS) in the count table, whereas TElocal has counts for each copy of the transposable element (e.g. L1HS_dup516). Thus, if you are trying to compare total L1HS between samples, you would use TEcount, whereas if you need to look at a specific locus, you would use TElocal.
Thanks.
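The difference can be illustrated with a small Python sketch that collapses locus-level counts into subfamily totals. It assumes, hypothetically, that locus IDs follow the `<subfamily>_dup<N>` pattern mentioned above (e.g. `L1HS_dup516`); real TElocal IDs may carry extra annotation. This is only a conceptual illustration: the tools assign multi-mapped reads with their own algorithms, so summing TElocal loci is not guaranteed to reproduce TEcount's numbers.

```python
from collections import defaultdict

def subfamily_of(locus_id):
    """Strip a hypothetical "_dup<N>" suffix: "L1HS_dup516" -> "L1HS"."""
    name, sep, _ = locus_id.rpartition("_dup")
    return name if sep else locus_id  # IDs without the suffix are kept as-is

def aggregate_by_subfamily(locus_counts):
    """Sum per-locus counts into per-subfamily totals."""
    totals = defaultdict(int)
    for locus_id, count in locus_counts.items():
        totals[subfamily_of(locus_id)] += count
    return dict(totals)

# Hypothetical locus-level counts for one sample
loci = {"L1HS_dup516": 3, "L1HS_dup1": 2, "AluY_dup7": 5}
print(aggregate_by_subfamily(loci))  # {'L1HS': 5, 'AluY': 5}
```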
Thanks for your reply, I understand now. I'd also like to ask whether you have a recommended normalization method, since you said earlier that TPM/FPKM is not recommended.
Hi,
It's difficult to say without knowing exactly what you're trying to compare. You mentioned that it's not a "case-control" experiment; in that case, are you comparing within the same sample?
Thanks
Thank you. For example, we have samples from multiple tissue types at different developmental stages, and we want to know whether a particular TE is highly expressed in one tissue at a given developmental stage. Thanks
We wanted to use a conventional normalization method (TPM/FPKM) for quantification and then compare expression levels across tissues and developmental stages. Thanks
Hi,
You can still do "case-control" style comparisons. First, try visualizing TE expression using VST-transformed data (e.g. as a heatmap) to see whether any tissue has higher expression than the others. Then designate the tissue of interest as your "case" and the other tissues at that developmental stage as "control", and run a case-control comparison between them using the raw counts and a differential analysis algorithm.
Alternatively, you can do a "pairwise" comparison, where you designate each tissue in turn as the case and the others as control, and look for differential expression. In that case, I would take all the raw p-values and apply multiple-testing correction (e.g. FDR) to all of them together.
You can also do that with different developmental stages of the same tissue, and (technically) different developmental stages of different tissues. However, those comparisons might be less useful, since there are too many variables to account for.
Thanks
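The pooled correction suggested above can be sketched in Python with the Benjamini-Hochberg procedure, a standard way of controlling the FDR (the same adjustment as `p.adjust(method = "BH")` in R). The comparison names and p-values below are hypothetical; the key point is that the raw p-values from every comparison are corrected together in one pass rather than separately per comparison.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values never
    # increase as raw p-values get smaller; starting at 1.0 caps them at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def pooled_fdr(results):
    """results: {comparison: {feature: raw p}} -> same shape with adjusted p.

    All p-values from all comparisons are corrected together in one pass.
    """
    keys = [(comp, feat) for comp, feats in results.items() for feat in feats]
    adjusted = benjamini_hochberg([results[c][f] for c, f in keys])
    out = {}
    for (comp, feat), q in zip(keys, adjusted):
        out.setdefault(comp, {})[feat] = q
    return out

# Hypothetical raw p-values from two "one tissue vs. the rest" comparisons
raw = {"brain_vs_rest": {"L1HS": 0.01, "AluY": 0.04},
       "liver_vs_rest": {"L1HS": 0.03, "AluY": 0.2}}
print(pooled_fdr(raw))
```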
Thanks for your reply. If we compare every stage and every tissue, the number of combinations becomes very large. For now, we just want to identify the TE with the highest expression (fold change) in a particular tissue at a particular stage and treat it as the specific TE. Can VST values be used as expression levels for this?
Hi,
You could run a single comparison with a differential analysis algorithm using all the samples you want to look at. Because these algorithms perform the normalization as part of the analysis, you can output the normalized values (and/or VST values) and then visualize them on a heatmap. It doesn't really matter which exact case-control comparison you run, as long as you can get the algorithm to output normalized counts that you can then process (with VST) and visualize (as a heatmap).
Thanks.
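For the question of which tissue/stage shows the highest expression, here is a minimal Python sketch that ranks groups for a single TE using log-scale values exported after normalization (for example VST values). It is a descriptive ranking, not a statistical test; significance would still come from the differential analysis run on raw counts. The sample names, group labels and values are hypothetical, and treating a difference of means on this scale as a log2 fold change is only an approximation.

```python
from collections import defaultdict
from statistics import mean

def top_group(values, groups):
    """Find the group with the highest mean expression for one TE.

    values: {sample: log2-scale expression, e.g. an exported VST value}
    groups: {sample: group label, e.g. a (tissue, stage) tuple}
    Needs at least two groups. Returns (best_group, difference), where
    difference is the best group's mean minus the mean of the other
    groups' means (roughly a log2 fold change on a log2 scale).
    """
    by_group = defaultdict(list)
    for sample, value in values.items():
        by_group[groups[sample]].append(value)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    best = max(group_means, key=group_means.get)
    others = [m for g, m in group_means.items() if g != best]
    return best, group_means[best] - mean(others)

# Hypothetical VST values for one TE across five samples
vst = {"s1": 8.0, "s2": 10.0, "s3": 2.0, "s4": 4.0, "s5": 3.0}
labels = {"s1": ("brain", "E10"), "s2": ("brain", "E10"),
          "s3": ("liver", "E10"), "s4": ("liver", "E10"),
          "s5": ("heart", "E10")}
print(top_group(vst, labels))  # prints (('brain', 'E10'), 6.0)
```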
Thank you very much for your kind and prompt reply! I will give it a try based on your suggestion.
Hi, one last question: can I run hundreds of samples (6 GB of data per sample) at the same time, and roughly how much memory would that require? Thanks
If you are referring to the differential analysis, you will be using the joined count tables, so you should have far less than a gigabyte of data per sample. I've successfully handled hundreds of samples with 20 to 30 GB of RAM (though that's me being cautious).
Thanks.
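The point that the joined count tables are small compared with the per-sample BAM files can be checked with a back-of-the-envelope estimate. This sketch assumes a dense matrix of 8-byte numbers; the feature and sample counts are hypothetical (a locus-level TElocal table has far more rows than a TEcount table, so plug in your actual row count), and the RAM a tool such as DESeq2 actually uses will be a multiple of this because of intermediate objects.

```python
def count_matrix_bytes(n_features, n_samples, bytes_per_value=8):
    """Size of a dense features x samples matrix of fixed-width numbers."""
    return n_features * n_samples * bytes_per_value

def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30

# Hypothetical example: 60,000 genes/TE subfamilies x 300 samples
size = count_matrix_bytes(60_000, 300)
print(f"{size:,} bytes = {to_gib(size):.2f} GiB")  # prints 144,000,000 bytes = 0.13 GiB
```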
Thank you very much for your kind and prompt reply! Your suggestions are very instructive and helpful!
Dear Oliver Tam,
Thank you for providing very useful software.
I used TEcount to generate a count table for each sample and then merged the raw counts from all samples. After TPM/VST normalization, we performed downstream differential analysis. Is this process reasonable, or do I need to replace TEcount with TElocal?
My command is:
TEcount \
  --format BAM \
  -b file.bam \
  --GTF Downlod_ensemble.gtf \
  --TE rmsk_TE.gtf \
  --mode multi \
  --sortByPos \
  --project test \
  -i 10
With regards,
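For the workflow described above, merging the per-sample tables can be sketched in Python with only the standard library. This assumes each TEcount output table is tab-separated, with one header line, the gene/TE ID in the first column and the count in the second; check that against your actual files. Consistent with the earlier advice in this thread, the merged matrix is meant to hold raw integer counts for DESeq2 or edgeR, not TPM or VST values. The function names are hypothetical.

```python
import csv

def parse_count_table(lines):
    """Parse one TEcount-style table from an iterable of text lines (e.g. an open file)."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the header line
    counts = {}
    for row in reader:
        if row:
            # Round in case any count is written as a fractional value
            counts[row[0]] = round(float(row[1]))
    return counts

def merge_count_tables(tables_by_sample):
    """{sample: {feature: count}} -> (samples, {feature: [count per sample]}).

    Features missing from a sample are filled with 0; with the same GTF
    files for every run, all tables should list the same features anyway.
    """
    samples = list(tables_by_sample)
    features = sorted(set().union(*(t.keys() for t in tables_by_sample.values())))
    matrix = {f: [tables_by_sample[s].get(f, 0) for s in samples] for f in features}
    return samples, matrix

def write_merged(path, samples, matrix):
    """Write the merged raw-count matrix as a tab-separated file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["gene/TE", *samples])
        for feature, row in matrix.items():
            writer.writerow([feature, *row])
```

For example, with `--project test` you would open the resulting table (assumed here to be named `test.cntTable`), pass the file handle to `parse_count_table`, repeat for every sample, and then call `merge_count_tables` and `write_merged`.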