wyang17 / SQuIRE

Software for Quantifying Interspersed Repeat Expression

>80% TEs have expression > 0, is this normal? #54

Open TDLewin opened 3 years ago

TDLewin commented 3 years ago

Hi all,

I am using SQuIRE to look at TE expression in human fibroblast RNA-Seq data, but I'm slightly surprised by the results SQuIRE is producing. I am running squire Call with option -s to get subfamily-level results, as I do not need locus-level resolution. I find that ~81% of tested TE subfamilies have mean counts > 10 and 60% have mean counts > 100. In terms of FPKM, this translates to 75% with an FPKM > 2, 54% with an FPKM > 10 and 17% with an FPKM > 100. If I re-run the same data at the locus level, I find that between 30% and 70% of TE loci have counts > 10. Although I have been unable to find other data showing what TE expression in fibroblasts should look like, this seems incredibly high and feels like something has gone wrong somewhere. Surely not all of these TEs are expressed in adult fibroblasts?

I am wondering whether other people see this with their data. Is it to be expected given the way the read assignment algorithms work, or have I made a mistake somewhere?

Is the issue the cut-off value for when a TE is deemed 'expressed'? The SQuIRE authors use a > 10 count cut-off at the locus level, so presumably the cut-off needs to be quite a lot higher than this at the subfamily level.
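To make the cut-off question concrete, here is a minimal sketch of how the "percentage of subfamilies above a cut-off" figures can be tabulated for several thresholds. The subfamily labels and mean counts are hypothetical placeholders, not SQuIRE output, and `fraction_above` is an illustrative helper, not part of SQuIRE.

```python
def fraction_above(values, cutoff):
    """Return the fraction of values strictly greater than cutoff."""
    values = list(values)
    if not values:
        raise ValueError("no values supplied")
    return sum(1 for v in values if v > cutoff) / len(values)


# Hypothetical mean counts per TE subfamily (made-up numbers for illustration).
mean_counts = {
    "subfamily_A": 250.0,
    "subfamily_B": 1200.0,
    "subfamily_C": 8.0,
    "subfamily_D": 45.0,
    "subfamily_E": 0.0,
}

# Report how the "expressed" fraction changes as the cut-off is raised.
for cutoff in (10, 100):
    frac = fraction_above(mean_counts.values(), cutoff)
    print(f"subfamilies with mean count > {cutoff}: {frac:.0%}")
```

Running the same tabulation over a range of cut-offs shows how sensitive the "expressed" fraction is to the threshold chosen.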

These are examples of the commands I am using:

squire Fetch -b 'hg38' -p 8 -r -f -c -x

squire Clean -b 'hg38'

squire Map -1 <fq file 1> -2 <fq file 2> -n -p 15 -r 75

squire Count -r 75 -n -p 15 -s 2

squire Call -1 TREATMENT1,TREATMENT2,TREATMENT3 -2 CONTROL1,CONTROL2,CONTROL3 -A Treatment -B Control -s -p 15 -N

Any advice or comments are appreciated; thanks in advance for your help.

Tom

rpg18 commented 3 years ago

Hi Tom!

I have never worked with fibroblast samples, but I was wondering: after running Call, are your results consistent with your initial hypothesis, or do they look random? Also, it is generally recommended not to rely on raw counts, since they are not normalised.

Regarding FPKM in embryonic fibroblasts, I believe we would expect abundant expression of TE subfamilies, since TEs play a key role during embryogenesis (Gerdes P. et al. 2021). This paper was published last Friday (He K. et al. 2021); maybe it can give you a broader idea of what we would expect to see in adult somatic cells.

For comparison purposes, you could also run TEtranscripts in parallel and see what you get.
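As a reminder of why raw counts and FPKM can tell different stories, here is a minimal sketch of the standard FPKM definition (fragments per kilobase of feature per million mapped fragments). It is the textbook formula and may not match exactly how SQuIRE derives its FPKM column; the function name and the example numbers are assumptions for illustration.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Standard FPKM: fragments per kilobase of feature per million mapped fragments."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical example: the same raw count on a short and a long feature,
# in a library of 20 million mapped fragments.
library_size = 20_000_000
short_feature = fpkm(100, 1_000, library_size)
long_feature = fpkm(100, 6_000, library_size)

print(f"100 fragments over 1 kb: FPKM = {short_feature:.2f}")
print(f"100 fragments over 6 kb: FPKM = {long_feature:.2f}")
```

The same raw count gives a six-fold lower FPKM on the longer feature, which is why a single count cut-off does not translate directly into a single FPKM cut-off.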

BW, Raquel

TDLewin commented 3 years ago

Hi Raquel,

Thanks so much for taking the time to reply, and for replying so quickly.

My results are broadly consistent with my initial hypothesis and are definitely not random - I see downregulation of approx. 150 subfamilies of TEs (and upregulation of very few) in my treatment vs. control samples.

Thanks for the advice on using FPKM rather than counts. To put my results in terms of FPKM, SQuIRE reports around 75% of TE subfamilies with FPKM > 2 and 54% with FPKM > 10. I was wondering if this is anywhere near what you have seen in your analyses? It would be interesting to know, even if they are completely different cell types.

Thanks a lot for pointing me to those papers, I had missed the scTE paper!

Best wishes,

Tom

TDLewin commented 3 years ago

Also, for this problem it would be helpful to know the % of reads that are mapping to TEs. Is there any way to get this out of the SQuIRE output? Thanks a lot!
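One rough way to estimate this is sketched below. It assumes you already have the per-TE counts for a sample and the total number of mapped reads for that sample (for example, from the aligner's summary log). Both inputs here are hypothetical numbers, and `percent_te_reads` is an illustrative helper, not a SQuIRE function. Whether SQuIRE reports this figure directly is not something this sketch assumes.

```python
def percent_te_reads(te_counts, total_mapped_reads):
    """Return the percentage of mapped reads that were assigned to TEs."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return 100.0 * sum(te_counts) / total_mapped_reads


# Hypothetical per-TE counts for one sample (fractional counts are allowed,
# since multi-mapping reads may be split between loci).
te_counts = [1500.0, 320.5, 79.5]

# Hypothetical total number of mapped reads for the same sample.
total_mapped_reads = 50_000

pct = percent_te_reads(te_counts, total_mapped_reads)
print(f"{pct:.1f}% of mapped reads assigned to TEs")
```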

mars188 commented 2 years ago

@TDLewin maybe you can help with an issue I am facing while running "Count" and "Call".

I have 8 samples (two groups) that I ran SQuIRE on, and I obtained individual count tables using "Count". Later, I performed differential expression analysis of TEs with "Call". However, when I looked at the "SQuIRE_gene_TE_counttable.txt" file generated during the "Call" step, it shows different TE counts from the "*_TEcounts.txt" file generated during the "Count" step.

I was expecting that the count values generated at the Count step would be carried over unchanged, and that differential expression would then be performed on them by Call. Confusingly, for 3 of the 8 samples the TE count values match between the "*_TEcounts.txt" and "SQuIRE_gene_TE_counttable.txt" files, but for the remaining 5 samples they do not.

I would really appreciate your help!
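To narrow down a discrepancy like this, it can help to list exactly which TEs differ for a given sample and by how much. The sketch below assumes the two tables have already been loaded into dictionaries mapping a TE identifier to its count for one sample. The identifiers, values, and the `find_mismatches` helper are hypothetical and do not reflect the actual column layout of either SQuIRE file.

```python
def find_mismatches(count_step, call_step, tol=1e-6):
    """Return {te_id: (count_value, call_value)} for TEs whose values differ
    by more than tol, or that are missing from one of the two tables."""
    mismatches = {}
    for te_id in sorted(set(count_step) | set(call_step)):
        a = count_step.get(te_id)
        b = call_step.get(te_id)
        if a is None or b is None or abs(a - b) > tol:
            mismatches[te_id] = (a, b)
    return mismatches


# Hypothetical values for one sample, as loaded from the Count-step table
# and from the combined Call-step table.
count_step = {"te_1": 12.0, "te_2": 3.5, "te_3": 40.0}
call_step = {"te_1": 12.0, "te_2": 4.0, "te_3": 40.0}

for te_id, (a, b) in find_mismatches(count_step, call_step).items():
    print(f"{te_id}: Count step = {a}, Call step = {b}")
```

Looking at whether the differences are small (for example, rounding) or large, and whether they affect all TEs or only some, should make it easier to describe the problem precisely.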