interactivereport / RNASequest


TPM adjustment for numerical variables - working correctly? #2

Closed ferg-0 closed 3 years ago

ferg-0 commented 3 years ago

@z5ouyang Example is: /camhpc/ngs/projects/TST11781/dnanexus/20210607022720_Maria.Zavodszky/EA20210831_0/Sc

Adjusting for exonic rate, which should be numerical, gives a strange PCA before and after adjustment.

Before: [image]

After: [image]

Is it genuinely treating exonic rate as numeric?

z5ouyang commented 3 years ago

@ferg-0 I confirmed it is using "exonic rate" as numeric. I also checked all known meta information and couldn't identify any meta information correlated with the grouping.

There is something interesting here. I checked and found that the top 3 genes make up ~60% of the TPM counts: [image]

I then used RIN (numeric) to adjust, and there is no clear grouping: [image]

Finally, I added a small value to "Exonic_Rate", drawn from a random normal distribution (mean=0 and sd=0.05, 0.01, 0.5). You can see the grouping disappear when the larger random values are added. [image] [image] [image]

I am not sure if this is related to EA. Let me know what you think.
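
For readers following the experiment above, here is a minimal sketch of what adjusting log expression for a numeric covariate can look like, and of how adding noise to that covariate weakens the adjustment. It is not taken from RNASequest: `regress_out`, `mean_abs_corr`, the simulated data, and the plain least-squares fit are all assumptions made for illustration.

```python
import numpy as np

def regress_out(log_expr, covariate):
    """Remove the linear effect of one numeric covariate from each gene.

    log_expr: (genes, samples) array of log-transformed expression.
    covariate: (samples,) numeric array, e.g. exonic rate.
    """
    x = np.asarray(covariate, dtype=float)
    x_centered = x - x.mean()
    # Design matrix: intercept plus the centered covariate.
    design = np.column_stack([np.ones_like(x_centered), x_centered])
    # Fit every gene at once; coef has shape (2, genes).
    coef, *_ = np.linalg.lstsq(design, log_expr.T, rcond=None)
    # Subtract only the covariate term so each gene keeps its mean.
    return log_expr - np.outer(coef[1], x_centered)

def mean_abs_corr(mat, x):
    """Mean absolute Pearson correlation between each row of mat and x."""
    xc = x - x.mean()
    mc = mat - mat.mean(axis=1, keepdims=True)
    r = (mc @ xc) / (np.linalg.norm(mc, axis=1) * np.linalg.norm(xc))
    return float(np.mean(np.abs(r)))

# Simulated data (hypothetical values, not from the project in this issue).
rng = np.random.default_rng(0)
n_genes, n_samples = 200, 24
exonic_rate = rng.uniform(0.6, 0.9, n_samples)
slopes = rng.normal(0, 5, n_genes)
log_expr = (
    rng.normal(5, 1, (n_genes, 1))
    + np.outer(slopes, exonic_rate - exonic_rate.mean())
    + rng.normal(0, 0.1, (n_genes, n_samples))
)

# Adjusting with the true covariate removes its linear effect.
adjusted = regress_out(log_expr, exonic_rate)

# Adjusting with a noisy copy of the covariate removes less of it.
residual_corr = {}
for sd in (0.01, 0.05, 0.5):
    noisy = exonic_rate + rng.normal(0, sd, n_samples)
    residual_corr[sd] = mean_abs_corr(regress_out(log_expr, noisy), exonic_rate)

print(residual_corr)
```

In this toy setup, the noisier the covariate, the less the adjustment tracks the true exonic rate, which is consistent with the adjustment having less effect on the PCA as the added noise grows.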

z5ouyang commented 3 years ago

@ferg-0, I investigated further and found that a few top genes (from the union of the top 5 genes across all samples, which accounts for more than 70% of adjusted TPM) contribute to the grouping in the exonic_rate-adjusted logTPM:

[image] [image] [image] [image]

ENSMFAG00000008793: MALAT1 (RF01871)
ENSMFAG00000011723
ENSMFAG00000009250: SNORD116 (RF00108)
ENSMFAG00000005139: SNORD14

So I think I can close this. Feel free to reopen it if needed.
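
The check described above (take the top genes in each sample, form their union, and see what share of TPM they carry) can be sketched as follows. This is an illustrative reconstruction, not the code used in the investigation: `top_gene_union`, the toy matrix, and the gene labels `g1` to `g4` are hypothetical.

```python
import numpy as np

def top_gene_union(tpm, gene_ids, k=5):
    """Union of the top-k genes per sample and their share of each sample's TPM.

    tpm: (genes, samples) array of TPM values.
    gene_ids: list of gene identifiers, one per row of tpm.
    """
    # Row indices of the k largest values in each column (sample).
    top_idx = np.argsort(-tpm, axis=0)[:k, :]
    # Genes that appear in the top k of at least one sample.
    union = np.unique(top_idx)
    # Fraction of each sample's total TPM carried by the union set.
    fraction = tpm[union, :].sum(axis=0) / tpm.sum(axis=0)
    return [gene_ids[i] for i in union], fraction

# Toy example: two samples, each dominated by a different gene.
tpm = np.array([
    [600.0, 100.0],
    [200.0, 700.0],
    [100.0, 100.0],
    [100.0, 100.0],
])
genes, fraction = top_gene_union(tpm, ["g1", "g2", "g3", "g4"], k=1)
print(genes, fraction)
```

A high fraction for a small union set, as reported in this thread, means a handful of genes dominate the TPM totals and can drive the structure seen in the PCA.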

ferg-0 commented 3 years ago

@z5ouyang Thanks for looking into this. Previously I had put the top gene in the covariate list; it seems this might be something we do as standard practice. I'm not sure how we would proceed in this case: adjust for the top three genes as well? We can discuss during the regular meetings.