ferg-0 closed this 3 years ago
@ferg-0 I confirmed that it is using "exonic rate" as numeric. I also checked all known meta information and could not identify any that correlates with the grouping. There is something interesting here: the top 3 genes make up ~60% of TPM counts. I then used RIN (numeric) to adjust, and there was no clear grouping. Finally, I added small random values to "Exonic_Rate", drawn from a normal distribution (mean=0 and sd=0.05, 0.01, 0.5); the grouping disappears when the larger random values are added.
I am not sure if this is related to EA. Let me know what you think.
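The jitter check described above can be sketched as follows. This is a minimal illustration only: it assumes the adjustment amounts to regressing a numeric covariate out of each gene's logTPM by ordinary least squares, which may differ from what EA actually does. The function names (`regress_out`, `jitter`) and all sample values are hypothetical.

```python
import random
from statistics import mean


def regress_out(y, x):
    """Remove the linear effect of numeric covariate x from y (least squares).

    The result keeps the original mean of y, so only the trend along x is removed.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:  # constant covariate: nothing to adjust for
        return list(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - slope * (xi - mx) for xi, yi in zip(x, y)]


def jitter(x, sd, rng):
    """Add independent Normal(0, sd) noise to each covariate value."""
    return [xi + rng.gauss(0.0, sd) for xi in x]


# Hypothetical values for six samples (not taken from the project data).
exonic_rate = [0.70, 0.72, 0.75, 0.80, 0.85, 0.90]
log_tpm = [1.0, 1.3, 1.1, 1.9, 2.2, 2.4]  # one gene

rng = random.Random(0)  # fixed seed so the run is repeatable
for sd in (0.01, 0.05, 0.5):
    noisy = jitter(exonic_rate, sd, rng)
    adjusted = regress_out(log_tpm, noisy)
    print(sd, [round(v, 2) for v in adjusted])
```

With a large sd relative to the spread of the covariate, the jittered values carry little of the original signal, so the adjustment removes less of the trend. That is one possible reading of why the grouping fades at the larger sd values.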
@ferg-0, I investigated further and found that a few top genes (from the union of the top 5 genes across all samples, which accounted for more than 70% of adjusted TPM) contribute to the grouping in the exonic_rate-adjusted logTPM:
- ENSMFAG00000008793: MALAT1 (RF01871)
- ENSMFAG00000011723
- ENSMFAG00000009250: SNORD116 (RF00108)
- ENSMFAG00000005139: SNORD14
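For reference, the "top genes" check above can be sketched like this. It is an illustration under assumptions: TPM values are held as one `{gene: value}` dict per sample, and the helper names (`top_gene_share`, `union_of_top_genes`), sample names, and numbers are hypothetical, not taken from the project.

```python
def top_gene_share(tpm, n=3):
    """For one sample, return (top-n gene ids by TPM, their share of total TPM)."""
    ranked = sorted(tpm, key=tpm.get, reverse=True)[:n]
    total = sum(tpm.values())
    share = sum(tpm[g] for g in ranked) / total if total else 0.0
    return ranked, share


def union_of_top_genes(samples, n=5):
    """Union of the per-sample top-n genes across all samples."""
    genes = set()
    for tpm in samples.values():
        genes.update(top_gene_share(tpm, n)[0])
    return genes


# Hypothetical TPM values for two samples.
samples = {
    "sample1": {"geneA": 600.0, "geneB": 200.0, "geneC": 100.0, "geneD": 100.0},
    "sample2": {"geneA": 300.0, "geneB": 50.0, "geneC": 50.0, "geneD": 600.0},
}

for name, tpm in samples.items():
    genes, share = top_gene_share(tpm, n=2)
    print(name, genes, f"{share:.0%}")

print(sorted(union_of_top_genes(samples, n=2)))
```

A high share concentrated in a handful of genes means those genes can dominate TPM-based comparisons between samples, which is consistent with the observation in this comment.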
So I think I can close this. You can reopen it if needed.
@z5ouyang Thanks for looking into this. Previously I had put the top gene in the covariate list; it seems this might be something we do as standard practice. I'm not sure how we would proceed in this case: adjust for the top three genes as well? We can discuss during the regular meetings.
@z5ouyang Example is: /camhpc/ngs/projects/TST11781/dnanexus/20210607022720_Maria.Zavodszky/EA20210831_0/Sc
Adjusting for exonic rate, which should be numerical, gives a strange PCA before and after adjustment.
[PCA plot before adjustment] [PCA plot after adjustment]
Is it genuinely treating exonic rate as numeric?