quanc1989 / SV-ONT-Tibetan

Characterization of Structural Variation in Chinese samples
MIT License

enrichment analysis for repeat elements intersected with SV #2

Closed: lx-1011 closed 2 years ago

lx-1011 commented 2 years ago

Dear @quanc1989, thanks for sharing this useful and detailed analysis pipeline. I have a question about the log2 fold change (LFC) in the enrichment analysis for repeat elements intersected with breakpoint junction sequences in your article, Result 2. How did you calculate the LFC? I also ran RepeatMasker to get a TE library and obtained the TEs intersected with SVs, but I am having trouble with the enrichment analysis. (image)

Thanks, and I wish you all the best. Li Xin

quanc1989 commented 2 years ago

Dear @lx-1011, the enrichment analysis is based on simulations.

First, as you have already done, I calculated the proportion of each repeat element across the whole SV callset. Second, I ran 1000 simulations; each one generated the same number of random genomic regions across the whole genome as there are real SVs, with each region having the same length as one of my real SVs. Third, I repeated the calculation from the first step for each simulated callset. Finally, you can calculate an LFC and an empirical p-value by comparing the proportion of a specific repeat element in the real callset with the 1000 simulated values.
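The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the paper: it uses a single toy "chromosome", treats the "proportion" as the fraction of regions that overlap at least one repeat interval, takes the mean of the simulated proportions as the expected value for the LFC, and uses a one-sided empirical p-value with a +1 correction. The function names (`enrichment`, `simulate_regions`, etc.), the pseudocount, and the toy coordinates are all hypothetical.

```python
import math
import random


def overlaps_any(start, end, repeats):
    """True if the half-open interval [start, end) overlaps any repeat interval."""
    return any(start < r_end and r_start < end for r_start, r_end in repeats)


def proportion_overlapping(regions, repeats):
    """Step 1 / step 3: fraction of regions that overlap at least one repeat."""
    hits = sum(overlaps_any(s, e, repeats) for s, e in regions)
    return hits / len(regions)


def simulate_regions(lengths, genome_length, rng):
    """Step 2: one random region per real SV, matched in length."""
    regions = []
    for length in lengths:
        start = rng.randrange(0, genome_length - length + 1)
        regions.append((start, start + length))
    return regions


def enrichment(real_svs, repeats, genome_length, n_sim=1000, seed=0,
               pseudocount=1e-6):
    """Step 4: LFC and empirical p-value of the real callset vs. simulations."""
    rng = random.Random(seed)
    lengths = [e - s for s, e in real_svs]
    observed = proportion_overlapping(real_svs, repeats)
    simulated = [
        proportion_overlapping(simulate_regions(lengths, genome_length, rng),
                               repeats)
        for _ in range(n_sim)
    ]
    # Assumption: expected value = mean of the simulated proportions.
    expected = sum(simulated) / n_sim
    # Pseudocount (assumption) avoids log2(0) when a proportion is zero.
    lfc = math.log2((observed + pseudocount) / (expected + pseudocount))
    # One-sided empirical p-value for enrichment, with +1 correction (assumption).
    n_ge = sum(p >= observed for p in simulated)
    p_value = (n_ge + 1) / (n_sim + 1)
    return observed, expected, lfc, p_value


# Toy data (hypothetical): a 100 kb genome with one repeat block at 10-20 kb,
# and four "real" SVs that all fall inside that block.
toy_repeats = [(10_000, 20_000)]
toy_svs = [(10_100, 10_300), (12_000, 12_500), (15_000, 15_100), (18_000, 18_400)]
observed, expected, lfc, p_value = enrichment(toy_svs, toy_repeats, 100_000)
print(f"observed={observed:.3f} expected={expected:.3f} "
      f"LFC={lfc:.2f} p={p_value:.4f}")
```

On a real genome the random placement would also need to respect chromosome boundaries and probably exclude gaps; a tool such as `bedtools shuffle` is one common way to do that, though the thread does not say which tool was used.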

You can find further details in the supplementary materials of the manuscript (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02382-3).

I hope I answered your question.

Quan Cheng

lx-1011 commented 2 years ago

Hi @quanc1989, thank you for the reply. I fully understand your answer and ran it successfully. Thank you! Cheers, Li Xin