sjteresi / TE_Density

Python script calculating transposable element density for all genes in a genome. Publication: https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-022-00264-4
GNU General Public License v3.0

add normalization matrices #22

Closed teresi closed 3 years ago

teresi commented 4 years ago

The summed overlaps (with respect to superfam | order) for each window / gene / (L / I / R) are normalized by the relevant area in base pairs.

This relevant area is typically the window in base pairs (not the same as the window input variable), but it can change if other locations conflict with it. It might also differ depending on the experiment (FUTURE).

Since this will be broadcast, you may iterate over dimensions or build it out all at once. For example, the summed overlaps are currently stored as (superfam | order) x windows x genes for each L / I / R. As long as we can reorder the genes to match what is stored, we should be able to add this feature to GeneData and create the normalization for each window, or as an n_win x n_gene matrix (provided GeneData has enough info anyway).
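To make the broadcasting idea concrete, here is a minimal NumPy sketch of the "build it out all at once" option. It is an illustration, not the project's code. The function name `normalize_overlaps`, the gene labels, the array sizes, and the 1200 bp "conflict" value are all hypothetical, and no real `GeneData` attributes are used.

```python
import numpy as np


def normalize_overlaps(overlaps, relevant_area):
    """Divide summed overlaps (n_te, n_win, n_gene) by area (n_win, n_gene).

    Hypothetical helper for illustration; not part of TE_Density.
    """
    overlaps = np.asarray(overlaps, dtype=float)
    relevant_area = np.asarray(relevant_area, dtype=float)
    if relevant_area.shape != overlaps.shape[1:]:
        raise ValueError("relevant_area must have shape (n_win, n_gene)")
    # Broadcasting aligns trailing axes, so the (n_win, n_gene) denominator
    # is reused for every TE category along axis 0.
    return overlaps / relevant_area


# Assumed sizes: 2 TE categories (superfam | order), 3 windows, 4 genes.
n_te, n_win, n_gene = 2, 3, 4

# Fake summed overlaps in base pairs for one direction (one of L / I / R).
overlaps = np.arange(n_te * n_win * n_gene, dtype=float).reshape(
    n_te, n_win, n_gene
)

# Gene order used when the areas were computed vs. the order the overlaps
# are stored in (both hypothetical).
area_genes = ["g0", "g1", "g2", "g3"]
stored_genes = ["g2", "g0", "g3", "g1"]

# Typical case: the relevant area is just the window size in base pairs.
window_bp = np.array([500.0, 1000.0, 1500.0])
relevant_area = np.repeat(window_bp[:, np.newaxis], n_gene, axis=1)

# Pretend a conflict shrinks the largest window for gene "g0".
relevant_area[2, area_genes.index("g0")] = 1200.0

# Reorder the area columns so they match the stored gene order.
col_index = [area_genes.index(g) for g in stored_genes]
relevant_area_aligned = relevant_area[:, col_index]

density = normalize_overlaps(overlaps, relevant_area_aligned)
print(density.shape)  # (2, 3, 4)
```

Keeping the denominator as a single n_win x n_gene matrix means one division covers every TE category, and any per-gene adjustment to the relevant area only touches that matrix.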

sjteresi commented 3 years ago

Currently working:

- [ ] Relevant Area Division (denominator) (Scott)
- [ ] fake | subset TE/Gene data for norm tests
- [ ] add Normalization Matrices & calcs (create denominators)
- [ ] add Density container (store results)
- [ ] add Normalization Execution / Manager to pipeline (Michael)
- [ ] general refactoring, density.py has too much stuff

sjteresi commented 3 years ago

See PR #40