teresi closed 3 years ago
Currently working:
- [ ] Relevant Area Division (denominator) (Scott)
- [ ] fake | subset TE/Gene data for norm tests
- [ ] add Normalization Matrices & calcs (create denominators)
- [ ] add Density container (store results)
- [ ] add Normalization Execution / Manager to pipeline (Michael)
- [ ] general refactoring, density.py has too much stuff
See PR #40
The summed overlaps (wrt superfam | order) for each window / gene / (L / I / R) are normalized wrt the relevant area in base pairs.
This relevant area is typically the window in base pairs (not the same as the window input variable), but it can change if other locations conflict with it. It might also differ depending on the experiment (FUTURE).
Since this will be broadcast, you may iterate over dimensions or build it out all at once. For example, the summed overlaps are currently stored as
(superfam | order) x windows x genes
for each of L / I / R. As long as we can reorder the genes to match what is stored, we should be able to add this feature to GeneData and create the normalization for each window, or as an n_win x n_gene matrix (provided GeneData has enough info anyway).
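A minimal NumPy sketch of the broadcast described above, for a single direction (one of L / I / R). All names, sizes, and values here (`overlaps`, `relevant_area`, the 100 bp window) are hypothetical and not taken from density.py or GeneData. It assumes the relevant area is already available as an n_win x n_gene array whose gene order matches the stored overlaps.

```python
import numpy as np

# Hypothetical sizes: 2 superfamilies (or orders), 3 windows, 4 genes.
n_te, n_win, n_gene = 2, 3, 4

# Summed overlaps in base pairs for one direction,
# laid out as (superfam | order) x windows x genes.
overlaps = np.arange(n_te * n_win * n_gene, dtype=np.float64)
overlaps = overlaps.reshape(n_te, n_win, n_gene)

# Relevant area in base pairs as an n_win x n_gene matrix.
# Assumption: gene order already matches the overlaps array.
relevant_area = np.full((n_win, n_gene), 100.0)
relevant_area[0, 1] = 50.0  # pretend a conflict shrank this window

# Build it out all at once: the (n_win, n_gene) denominator is
# broadcast across the leading (superfam | order) axis.
density = overlaps / relevant_area

# Equivalent result when iterating over the leading dimension.
density_loop = np.empty_like(overlaps)
for i in range(n_te):
    density_loop[i] = overlaps[i] / relevant_area
```

Both forms give the same array; the broadcast version avoids the explicit loop and depends only on the trailing two axes lining up.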