add normalization matrices

teresi commented 4 years ago

the summed overlaps (per superfam | order) for each window / gene / (L / I / R) are normalized by the relevant area in base pairs

this relevant area is typically the window in base pairs (not the same as the window input variable), but it can change if other locations conflict with it. it might also differ depending on the experiment (FUTURE)

[ ] output the relevant area in base pairs (as a numpy array) for each of Left / Intra / Right; this is the divisor applied to the overlap sum to produce density
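
A minimal sketch of what such relevant-area arrays could look like. Everything here is an assumption for illustration, not the project's actual code: the function name `relevant_areas`, 1-based inclusive gene coordinates, and the idea that the only "conflict" is a window running off either end of the chromosome.

```python
import numpy as np


def relevant_areas(starts, stops, windows, chrom_len):
    """Hypothetical helper: relevant area (bp) for Left / Intra / Right.

    Assumes 1-based inclusive gene coordinates and that a window is only
    shortened when it would extend past the chromosome ends.
    """
    starts = np.asarray(starts, dtype=np.int64)    # shape (n_gene,)
    stops = np.asarray(stops, dtype=np.int64)      # shape (n_gene,)
    windows = np.asarray(windows, dtype=np.int64)  # shape (n_win,)

    # Intra: the gene length itself, independent of the window.
    intra = stops - starts + 1                     # shape (n_gene,)

    # Left: the window, capped by the bases available before the gene start.
    left = np.minimum(windows[:, None], (starts - 1)[None, :])

    # Right: the window, capped by the bases available after the gene stop.
    right = np.minimum(windows[:, None], (chrom_len - stops)[None, :])

    return left, intra, right                      # (n_win, n_gene), (n_gene,), (n_win, n_gene)


left, intra, right = relevant_areas(
    starts=[101, 5000], stops=[200, 5999], windows=[500, 1000], chrom_len=6200
)
```

With these made-up inputs, the first gene has only 100 bp to its left, so both left windows are capped at 100 for that gene.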

since this will be broadcast, you may iterate over dimensions or build it out all at once. for example, the summed overlaps are currently stored as (superfam | order) x windows x genes for each L / I / R. as long as we can reorder the genes to match what is stored, we should be able to add this feature to GeneData and create the normalization for each window, or as an n_win x n_gene matrix (provided GeneData has enough info, anyway)
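
A rough sketch of the broadcast described above, assuming the overlap sums are a (superfam | order) x windows x genes numpy array and the divisor is an n_win x n_gene array. The helper names `reorder_columns` and `to_density` are hypothetical, as is the choice to leave density at zero where the area is zero.

```python
import numpy as np


def reorder_columns(area, area_genes, target_genes):
    """Hypothetical helper: reorder the gene axis of `area` to match `target_genes`."""
    index = {name: i for i, name in enumerate(area_genes)}
    cols = [index[name] for name in target_genes]
    return area[:, cols]


def to_density(overlap_sums, area):
    """Divide (n_te, n_win, n_gene) overlap sums by an (n_win, n_gene) area."""
    overlap_sums = np.asarray(overlap_sums, dtype=np.float64)
    # Add a leading axis so the area broadcasts across superfam | order.
    divisor = np.asarray(area, dtype=np.float64)[np.newaxis, :, :]
    density = np.zeros_like(overlap_sums)
    # Assumption: entries with zero area keep a density of zero.
    np.divide(overlap_sums, divisor, out=density, where=divisor > 0)
    return density


# Made-up data: 2 TE groups x 2 windows x 2 genes, genes ordered (g1, g2).
overlaps = np.array([[[50, 250], [100, 500]],
                     [[0, 100], [25, 0]]])
# The area happens to be stored in a different gene order (g2, g1).
area_stored = np.array([[500, 100], [1000, 100]])
area = reorder_columns(area_stored, ["g2", "g1"], ["g1", "g2"])
density = to_density(overlaps, area)
```

Building the full n_win x n_gene divisor once and broadcasting avoids a Python loop over windows; iterating per window would give the same result with a smaller temporary.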

sjteresi commented 3 years ago

Currently working:

[ ] Relevant Area Division (denominator) (Scott)
[ ] fake | subset TE/Gene data for norm tests
[ ] add Normalization Matrices & calcs (create denominators)
[ ] add Density container (store results)
[ ] add Normalization Execution / Manager to pipeline (Michael)
[ ] general refactoring, density.py has too much stuff

sjteresi commented 3 years ago

See PR #40

sjteresi / TE_Density

add normalization matrices #22