sjteresi / TE_Density

Python script calculating transposable element density for all genes in a genome. Publication: https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-022-00264-4
GNU General Public License v3.0

Add relevant area calculation functions and tests #40

Closed sjteresi closed 3 years ago

sjteresi commented 3 years ago

I wrote a method that calculates the normalization matrix for a given gene and window, along with tests for it. The resulting normalization matrix is the denominator of the density function, so it is used in the final part of the density calculation.

I'd like to draw your attention to lines 42-70 in transposon/normalization_matrix.py. I believe I am creating the normalization dataframe in an inefficient way, but I was quite stumped and couldn't figure out a better solution. Do let me know if we can go over a potentially more efficient one.
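As one possible direction for the efficiency question above, the window areas for all genes can be computed at once with NumPy broadcasting rather than assembled piece by piece. This is only a sketch: `upstream_window_lengths`, its parameters, the 1-based coordinates, and the clipping at the chromosome start are assumptions made for illustration, not the code in transposon/normalization_matrix.py.

```python
import numpy as np


def upstream_window_lengths(gene_starts, windows):
    """Return window lengths in bp as an array of shape (n_windows, n_genes).

    Assumptions (hypothetical, for illustration only):
    - coordinates are 1-based and inclusive
    - the upstream window ends one base before the gene start
    - a window reaching past the chromosome start is clipped at position 1
    """
    gene_starts = np.asarray(gene_starts, dtype=np.int64)
    windows = np.asarray(windows, dtype=np.int64)

    # Broadcast (n_windows, 1) against (1, n_genes) to get every combination.
    window_starts = np.maximum(
        gene_starts[np.newaxis, :] - windows[:, np.newaxis], 1
    )
    window_stops = gene_starts[np.newaxis, :] - 1

    # Inclusive length; this is 0 for a gene that starts at position 1.
    return window_stops - window_starts + 1


lengths = upstream_window_lengths(gene_starts=[1000, 300], windows=[500, 1000])
print(lengths)
```

Rows follow the window order and columns follow the gene order, which matches the (Windows, Genes) layout described in this PR. A length of 0 is possible under these assumptions, so whatever divides by this array would need to handle that case.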

Regarding the tasks on the project document:

  • [X] fake | subset TE/Gene data for norm tests
  • [X] add Normalization Matrices & calcs (create denominators)
  • [ ] add Density container (store results)

I am not sure about the last task and may need more guidance on that. Currently the code just returns a numpy array of shape (Windows, Genes). The windows run from lowest to highest, and the genes are in the same order as GeneData.names, which I believe matches the order given in the input file. Does that partially get at that last task?

teresi commented 3 years ago

> [ ] add Density container (store results)
>
> I am not sure about the last task and may need more guidance on that. Currently the code just returns a numpy array that is of shape (Windows, Genes) and the windows from lowest to highest and the genes are in the same order of GeneData.names which I believe are in the same order as given in the input file. Does that partially get at that last task?

Sort of. By container I mean something like the gene_data wrapper: it should contain the density results (it holds the data).
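A minimal sketch of what such a container could look like. `DensityResult` and its fields are hypothetical names, not the project's API; the class simply keeps the (Windows, Genes) array together with the labels that give its rows and columns meaning.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class DensityResult:
    """Hypothetical container pairing a density array with its labels."""

    windows: tuple      # window sizes in bp, lowest to highest
    gene_names: tuple   # same order as the columns of `values`
    values: np.ndarray  # shape (len(windows), len(gene_names))

    def __post_init__(self):
        # Fail early if the array does not match its labels.
        expected = (len(self.windows), len(self.gene_names))
        if self.values.shape != expected:
            raise ValueError(
                f"expected shape {expected}, got {self.values.shape}"
            )

    def lookup(self, window, gene_name):
        """Return the density for one window size and one gene."""
        row = self.windows.index(window)
        col = self.gene_names.index(gene_name)
        return float(self.values[row, col])


result = DensityResult(
    windows=(500, 1000),
    gene_names=("gene_a", "gene_b"),
    values=np.array([[0.1, 0.2], [0.3, 0.4]]),
)
print(result.lookup(1000, "gene_a"))
```

Keeping the labels next to the array means callers do not have to remember that rows are windows and columns are genes, which is the implicit contract of a bare numpy array.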

teresi commented 3 years ago

In the interest of practicality, we will address these issues in future commits.

here's one to keep in mind: https://chris.beams.io/posts/git-commit/

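your commit message could be more helpful if you

  • remove the 'wip' lines
  • have the first line a summary (you did this, good)
  • have subsequent lines indented and bulleted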

sjteresi commented 3 years ago

> > [ ] add Density container (store results)
> >
> > I am not sure about the last task and may need more guidance on that. Currently the code just returns a numpy array that is of shape (Windows, Genes) and the windows from lowest to highest and the genes are in the same order of GeneData.names which I believe are in the same order as given in the input file. Does that partially get at that last task?
>
> sort of by container I mean something like the gene_data wrapper it should contain the density results (it holds the data)

Currently this code does not actually calculate density; it just returns the numpy array of divisors used to calculate density. Given that, I suppose the container for density results should be written as a different class, in a future branch/commit. Do you agree?
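To make the relationship concrete, here is a sketch of how the divisor array could later be combined with overlap counts. The function name `compute_density` and the choice to report 0 wherever the divisor is 0 are assumptions, not decisions made in this PR.

```python
import numpy as np


def compute_density(overlap_bp, divisors):
    """Divide TE overlap (bp) by the relevant area (bp), element-wise.

    Both inputs are assumed to have the same (Windows, Genes) shape.
    Entries whose divisor is 0 are left at 0.0 instead of producing inf/nan
    (an arbitrary choice made for this sketch).
    """
    overlap_bp = np.asarray(overlap_bp, dtype=np.float64)
    divisors = np.asarray(divisors, dtype=np.float64)

    density = np.zeros_like(overlap_bp)
    # `where` skips the division for zero divisors, leaving the 0.0 default.
    np.divide(overlap_bp, divisors, out=density, where=divisors > 0)
    return density


overlap = [[250, 0], [500, 100]]
divisors = [[500, 0], [1000, 400]]
print(compute_density(overlap, divisors))
```

Separating this step from the divisor calculation keeps the normalization code testable on its own, which fits the idea of adding the density container in a later branch.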

sjteresi commented 3 years ago

> in the interest of practicality, we will address issues in the future commits
>
> here's one to keep in mind: https://chris.beams.io/posts/git-commit/
>
> your commit message could be more helpful if you
>
>   • remove the 'wip' lines
>   • have the first line a summary (you did this, good)
>   • have subsequent lines indented and bulleted

Awesome, thank you, that's a good source. Wasn't aware that I should've just removed the 'wip' lines after I squashed and rebased.