timothy-barry / ondisc

Space- and time-optimal algorithms for large single-cell expression matrices, with a focus on single-cell CRISPR screens.
https://timothy-barry.github.io/ondisc/
Other
11 stars 5 forks source link

Add create_ondisc_matrix_from_R_matrix #7

Closed Samson-Dai closed 3 years ago

Samson-Dai commented 3 years ago
  1. Implement create_ondisc_matrix_from_R_matrix function and its helper functions
  2. Add 3 test cases for create_ondisc_matrix_from_R_matrix
timothy-barry commented 3 years ago

Thank you, Songcheng. This looks good, but I have three requests:

  1. Please remove the comments on lines 46-49 in file initialize_in_memory.R.
  2. In your test-covariate_matrices.R script, you should be able to rewrite this code without copying and pasting so much. Just add a couple additional statements to each for loop to test cov_odms_from_memory in addition to cov_odms. For future reference, please (i) feel free to edit or modify the code in the code base, and (ii) avoid copying and pasting code as much as possible.
  3. The issue that you identified in test-covariate_matrices is not due to numerical instability; instead, the formula that you used for coefficient of variation in test-covariate_matrices differs from that in initialize_in_memory (lines 63 - 65). I take responsibility for this. (Long story short, this has to do with dividing by n vs dividing by (n-1) in calculating the variance. The sd function in R divides by (n-1), but we should divide by n instead.) Therefore, please update lines 64 - 66 in initialize_in_memory to use the coefficient of variation formula on lines 47 - 50 of test-covariate_matrices.
Samson-Dai commented 3 years ago

Thanks Tim for your reviewing, I've made these changes in the new commit:

  1. Remove the todo comments in initialize_in_memoery.R.
  2. Modify the test-covariate_matrices.R to reduce code redundancy.
  3. Update the initialize_in_memory to use the coefficient of variation formula the same astest-covariate_matrices.