timothy-barry / ondisc

Space- and time-optimal algorithms for large single-cell expression matrices, with a focus on single-cell CRISPR screens.
https://timothy-barry.github.io/ondisc/

Read list of .mtx files #10

Closed by Samson-Dai 3 years ago

Samson-Dai commented 3 years ago
  1. Increase the column indices of the mtx matrix. To do this, we need to keep track of the number of cells in each file. I store this information in bag_of_variables$n_cells_in_files. I also modified some lower-level helper functions to compute and store the number of cells in each file.

    1. Add a new argument, "n_cells_in_files", to arguments_enum() in covariate_computation_functs.R.
    2. Modify the get_mtx_metadata() function in high_level_initialize_helper.R to compute and store the number of cells in each file.
    3. Store n_cells_in_files in bag_of_variables in create_ondisc_matrix_from_mtx() in high_level_initialize.R.
  2. Fix the bugs in run_core_algo_step_2_mtxfilelist() in low_level_initialize.R.

    1. Increase the column index for each file we read, and make sure it is of integer type.
    2. Increase pos on each iteration.
  3. Add test cases in test-multiple_input_files.R that compare the result with the ground-truth matrix and also check the covariates.
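
For reviewers who want the idea behind the column-index change without reading the R code, here is a minimal, language-agnostic sketch in Python. It is not the ondisc implementation: the function names (read_mtx_dims, read_mtx_entries, combine_mtx_files) are hypothetical. It also assumes that each input is a Matrix Market coordinate file with features as rows, cells as columns, and integer counts. Each file's column indices are shifted by the total number of cells in the files before it, so cells from later files land after cells from earlier ones.

```python
from itertools import accumulate


def read_mtx_dims(lines):
    """Return (n_rows, n_cols, n_entries) from the size line of a
    Matrix Market coordinate file, skipping '%' comment lines."""
    for line in lines:
        if line.startswith("%") or not line.strip():
            continue
        n_rows, n_cols, n_entries = (int(x) for x in line.split())
        return n_rows, n_cols, n_entries
    raise ValueError("no size line found")


def read_mtx_entries(lines):
    """Yield (row, col, value) triplets with 1-based indices,
    skipping comment lines and the size line."""
    seen_size_line = False
    for line in lines:
        if line.startswith("%") or not line.strip():
            continue
        if not seen_size_line:
            seen_size_line = True  # first non-comment line is the size line
            continue
        row, col, value = line.split()
        yield int(row), int(col), int(value)  # assumes integer counts


def combine_mtx_files(files):
    """Combine several files (each a list of lines) column-wise.

    Returns the per-file cell counts and the combined triplets, where
    each column index is offset by the cells in all earlier files."""
    n_cells_in_files = [read_mtx_dims(f)[1] for f in files]
    # Offsets are 0, n1, n1 + n2, ...: the running total before each file.
    offsets = [0] + list(accumulate(n_cells_in_files))[:-1]
    combined = []
    for f, offset in zip(files, offsets):
        for row, col, value in read_mtx_entries(f):
            combined.append((row, col + offset, value))
    return n_cells_in_files, combined


# Two toy files with 3 features each; the first has 2 cells, the second 3.
file_1 = ["%%MatrixMarket matrix coordinate integer general",
          "3 2 2", "1 1 5", "3 2 1"]
file_2 = ["%%MatrixMarket matrix coordinate integer general",
          "3 3 2", "2 1 7", "1 3 4"]

n_cells_in_files, combined = combine_mtx_files([file_1, file_2])
print(n_cells_in_files)  # [2, 3]
print(combined)          # [(1, 1, 5), (3, 2, 1), (2, 3, 7), (1, 5, 4)]
```

In the toy example, column 1 of the second file becomes column 3 of the combined matrix because the first file contributes 2 cells. Keeping the offsets as integers mirrors the "make sure it is of integer type" point above.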

timothy-barry commented 3 years ago

Excellent work 👍