snehamitra / SCARlink


pseudo cells #1

Closed will-NYGC closed 8 months ago

will-NYGC commented 8 months ago

Hi,

Does SCARlink currently aggregate cells? I see you are writing neighbor cells to a file, knn-50-scatac.csv, but it doesn't look like that file is used for anything on the ATAC side: the function get_gene_tile_matrix_group_cells has a placeholder for grouping the cells, but the grouping isn't implemented yet. Do you have plans to implement it soon? How significantly are the model fits affected by the sparsity in the ATAC data without it?

snehamitra commented 8 months ago

Hi @will-NYGC, we noticed that sparsity in ATAC and/or RNA significantly impacts the performance of the model, so pseudo-bulking of cells may help. However, we haven't explored it yet. Ideally, the cells in the pseudo-bulk training set should not overlap with the cells from the held-out test set. So one of the concerns with pseudo-bulking is that the test set may end up being a cell population that is very different from the training set. As you pointed out, we do have a placeholder for grouping cells, but at the moment we don't have a set timeline for the implementation.
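
To make the non-overlap point concrete, here is a rough sketch of one way it could be done: split the cells into training and test sets first, then sum counts over nearest neighbors within each split only. This is not SCARlink code. The function name `split_then_pseudobulk`, its parameters, the dense NumPy inputs, and the random split are all assumptions for illustration, and a random split does not address the concern that the test population may differ from the training population.

```python
import numpy as np


def split_then_pseudobulk(counts, embedding, k=5, test_frac=0.2, seed=0):
    """Hypothetical helper: split cells first, then pseudo-bulk within each split.

    counts    : (n_cells, n_features) dense array, e.g. tile or gene counts
    embedding : (n_cells, n_dims) low-dimensional coordinates used for neighbors
    """
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]

    # Random train/test split made BEFORE any aggregation.
    perm = rng.permutation(n_cells)
    n_test = int(round(test_frac * n_cells))
    test_idx, train_idx = perm[:n_test], perm[n_test:]

    def pseudobulk(idx):
        emb = embedding[idx]
        # Pairwise squared Euclidean distances between cells of this split only.
        # This is O(m^2) in memory, so it only suits small illustrations.
        d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(axis=-1)
        kk = min(k, len(idx))
        # Each row holds the kk closest cells (normally including the cell itself).
        nbrs = np.argsort(d2, axis=1)[:, :kk]
        # Sum counts over those neighbors: (m, kk, n_features) -> (m, n_features).
        return counts[idx][nbrs].sum(axis=1)

    return train_idx, test_idx, pseudobulk(train_idx), pseudobulk(test_idx)
```

Because neighbors are searched only inside each split, no test barcode contributes to a training pseudo-bulk and vice versa. Real ATAC matrices are usually sparse, so an actual implementation would likely need a sparse-aware neighbor search and aggregation instead of the dense arrays assumed here.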

will-NYGC commented 8 months ago

I see. So you're worried there will be shared info b/c of the pseudobulking of barcodes in training and test? I hadn't thought about that. Thanks for the input!