Open mugpeng opened 2 years ago
Hi there! Thank you for your interest in our work and for your query. We use C to store the total number of cells in each sample.
Thanks for your reply! Could you also share the script you used to process the tbru, fibroblast, and pbmc datasets?
For example, GEO only gives me a count file, which needs to be split into RNA and protein modalities and then integrated (CCA) as you described in the paper. There are also some values you compute, such as `C` or others.
It's hard for me to find clues in cna-display and cna-sim, where I could only find how you generate the simulation data for the tbru dataset.
Hi @mugpeng! We received these data objects from the study authors with substantial processing from their source publications (e.g. a multimodal CCA embedding for the TB dataset, cluster assignments for all three datasets). I believe the primary pre-processing you'll want in a data object you feed to one of our sim.py scripts includes: 1) total cells per sample stored in `d.samplem.C`, 2) cluster assignments stored in `data.obs` under a label of your choice that you feed as the `causal_clustering` argument, 3) pre-processing with scanpy to construct a PCA embedding and nearest-neighbor graph of the cells, and 4) pre-processing with CNA to construct a neighborhood abundance matrix.
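To illustrate item 1 only, here is a minimal sketch of how a per-sample cell count like `C` could be derived from per-cell metadata using plain pandas. The toy table stands in for `data.obs`; the column names `sample_id` and `cluster` and the variable `samplem` are hypothetical placeholders, not names taken from the GEO files or from the CNA code, and how the result is attached to the MultiAnnData object is not shown here.

```python
import pandas as pd

# Toy per-cell metadata standing in for `data.obs`.
# "sample_id" and "cluster" are hypothetical column names for illustration.
obs = pd.DataFrame(
    {
        "sample_id": ["s1", "s1", "s2", "s2", "s2", "s3"],
        "cluster": ["T", "B", "T", "T", "NK", "B"],
    },
    index=[f"cell{i}" for i in range(6)],
)

# C = total number of cells observed in each sample.
C = obs.groupby("sample_id").size().rename("C")

# One row per sample, analogous to the per-sample table the reply
# refers to as `d.samplem` (assumed layout, not confirmed by the source).
samplem = C.to_frame()
print(samplem)
```

The counts should sum to the total number of cells, which is a quick sanity check before moving on to the scanpy and CNA steps.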
Hi,
I recently read your work and am also interested in work related to perturbation. I was trying to reproduce your procedure for creating simulation data and ground truth.
However, I have run into a problem with sim_null.
There is an element named `C` in your MultiAnnData object, which holds the single-cell data from tbru, but I didn't find a column named `C` in the GEO metadata (only batch information). What should I do to the GEO data and its metadata to build an object like yours that can run the sim_null script? My simple process is below.
Thanks.