andreanuzzo closed this issue 3 years ago
Hi Andrea,
Thank you for your question.
I assume you are referring to the `metadata` variable in the snippet below, which is passed as the `annotation` argument when creating a new `ESObject`.
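```python
import numpy as np
import pandas as pd
import cellex

# expression matrix (genes x cells) and per-cell annotation (cell ID -> group)
data = pd.read_csv("./data.csv", index_col=0)
metadata = pd.read_csv("./metadata.csv", index_col=0)  # this variable
# create the ESObject, compute ES metrics and ESµ, then save the ESµ results
eso = cellex.ESObject(data=data, annotation=metadata, verbose=True)
eso.compute(verbose=True)
eso.results["esmu"].to_csv("mydataset.esmu.csv.gz")
```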
The metadata is essentially a map from each cell ID to a condition, a cell type, a cluster ID, or any other grouping the user has defined:
cell_id | cell_type |
---|---|
cell_1 | type_A |
... | ... |
cell_9 | type_C |
As you say, the metadata is used to compute ESµ. Specifically, it is used to group single cells and compute summary statistics for each group; these statistics are then used by the various expression specificity (ES) metrics (or differential expression metrics) to calculate the expression specificity of each gene in each group. The different ES metrics are then summarized in the ESµ metric.
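To make the grouping step concrete, here is a minimal pandas sketch under simplifying assumptions: it groups the cells of a toy genes x cells matrix by a `cell_type` annotation and computes a per-group mean for each gene. The final score (each group's share of a gene's summed group means) is a hypothetical toy metric chosen for illustration only; it is not one of CELLEX's ES metrics, and the real ESµ computation combines several metrics.

```python
import pandas as pd

# Toy genes x cells matrix (columns are cell IDs)
data = pd.DataFrame(
    {"cell_1": [5.0, 0.0], "cell_2": [3.0, 1.0], "cell_3": [0.0, 4.0]},
    index=["gene_x", "gene_y"],
)
# Cell ID -> group map, shaped like the metadata table above
metadata = pd.DataFrame(
    {"cell_type": ["type_A", "type_A", "type_C"]},
    index=["cell_1", "cell_2", "cell_3"],
)

# Step 1: group cells by their annotation and compute a summary statistic
# (here the mean) per gene and group; result is genes x groups
group_means = data.T.groupby(metadata["cell_type"]).mean().T

# Step 2 (toy metric, NOT a CELLEX ES metric): the share of each gene's
# summed group means that falls in each group
specificity = group_means.div(group_means.sum(axis=1), axis=0)
print(specificity)
```

Here `gene_x` is expressed only in `type_A` cells, so all of its score lands in `type_A`, while most of `gene_y`'s score lands in `type_C`.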
I hope that answered your question. Let me know if there's anything I can clarify.
Best, Tobi
Closing as question has been answered.
If you feel this is not the case, feel free to re-open.
Hi Tobias,
No, I wasn't referring to that, but to the `metadata_class` variable described in the tutorial, i.e. cell 5 here:
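```python
import loompy
import pandas as pd

# pathData, nameId, nameAnno and nameClass are defined in earlier cells of the notebook
with loompy.connect(pathData) as ds:
    rows = ds.row_attrs["Gene"]
    cols = ds.col_attrs[nameId]
    # our data: genes x cells expression matrix
    data = pd.DataFrame(ds[:, :], index=rows, columns=cols)
    # the type-annotation for individual cells
    metadata = pd.DataFrame(data={"cell_type": ds.col_attrs[nameAnno]}, index=ds.col_attrs[nameId])
    # the class of each cell (indexed by cell-type annotation, not by cell ID)
    metadata_class = pd.DataFrame(data={"cell_class": ds.col_attrs[nameClass]}, index=ds.col_attrs[nameAnno])
```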
That variable is assigned but never used. Does this mean CELLEX can compute ESµ for other groupings, e.g. disease-related ESµ for each cell line?
Hi Andrea,
Thanks for clarifying! I am not entirely familiar with this notebook, as it was developed by an MSc student in the Pers Lab.
I agree that the variable `metadata_class` does not appear to be used anywhere. My guess is that the student used it in place of `metadata` for a different kind of analysis.
> Does this mean CELLEX can compute ESµ for other groupings, e.g. disease-related ESµ for each cell line?
Yes, you can specify any grouping you are interested in investigating!
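As an illustration, here is a minimal sketch of one way to turn `metadata_class` into an alternative per-cell grouping. It assumes, as the notebook code suggests, that `metadata_class` is indexed by cell-type annotation with one row per cell (so cell types repeat). The toy cell, type and class names, and the `metadata_by_class` name, are hypothetical.

```python
import pandas as pd

# Toy per-cell annotation (cell ID -> cell type), shaped like `metadata`
metadata = pd.DataFrame(
    {"cell_type": ["B_cell", "B_cell", "Monocyte"]},
    index=["cell_1", "cell_2", "cell_3"],
)
# Toy `metadata_class`: indexed by cell type, one row per cell
metadata_class = pd.DataFrame(
    {"cell_class": ["Lymphocyte", "Lymphocyte", "Myeloid"]},
    index=["B_cell", "B_cell", "Monocyte"],
)

# Collapse to one class per cell type (Series.map needs a unique index)
type_to_class = metadata_class.loc[~metadata_class.index.duplicated(), "cell_class"]
# Look up each cell's class via its cell type -> cell ID -> class
metadata_by_class = metadata["cell_type"].map(type_to_class).to_frame("cell_class")
print(metadata_by_class)

# With a real expression matrix `data`, this would replace `metadata`:
# eso = cellex.ESObject(data=data, annotation=metadata_by_class, verbose=True)
```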
Thanks Tobias!
I assume that to do this type of grouping I have to create combined labels (e.g. concatenating cell type and disease status, such as `B_cell_healthy`, `B_cell_diseased`, `Plasma_healthy`, ...), since it doesn't seem to me that the `eso.compute()` function can accept a list of metadata columns. Am I correct?
Yes, this is correct and sounds like a good approach.
Then you should get:
cell_id | cell_type |
---|---|
cell_1 | B_cell_healthy |
... | ... |
cell_3 | B_cell_diseased |
... | ... |
cell_9 | Plasma_healthy |
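The combined labels can be built with plain pandas string concatenation. This is a minimal sketch on toy data; the `disease_status` column name is a hypothetical stand-in for however disease status is stored in your own metadata. The result keeps a single grouping column, matching the approach agreed on in this thread.

```python
import pandas as pd

# Toy per-cell metadata; "disease_status" is a hypothetical column name
meta = pd.DataFrame(
    {
        "cell_type": ["B_cell", "B_cell", "Plasma"],
        "disease_status": ["healthy", "diseased", "healthy"],
    },
    index=["cell_1", "cell_3", "cell_9"],
)

# Concatenate both columns into one label per cell, e.g. "B_cell_healthy"
annotation = (meta["cell_type"] + "_" + meta["disease_status"]).to_frame("cell_type")
print(annotation)
```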
Hi,
Quick question: what is the purpose of `metadata_class` in the vignette? I assume it is needed to compute ESµ between conditions? Its purpose is not explained anywhere: not in the documentation, the publication itself, or the longer CELLECT tutorial.