kharchenkolab / cacoa

Single-cell Case Control Analysis
46 stars 7 forks source link

Store "groups" in the Cacoa object #2

Closed VPetukhov closed 4 years ago

VPetukhov commented 4 years ago

Parameter groups is often the same for all analyses. So we can store it as a field in the Cacoa object, and in functions set default value to the stored one: groups=self$groups. In the future it could be also nice to have a field with a metadata data.frame in ScanPy style.

pkharchenko commented 4 years ago

A couple of minor points:

  1. It’s worth keeping in mind that here we have both cell as well as sample groups. Perhaps we can consistently distinguish them syntactically
  2. My vote is for a list-style store instead of a data.frame to allow different length of factors

Best, -peter.

On Feb 17, 2020, at 18:47, Viktor Petukhov notifications@github.com wrote:

Parameter groups is often the same for all analyses. So we can store it as a field in the Cacoa object, and in functions set default value to the stored one: groups=self$groups. In the future it could be also nice to have a field with a metadata data.frame in ScanPy style.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

rrydbirk commented 4 years ago

I looked briefly at this. Right now, I could find these variables:

sample.groups - this should maybe be renamed to condition.per.sample? estimateExpressionShiftMagnitudes, estimateExpressionShiftZScores, getPerCellTypeDEmat, validatePerCellTypeParamsCacoa, rawMatricesWithCommonGenesCacoa, plotDensity

condition.per.cell - self-explanatory localZScores, plotGeneComparisonBetweenCondition

cell.subset - self-explanatory, only used in one function getTopDEGenes

groups - cell cluster factor/clustering factor. Could be renamed to cluster.per.cell (or annotation) estimateExpressionShiftMagnitudes, estimateExpressionShiftZScores, cluster.expression.distances, getPerCellTypeDEmat, validatePerCellTypeParamsCacoa

What do you think?

Also, since we already have self$sample.groups, default should be to extract this if is.null(sample.groups).

VPetukhov commented 4 years ago

sample.groups - this should maybe be renamed to condition.per.sample?

Right, we should rename it as sample.groups is a legacy name. How about groups.per.sample?

groups - cell cluster factor/clustering factor. Could be renamed to cluster.per.cell

Maybe group.per.cell to keep it universal?

Also, since we already have self$sample.groups, default should be to extract this if is.null(sample.groups).

We actually have it. All functions have default sample.groups=self$sample.groups.

What do you think?

Having list of lists for both groups and sample.groups looks like a safe option. We'll just need to add setters for them, which would print warnings in case of not-present cells / samples. And, perhaps, we also need to have getters, which would validate whether name is presented in groups/sample.groups, mostly for intrinsic usage.

rrydbirk commented 4 years ago

Right, we should rename it as sample.groups is a legacy name. How about groups.per.sample? Maybe group.per.cell to keep it universal?

I understand why you like "group", but I think it would be easier to distinguish between the two lists (for end users) if we make the naming more distinct. Having two "group" variables gets a little messy.

All functions have default sample.groups=self$sample.groups.

OK, I was not aware of this.

Having list of lists for both groups and sample.groups looks like a safe option. We'll just need to add setters for them, which would print warnings in case of not-present cells / samples. And, perhaps, we also need to have getters, which would validate whether name is presented in groups/sample.groups, mostly for intrinsic usage.

Agreed.

pkharchenko commented 4 years ago

There are cell.groups, which we’ve traditionally called ‘groups’. But in cacoa there are also groups of samples (btw. groups.per.sample sounds like something else). One option is to enforce cell.groups and sample.groups distinction explicitly.

On Mar 9, 2020, at 04:50, Rasmus Rydbirk notifications@github.com wrote:

Right, we should rename it as sample.groups is a legacy name. How about groups.per.sample? Maybe group.per.cell to keep it universal? I understand why you like "group", but I think it would be easier to distinguish between the two lists (for end users) if we make the naming more distinct. Having two "group" variables gets a little messy. All functions have default sample.groups=self$sample.groups. OK, I was not aware of this. Having list of lists for both groups and sample.groups looks like a safe option. We'll just need to add setters for them, which would print warnings in case of not-present cells / samples. And, perhaps, we also need to have getters, which would validate whether name is presented in groups/sample.groups, mostly for intrinsic usage. Agreed.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

rrydbirk commented 4 years ago

This should be included in 8b505c3.

VPetukhov commented 4 years ago

Closed in https://github.com/hms-dbmi/cacoa/pull/5