CODAP Wrapper: Metadata category

matthew-blackman commented 1 month ago

The CODAP wrapper data stream contains several quantities that are meant for organizing and separating the data, but do not relate to experimental setup or the behavior of the projectiles:

Screen identifier
Field
Sample number
Sample size

I don't think these should be grouped with Categorical or Numeric data, since they are not values that say anything about the physical setup or results of the experiment. Instead, they are metadata that we have assigned to each projectile as a way to keep track of higher-order information at the sim level. We should discuss how this information should be organized in the CODAP wrapper. I propose a separate section in the data stream UI, shown here:

catherinecarter commented 1 month ago

I agree that Sample Number is metadata, but I think Sample Size should be a Numeric Variable. Sample size impacts the overall shape and width of the sampling distributions.

It's also in the upper left panel. The other variables in the upper left panel are currently not metadata, so it feels like Sample Size should be labeled as a variable type as well. I advocate for Sample Size to be in the Numeric list.

matthew-blackman commented 1 month ago

Although sample size is numeric, I believe that it is metadata. The sample size is not describing anything about an individual projectile, but is keeping track of a higher-order structure that the data value falls within. All of the other Categorical and Numeric variables are things that can affect the individual projectile's behavior, so I would not want to break this pattern by putting sample size there. Keep in mind that we can also categorize it as 'numeric' within CODAP, so it would be plotted like a numeric quantity.
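To illustrate the distinction between where a quantity is listed in the wrapper UI and how it is treated when plotted, here is a minimal sketch. All identifiers (`UiGroup`, `PlotType`, `StreamAttribute`, `sampleSize`) are hypothetical and are not taken from the sim or from the CODAP API; the point is only that the two properties can be stored independently.

```typescript
// Hypothetical sketch: none of these names come from the sim or from CODAP.
type UiGroup = 'categorical' | 'numeric' | 'metadata';
type PlotType = 'categorical' | 'numeric';

interface StreamAttribute {
  name: string;
  uiGroup: UiGroup;   // which section of the data stream UI lists the quantity
  plotType: PlotType; // how the quantity would be treated when plotted
}

// Sample size is listed under metadata but still plotted as a number.
const sampleSize: StreamAttribute = {
  name: 'Sample size',
  uiGroup: 'metadata',
  plotType: 'numeric'
};
```

Under this assumption, moving a quantity between UI sections would not change how it is plotted.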

catherinecarter commented 1 month ago

The sample size is not describing anything about an individual projectile

Seems like all the variables currently in the Categorical section could also be described in this way, but they aren't metadata. Since sample size is a number, it clearly doesn't fit with the categorical variables.

When making decisions about the design of a study, the sample size is one of those decisions, along with all of the current categorical variables. The field number, sample number, and screen identifiers are not part of the study design, so sample size doesn't fit with the metadata variables, either.

I get what you're saying about how the other numeric variables impact the landing position of the projectiles while sample size doesn't, so I can see why you don't think sample size fits with the other numeric variables.

In a nutshell,

Sample size isn't categorical so it doesn't go there.
It's part of the design setup while the current metadata variables aren't, so it doesn't go there.
It doesn't impact the landing position of the projectiles, so you feel it doesn't go in the numeric list, either (even though it's numeric).

Maybe the labels on the variables aren't quite right? What if they had something to do with "Experiment Setup" or something? Then sample size would make more sense to go with the current categorical variables? Just an idea.

matthew-blackman commented 1 month ago

The sample size is not describing anything about an individual projectile

Seems like all the variables currently in the Categorical section could also be described in this way, but they aren't metadata. Since sample size is a number, it clearly doesn't fit with the categorical variables.

That is not the issue. Each of the quantities in the Categorical and Numeric sections is either an independent or a dependent variable that could be used to investigate the motion of the projectiles. Sample size is different - it is only useful in investigating the aggregate data of groups of projectiles. Since it is only relevant at a higher level in the hierarchy, it makes the most sense to consider this part of the metadata.

From Kelly Findley (via Slack):

Yeah, that is a little difficult to pin down what it is. I don’t think it’s terrible to list it as metadata, even if it is quite different than the others listed there. It could be nice to have a separate group like “Experiment Variables”, but would sample size be the only thing we’d realistically put there?

I agree with Kelly. If we create a new category, sample size would be the only quantity under that heading. I find this more confusing, and it adds unnecessary complexity to the UI.

It seems like the best path forward is to consider metadata to be all quantities that describe a higher-order structure to the data or experimental setup - Field, Sample Size, Sample Number and Screen Identifier. I feel that this issue can be closed.
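The grouping proposed above could be sketched roughly as follows. The four metadata names come from this thread; the identifiers `METADATA_ATTRIBUTES`, `isMetadata`, and `splitByMetadata` are hypothetical and not taken from the sim's code.

```typescript
// Hypothetical sketch: the quantity names are from this thread, the identifiers are not.
const METADATA_ATTRIBUTES: ReadonlySet<string> = new Set([
  'Field',
  'Sample Size',
  'Sample Number',
  'Screen Identifier'
]);

// True if the quantity describes higher-order structure rather than a projectile.
function isMetadata(attributeName: string): boolean {
  return METADATA_ATTRIBUTES.has(attributeName);
}

// Separate the metadata quantities from the Categorical and Numeric variables.
function splitByMetadata(names: string[]): { metadata: string[]; variables: string[] } {
  return {
    metadata: names.filter(name => isMetadata(name)),
    variables: names.filter(name => !isMetadata(name))
  };
}
```

Anything not in the set would stay in the existing Categorical or Numeric sections.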

ariel-phet commented 1 month ago

@matthew-blackman I was asked to take a look at this a bit earlier. Reopening.

There seem to be a couple of points of view here, and this seems worthy of a group discussion. We may well come to the same conclusion as you have above, but I definitely see how putting "sample size" into metadata creates a level of discomfort, or minimizes the importance of that quantity from a statistics point of view.

Let's find a time to meet and see if we can come up with a better solution.

matthew-blackman commented 3 weeks ago

Labels have been updated and reviewed by the design team. Nice work all! Closing.

phetsims / projectile-data-lab

CODAP Wrapper: Metadata category #333