usnistgov / pfhub-schema

Phase-field simulation and benchmark schema in LinkML

What about miscellaneous data? #5

Open wd15 opened 1 year ago

wd15 commented 1 year ago

This upload has an arbitrary name and value in the data section. Do we want to preserve this data?

tkphd commented 1 year ago

Yes.

wd15 commented 1 year ago

Yes.

Does the current schema support arbitrary weird values?

tkphd commented 1 year ago

I've been thinking about this, and about the philosophy of "arbitrary values" in a schema-compliant metadata file. I'm torn between wanting to collect all the bits and enforcing reasonable boundaries and scope.

For the dataset in question, I believe timestep is the "arbitrary weird value" in question? If so, that seems like a reasonable value to collect if present, as would dx, although those quantities tend toward nebulous meanings when adaptive meshing enters the scene.

The validator throws an error when an unknown key:value pair is encountered:

ValueError:  Unknown argument: timestep = 0.00740741
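For illustration only, the strict behavior behind that error can be sketched as follows. This is not the actual validator: `DataItem`, its fields, and `build_strict` are hypothetical names, and the sketch only assumes that unknown keys are rejected at construction time.

```python
from dataclasses import dataclass, fields


@dataclass
class DataItem:
    """Hypothetical stand-in for a schema-defined data entry."""
    name: str
    values: list


def build_strict(cls, raw: dict):
    """Construct cls from raw, rejecting any key the class does not declare."""
    known = {f.name for f in fields(cls)}
    for key, value in raw.items():
        if key not in known:
            # Mirrors the style of the error message quoted above.
            raise ValueError(f"Unknown argument: {key} = {value}")
    return cls(**raw)


# An entry carrying an extra, undeclared "timestep" key.
raw = {"name": "free_energy", "values": [1.0, 0.9], "timestep": 0.00740741}
try:
    build_strict(DataItem, raw)
    message = None
except ValueError as err:
    message = str(err)
```

Under this assumption, any key outside the declared fields stops the upload from validating, which is why the question of where such values belong matters.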

Where do numerical implementation specifics like this (dx, dt vs. t) belong?

wd15 commented 1 year ago

For the dataset in question, I believe timestep is the "arbitrary weird value" in question?

Yes.

Where do numerical implementation specifics like this (dx, dt vs. t) belong?

  • In the PFHub upload metadata?

That makes sense to me. These are arbitrary specifics of a particular numerical method that are difficult to define beforehand. They might be control parameters that are either set in advance or emerge as a result of the run. For example, timestep values or the timestepping algorithm could be set prior to a simulation or change during it. Recording them is a way to distinguish one result from another.
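One way to picture the upload-metadata option is a minimal sketch that separates schema-defined keys from everything else and keeps the leftovers as generic name/value pairs. The key set `KNOWN_KEYS`, the function `split_extras`, and the `{"name": ..., "value": ...}` layout are assumptions for illustration, not part of the current schema.

```python
# Hypothetical set of keys that the schema defines for a data entry.
KNOWN_KEYS = frozenset({"name", "values"})


def split_extras(raw: dict, known=KNOWN_KEYS):
    """Split raw into schema-defined fields and generic name/value extras."""
    core = {k: v for k, v in raw.items() if k in known}
    extras = [{"name": k, "value": v} for k, v in raw.items() if k not in known]
    return core, extras


core, extras = split_extras(
    {"name": "free_energy", "values": [1.0, 0.9], "timestep": 0.00740741}
)
```

Here `core` holds what a strict schema could validate, while `extras` preserves values such as `timestep` without the schema needing to know their names in advance.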

  • In the implementation repository? If so, in what format?

We can't control this and it's too much scope. We need to focus on a single pfhub.yaml file.

  • In the Benchmark Problem specification?

Nope.

  • In a more generic Phase-Field Simulation metadata file?

Yes, eventually, but this doesn't exist right now. Ideally, such a file would have a good schema for different numerical methods that pins down many of the control parameters, making different results more searchable.

  • In a CSV file, alongside free energy etc?

Probably not. It's probably best to keep the table data specific to what we actually require. Arbitrary values or names don't make sense in the context of table data. That's why we have the metadata file.