seedcase-project / seedcase-sprout

Upload your research data to formally structure it for better, more reliable, and easier research.
https://sprout.seedcase-project.org/
MIT License

Update models to match new project structure #512

Open martonvago opened 2 months ago

martonvago commented 2 months ago

Update models to match the new setup where metadata and data files are stored on the disk as part of core.

martonvago commented 1 month ago

Notes:

Questions:

  1. Should we use JSON or YAML metadata files?
  2. Can a Table be composed of multiple Files? Most of the code suggests yes, but sometimes only one File is expected for a Table.
  3. Why does Table have an original_file_name property when File also has this property and, pending the question above, one Table can be composed of multiple Files?
  4. To what extent do we want to handle Resources and data Files separately? Can a Package have a Resource without a corresponding data File? And vice versa, can a data File exist in a project folder without it being listed as a Resource in the Package?
  5. Is putting data file information into a files array on Resource acceptable?
  6. Can we use name on Resource as a unique identifier for the Resource? We currently require it to be unique anyway.
  7. If we include custom properties (i.e. properties not in the frictionless standard by default) in our metadata, then we’ll be faced with the question of how we want this custom data to be parsed / validated. The default behaviour is that custom properties are accessible under a custom property and are not validated when the metadata file is loaded from datapackage.json. We could try mixing the frictionless metadata classes with e.g. pydantic dataclasses to get both validation for custom properties and frictionless functionality. The downside is that this would be a bit complex, especially because we have custom properties all the way down the nested schema structure. The same question arises when we construct these objects programmatically: do we just add custom properties under custom or do we want a more involved setup?
  8. Should we move over to frictionless classes completely? If so, we will be dropping all of our models.
  9. Should we move over to frictionless terminology completely? E.g. use "package" instead of "project" and "resource" instead of "metadata" everywhere, including URLs, template names etc.?
  10. Are we really okay with not storing metadata in a database? Are there tools enabling us to get nice change log and audit information using this type of storage? Are there ways of rolling back changes or backing up data to be able to recover it if needed?
  11. How do we make our "secure storage" secure and how do we handle access control?
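To make questions 5 to 7 a bit more concrete, here is a minimal sketch of one option: treat any resource property outside the standard as custom, validate the custom part ourselves, and enforce unique resource names along the way. It uses only the Python standard library, not frictionless or pydantic, so it says nothing about how those libraries behave. The `FileInfo` shape, the contents of the `files` array, the `STANDARD_RESOURCE_KEYS` subset, the function names, and all paths are assumptions made up for illustration.

```python
import json
from dataclasses import dataclass

# Illustrative subset of standard Data Resource property names.
# Anything not listed here is treated as a custom property in this sketch.
STANDARD_RESOURCE_KEYS = {"name", "path", "title", "description", "format", "schema"}


@dataclass
class FileInfo:
    """Hypothetical entry in a custom `files` array on a resource."""

    original_file_name: str
    path: str

    @classmethod
    def from_dict(cls, raw: dict) -> "FileInfo":
        # Validate by hand, since custom properties are not checked for us.
        for key in ("original_file_name", "path"):
            value = raw.get(key)
            if not isinstance(value, str) or not value:
                raise ValueError(f"'{key}' must be a non-empty string")
        return cls(original_file_name=raw["original_file_name"], path=raw["path"])


def split_custom(resource: dict) -> tuple[dict, dict]:
    """Separate standard properties from custom ones."""
    standard = {k: v for k, v in resource.items() if k in STANDARD_RESOURCE_KEYS}
    custom = {k: v for k, v in resource.items() if k not in STANDARD_RESOURCE_KEYS}
    return standard, custom


def check_package(descriptor: dict) -> dict[str, list[FileInfo]]:
    """Check resource names are unique and validate each custom `files` array."""
    files_by_resource: dict[str, list[FileInfo]] = {}
    for resource in descriptor.get("resources", []):
        name = resource["name"]
        if name in files_by_resource:
            raise ValueError(f"duplicate resource name: {name}")
        _, custom = split_custom(resource)
        files_by_resource[name] = [FileInfo.from_dict(f) for f in custom.get("files", [])]
    return files_by_resource


# Hypothetical descriptor: one resource (table) built from two uploaded files.
descriptor = json.loads("""
{
  "name": "example-package",
  "resources": [
    {
      "name": "patients",
      "path": "data/patients.parquet",
      "files": [
        {"original_file_name": "patients_2023.csv", "path": "raw/patients/1.csv"},
        {"original_file_name": "patients_2024.csv", "path": "raw/patients/2.csv"}
      ]
    }
  ]
}
""")

files = check_package(descriptor)
```

With this setup `name` doubles as the lookup key for a resource, and the original file name lives only on the file entries, which would make a separate `original_file_name` on Table redundant. The cost is that the validation logic is ours to maintain at every nesting level that has custom properties.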

List of subtasks:

martonvago commented 1 month ago

I put some foundational work on https://github.com/seedcase-project/seedcase-sprout/tree/refactor/update-models

martonvago commented 1 month ago

The branch above now contains a more detailed exploration of how I understood things should work in the new setup. It's in no way complete (I haven't worked through the entire stepper form yet) and I ignored tests and docstrings (because it'll all change anyway). It also has some bugs, so at any given commit it might not actually build, but I think it's still useful as a point of departure / conversation starter.

signekb commented 1 month ago

Very nice work, @martonvago 🔥 🔥 ! There's a lot of questions here. I'm not sure whether you and @lwjohnst86 have already discussed these, but maybe it would make sense to go through them at tomorrow's status meeting (answering/closing them or creating discussion issues for those we don't have answers for right now)? I think this will help all of us in the upcoming weeks!

lwjohnst86 commented 1 month ago

Very nice! Some of the comments are covered already by the naming scheme for project files (see https://github.com/seedcase-project/seedcase-sprout/blob/main/docs/design/naming.qmd)

lwjohnst86 commented 1 month ago

@martonvago not sure if you are still working on this, but given the focus on core functions rather than Django, I will move this out of this iteration since it isn't relevant right now. It will be relevant later though!! So keeping it open.

martonvago commented 1 month ago

Sure, good idea, I haven't done anything with it since I came back. And the outcome of the new reviews (🔥) will influence how exactly this will be done anyway!