seedcase-project / seedcase-sprout

Upload your research data to formally structure it for better, more reliable, and easier research.
https://sprout.seedcase-project.org/
MIT License

[discussion]: Split out check functionality to its own package? #867

Open lwjohnst86 opened 2 weeks ago

lwjohnst86 commented 2 weeks ago

What would you like to discuss?

I was struggling to fall asleep last night and one of the many thoughts swirling around was about moving the check functions into another Python package.

The main reasons to do that are:

  1. Other packages of ours (like flower or propagate) might need some check functionality and it would be nice not to have to depend on the full sprout.
  2. Other developers might have had struggles similar to ours with frictionless, so this might be a good alternative for them.
  3. We could use this package as an example to show the frictionless group how they might revise their design or incorporate our work into theirs, or we could merge the package into theirs.

Thankfully, this package would be fairly small and very focused, so building it and then publishing it to PyPI could be done much sooner than for sprout. It would be nice to be able to show a more tangible output of our work so far, sooner rather than later.

Some ideas for the name could be: datapackage-lint or datapackage-check.

martonvago commented 2 weeks ago

I think it could work and I can see the reasons justifying it, 1 and 2 very concretely and 3 in a vaguer way.

I feel like once we decide what exactly will be included in the checks and what the API will be, it will be easier for me to imagine what exactly this package would look like.

Based on some initial impressions, I think we could get away with very few dependencies/assumptions if metadata was input to the functions as a dict or JSON (as opposed to a *Properties class). And then we would just need to decide how we want to input the actual data (data frame, from file, etc.) to the functions.
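As a rough sketch of what I mean by taking the metadata as a plain dict or JSON: everything below is made up for illustration (the function names `check_package_properties` and `check_package_json`, the `REQUIRED_PACKAGE_FIELDS` tuple, and the particular fields checked), so it is not a proposal for the actual API, just a way to show that this style needs nothing beyond the standard library.

```python
import json
from typing import Any

# Hypothetical set of required fields, for illustration only.
# The real list would come from whatever we decide the checks cover.
REQUIRED_PACKAGE_FIELDS = ("name", "resources")


def check_package_properties(properties: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems found in the metadata.

    An empty list means no problems were found by these (illustrative) checks.
    """
    errors = []
    # Report every missing required field, in a stable order.
    for field in REQUIRED_PACKAGE_FIELDS:
        if field not in properties:
            errors.append(f"Missing required field: '{field}'")
    # If 'resources' is present, it should be a list.
    resources = properties.get("resources")
    if resources is not None and not isinstance(resources, list):
        errors.append("'resources' must be a list")
    return errors


def check_package_json(text: str) -> list[str]:
    """Run the same checks on metadata given as a JSON string."""
    properties = json.loads(text)
    if not isinstance(properties, dict):
        return ["Metadata must be a JSON object"]
    return check_package_properties(properties)
```

Returning a list of messages instead of raising on the first problem is just one option; it would let callers like sprout decide for themselves whether to raise, log, or display the problems.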

lwjohnst86 commented 1 week ago

Yeah, judging from the many open issues at frictionless, I can see this being something people might use a lot.

lwjohnst86 commented 1 week ago

We decided that we will implement this within sprout first and, if need be, it would be fairly easy to split it out into its own package.