pepkit / pipestat

Pipeline results reporting package
https://pep.databio.org/pipestat/
BSD 2-Clause "Simplified" License

Specify database schema using Python objects instead of a YAML file #168

Open khoroshevskyi opened 4 months ago

khoroshevskyi commented 4 months ago

It would be awesome to specify database schemas using Python objects, such as classes or methods. I believe this approach could enhance the user experience, because Python objects are easier to write and can help catch errors more effectively.

nsheff commented 4 months ago

I think I can see what you mean. A potential problem with this is that then the schema is only useful within Python...

I guess the way we're doing it now is that you write a JSON-schema, and then we convert that into Pydantic objects. So, I suppose it wouldn't be hard to also allow the other direction: user starts from a Pydantic model. I guess you would just create a ParsedSchema object from a pydantic model. It's probably even easier to do than coming from the JSON-schema.

I think there are then also ways to go from those models to a JSON-schema, in case you need a file-based representation...

And in fact, for pipestat, we do need a file-based representation.

So I think the way to approach this would be to just show people how to do Pydantic-to-JSON-Schema, which should be simple. So then you can write your schema with Python objects, save them to JSON schema, and then pass this to pipestat.
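
For illustration, the Pydantic-to-JSON-Schema step could look roughly like the sketch below. The `SampleResults` model, its field names, and the output file name are all hypothetical. The sketch only covers the generic Pydantic export; it assumes nothing about pipestat's own API, and pipestat may expect extra keys or a different layout in its schema files, so the generated file might need adapting before pipestat accepts it.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

from pydantic import BaseModel, Field


class SampleResults(BaseModel):
    """Hypothetical pipeline results; names are illustrative, not from pipestat."""

    number_of_reads: int = Field(description="Total reads in the sample")
    alignment_rate: Optional[float] = Field(
        default=None, description="Fraction of reads aligned"
    )


def to_json_schema(model) -> dict:
    """Return the JSON Schema for a Pydantic model class."""
    if hasattr(model, "model_json_schema"):
        return model.model_json_schema()  # Pydantic v2
    return model.schema()  # Pydantic v1 fallback


schema = to_json_schema(SampleResults)

# Save a file-based representation (written to the temp directory here).
schema_path = Path(tempfile.gettempdir()) / "sample_results_schema.json"
schema_path.write_text(json.dumps(schema, indent=2))
print(schema_path)
```

Writing the schema as a Python class first means type mistakes in the model surface when the class is defined or instantiated, while the exported file keeps the schema usable outside Python.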