materialscloud-org / optimade-maker

Tools for making OPTIMADE APIs from various formats of structural data (e.g. an archive of CIF files).
MIT License

support for "direct" jsonl files #50

Open eimrek opened 7 months ago

eimrek commented 7 months ago

I am wondering if we really need to support "direct" jsonl files in the optimade.yaml format.

Conceptually, it seems to me that the current purpose of optimake is to generate a jsonl file from other structural data formats, and optimade.yaml is something that helps to achieve this.

If we already have a jsonl file, then the only purpose I see is validation, and the optimade.yaml file does not seem strictly necessary.

But perhaps generating jsonl files and validating them is different enough to separate them?

E.g. we could have a different optimake subcommand for validation (optimake validate <jsonl-file>?)
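To make the idea concrete, here is a minimal sketch of the kind of check such a validation subcommand might start from. The function name validate_jsonl is hypothetical and not part of optimake; it only checks that every line parses as a JSON object and says nothing about OPTIMADE schema conformance.

```python
import json
from pathlib import Path


def validate_jsonl(path):
    """Return a list of (line_number, message) problems found in a JSONL file.

    Hypothetical helper: this only checks that each line is a JSON object,
    not that the entries follow the OPTIMADE schema.
    """
    problems = []
    with Path(path).open(encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                problems.append((number, "blank line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((number, "expected a JSON object"))
    return problems
```

An empty list means every line parsed; a real subcommand would presumably go on to validate each entry against the OPTIMADE models.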

Regarding the Materials Cloud Archive service, this change would affect it, as then we should add support for a "direct" jsonl file without any optimade.yaml file.

ml-evs commented 7 months ago

Would you still want people to provide an optimade.yaml to trigger the scraper at your end? Otherwise you have to rely on file extensions and the file header, which might produce many false positives. (We can also extend the yaml file to support/require the new database description field to be served in the OPTIMADE metadata.)
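For illustration, a detection heuristic based on the extension and the file header could look roughly like the sketch below. The function name looks_like_optimade_jsonl is hypothetical, and the assumption that the first line is a JSON object with an "x-optimade" key is my reading of the OPTIMADE JSONL format and should be checked against the specification.

```python
import json
from pathlib import Path

# Assumption: an OPTIMADE JSONL file starts with a header object carrying this key.
HEADER_KEY = "x-optimade"


def looks_like_optimade_jsonl(path):
    """Guess whether a file is an OPTIMADE JSONL file from its name and first line."""
    path = Path(path)
    if path.suffix != ".jsonl":
        return False
    try:
        with path.open(encoding="utf-8") as handle:
            header = json.loads(handle.readline())
    except ValueError:
        # Covers both invalid JSON and undecodable (e.g. binary) content.
        return False
    return isinstance(header, dict) and HEADER_KEY in header
```

Checking only the extension would flag any .jsonl file in an archive; the header check narrows that down, but only as far as the header key is actually distinctive.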

eimrek commented 7 months ago

Could be that we accept either optimade.yaml or optimade.jsonl, but I agree that a single file would be preferable. But I think it's good to design optimake in a way that makes the most sense by itself.

E.g. the example

config_version: 0.1.0

database_description: >-
  This database contains some example CIFs.

entries:
  jsonl_path: example.jsonl

I think currently it doesn't do anything, right? And in the future, would it just validate the file? But maybe this makes sense, open to discuss further.

ml-evs commented 3 weeks ago

Coming back to this (perhaps it can be closed in the original context), I would quite like optimake serve . (or optimake serve optimade.jsonl) to work without needing a config file, even if it throws errors about validation of the file.

eimrek commented 2 weeks ago

Yep, I agree that optimade.jsonl doesn't need to have the config file, and we could make it optional in this case.
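As a rough sketch of what an optional config could mean for serve, the lookup might go something like this. resolve_serve_target is a hypothetical name and this is not how optimake currently resolves its inputs; the file names optimade.yaml and optimade.jsonl are the ones used in this thread.

```python
from pathlib import Path

CONFIG_NAME = "optimade.yaml"
JSONL_NAME = "optimade.jsonl"


def resolve_serve_target(target):
    """Return (jsonl_path, config_path) for a directory or a JSONL file.

    Either element is None when the corresponding file is absent.
    Raises FileNotFoundError if neither file can be found.
    """
    target = Path(target)
    if target.is_dir():
        directory, jsonl = target, target / JSONL_NAME
    else:
        directory, jsonl = target.parent, target
    config = directory / CONFIG_NAME

    jsonl_path = jsonl if jsonl.is_file() else None
    config_path = config if config.is_file() else None
    if jsonl_path is None and config_path is None:
        raise FileNotFoundError(f"no {JSONL_NAME} or {CONFIG_NAME} found for {target}")
    return jsonl_path, config_path
```

With this shape, a directory containing only optimade.jsonl resolves to (jsonl_path, None), so the config file is picked up when present but never required.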