schemedoc / srfi-metadata

Import SRFI metadata into the Scheme API
https://docs.scheme.org/srfi/support/
MIT License

Machine readable dataset #56

Closed. port19x closed this issue 2 months ago.

port19x commented 2 months ago

As an alternative/placeholder for #9, having a machine-readable format generated would help a lot. JSON seems like a good idea, but I'm not opposed to sexps, EDN, XML, or whatever. This way I can write a layer 2 that uses the dataset directly, without resorting to error-prone, brittle HTML parsing.

lassik commented 2 months ago

IMHO the best default answer for schemedoc repos is to generate S-expressions, which are then turned into HTML and whatever else people need.

Currently the listings/*.sh shell scripts generate files into the data directory. The files contain lists of numbers.
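A minimal sketch of how such a file could be consumed, assuming it holds a single S-expression (a list of SRFI numbers); the file name below is hypothetical, not necessarily what the scripts emit:

```scheme
;; Sketch only: read one generated data file with the native Scheme reader.
;; "data/srfi-numbers.scm" is a hypothetical name; use whatever the
;; listings/*.sh scripts actually write under data/.
(import (scheme base) (scheme file) (scheme read) (scheme write))

(define srfi-numbers
  (call-with-input-file "data/srfi-numbers.scm"
    (lambda (port)
      (read port))))   ; the file is assumed to hold one list of numbers

(write srfi-numbers)
(newline)
```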

lassik commented 2 months ago

A list of numbers parses as S-expressions, but is not very extensible.

I've been trying to nudge most datasets I work on toward POSE, which is designed to be readable using the native read of most Lisp dialects.
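To illustrate the extensibility point, here is a hypothetical sketch of keyed records that a portable read can still consume one datum at a time; the fields and file name are invented for illustration and are not this repo's actual schema:

```scheme
;; Hypothetical POSE-style records, one datum per SRFI, instead of a bare
;; list of numbers. The fields (status, title) are invented for illustration.
;;
;;   (srfi 0 (status final) (title "Feature-based conditional expansion construct"))
;;   (srfi 1 (status final) (title "List Library"))

(import (scheme base) (scheme file) (scheme read) (scheme write))

;; Read every top-level datum from a file until end of file.
(define (read-all-datums path)
  (call-with-input-file path
    (lambda (port)
      (let loop ((acc '()))
        (let ((datum (read port)))
          (if (eof-object? datum)
              (reverse acc)
              (loop (cons datum acc))))))))

;; Example use (hypothetical file name):
;; (for-each (lambda (r) (write r) (newline))
;;           (read-all-datums "data/srfi.pose"))
```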

lassik commented 2 months ago

A substantial example of POSE is the file from which the scheme.org DNS zone and front page are generated.

port19x commented 2 months ago

I'm not sure I follow. Looking into the data directory of this repo, I only see a .gitkeep file.

You mean I can run the shell scripts to generate that data offline? Is that what is happening on the server of docs.scheme.org?

If so, would you be open to either providing a raw endpoint or generating the data via GitHub Actions, possibly onto a separate branch so as not to pollute things? That would be vaguely similar to GitHub Pages and could even be served through that.

lassik commented 2 months ago

You mean I can run the shell scripts to generate that data offline?

Yes.

Is that what is happening on the server of docs.scheme.org?

No. Currently someone runs the scripts manually on his PC and then rsyncs the result onto the server.

If so, would you be open to either providing a raw endpoint, or generating the data via github actions, possibly onto a separate branch as to not pollute things?

It would probably be simplest to commit the data files into the repo. What do you think?

port19x commented 2 months ago

It would probably be simplest to commit the data files into the repo

I agree, under the assumption that this data doesn't change more than once a week on average. Anything beyond that calls for automation.

relevant xkcd

lassik commented 2 months ago

@port19x Now the data files have been committed.