johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
https://miller.readthedocs.io

Feature request for version 6: add YAML format support #614

Open aborruso opened 2 years ago

aborruso commented 2 years ago

Hi @johnkerl, YAML is now a very popular file format. Miller already supports JSON, and I think there are a lot of YAML libraries for Go.

It would be very convenient to be able to use a YAML file the way it is already possible to use a JSON file today (I mean with the same limitations).

Thank you

johnkerl commented 2 years ago

@aborruso thanks!

Support for a subset of YAML is doable. Full YAML I don't want to deal with -- including possible cyclic references.

JensRoland commented 2 years ago

I think a subset of YAML is really all that is needed, since complex/ridiculous self-referential YAMLs are likely not tabular in nature anyway, and I suspect almost all Miller users are using it to wrangle tabular-like data.
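To make "tabular-like" a bit more concrete, here is a minimal Python sketch of how a nested YAML document (with no anchors or cycles) could be collapsed into one flat record. The `flatten` helper, the `.` separator, the 1-based list indices, and the sample `doc` are all assumptions chosen for illustration; `doc` stands in for whatever a YAML loader would return, and none of this is Miller's actual implementation.

```python
def flatten(value, prefix="", sep="."):
    """Collapse nested dicts/lists into a single-level dict with joined keys."""
    flat = {}
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # Illustrative choice: list positions become 1-based key segments.
        items = ((str(i), v) for i, v in enumerate(value, start=1))
    else:
        # Scalars end the recursion and are stored under the accumulated key.
        flat[prefix] = value
        return flat
    for key, child in items:
        child_key = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, child_key, sep))
    return flat


# Hypothetical data, shaped like the result of loading a small YAML file.
doc = {
    "name": "example",
    "theme": {"name": "plain"},
    "pages": ["a.md", "b.md"],
}

record = flatten(doc)
for key, value in record.items():
    print(f"{key}={value}")
```

Each flattened record has only scalar values, so it could be written out as a row of CSV or any other tabular format. Empty maps and lists simply contribute no keys in this sketch.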

johnkerl commented 2 years ago

@aborruso @JensRoland thanks for the bar-lowering reassurances!! ;)

Does either of you have some examples of some YAML data you'd like to be able to process using Miller?

aborruso commented 2 years ago

> Does either of you have some examples of some YAML data you'd like to be able to process using Miller?

The first one is this: https://github.com/johnkerl/miller/blob/main/docs/mkdocs.yml :)

aborruso commented 2 years ago

> Does either of you have some examples of some YAML data you'd like to be able to process using Miller?

This kind: https://github.com/ministero-salute/it-dgc-opendata/blob/master/datapackage.yaml

aborruso commented 2 years ago

> Does either of you have some examples of some YAML data you'd like to be able to process using Miller?

To read an OpenAPI schema: https://app.data.opendatacovid.gssi.it/api/opendata/schema/

laubster commented 2 months ago

A while ago I put Miller on my list of interesting-looking tools to look into someday (nushell is on there too). Lately I've been converting a bunch of old data - college cross country results - out of tables accessed with SQL and migrating it into YAML. I initially started using yq to process the migrated files, but found it easier to use Perl's YAML module to load the data into a hash for processing.

Until I start wading into the Miller pool, I can't say I've got any example analysis I'd like to perform with Miller on the YAML in 20231028.txt, but I expect to come up with something someday. I suppose it might be nice to pluck out race results for a given runner across all files in a directory, presenting the results in a table.
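For what it's worth, the "one runner across many files" idea might look roughly like the Python sketch below. The layout of the data (`races`, `results`, `runner`, `place`, `time`), the file names, and the helper names are all hypothetical, since the real structure of the YAML files is not shown here. The `meets` dict stands in for the parsed contents of each file; in practice each entry could come from a YAML loader such as PyYAML's `yaml.safe_load`, which is an assumption about tooling rather than anything Miller provides.

```python
def results_for(runner, meets):
    """Collect one row per result for `runner` across all loaded files."""
    rows = []
    for filename, meet in meets.items():
        for race in meet.get("races", []):
            for result in race.get("results", []):
                if result.get("runner") == runner:
                    rows.append({
                        "file": filename,
                        "race": race.get("name"),
                        "place": result.get("place"),
                        "time": result.get("time"),
                    })
    return rows


def format_table(rows):
    """Render rows as a left-aligned, space-padded text table."""
    if not rows:
        return ""
    headers = list(rows[0])
    widths = {h: max(len(h), *(len(str(r[h])) for r in rows)) for h in headers}
    lines = [" ".join(h.ljust(widths[h]) for h in headers)]
    for r in rows:
        lines.append(" ".join(str(r[h]).ljust(widths[h]) for h in headers))
    return "\n".join(line.rstrip() for line in lines)


# Hypothetical stand-in for two parsed YAML files, keyed by file name.
meets = {
    "meet_a.yaml": {"races": [{"name": "8k", "results": [
        {"runner": "A. Runner", "place": 3, "time": "26:10"},
        {"runner": "B. Runner", "place": 5, "time": "26:45"},
    ]}]},
    "meet_b.yaml": {"races": [{"name": "5k", "results": [
        {"runner": "A. Runner", "place": 1, "time": "15:58"},
    ]}]},
}

print(format_table(results_for("A. Runner", meets)))
```

Keeping the lookup separate from the formatting means the same rows could instead be written out as CSV or JSON and handed to Miller for further processing.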