Open kousu opened 3 years ago
Well, I know I have an immediate comment to make at Journal Club. That first paper's abstract says:
"Advanced Scientific Data Format (ASDF) and is based on an existing text format, YAML"
YAML has a lot of its own problems. It has extensible types, but they're only really extensible when combined with Python + pickle, which brings code-injection vulnerabilities along for the ride. It's easy to break a multiline string without noticing (we did this last year and silently broke our CI). It has barewords (like Perl), meaning something like "no" is implicitly False (when it could also mean "Norway", "Navigation Order", etc.). Like JSON, it doesn't have a canonical form, so you can't hash it safely. It's a recursive language, which makes it expensive to parse: you can't just run grep or sed over it safely; you need to load the entire thing into memory with a real YAML parser.
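To make the bareword point concrete, here's a rough stdlib-only sketch of how a YAML 1.1-style loader resolves unquoted scalars. It is not a YAML parser: `resolve_scalar` is a hypothetical helper, the regex covers only a subset of the YAML 1.1 boolean spellings, and real loaders differ in exactly which spellings they accept.

```python
import re

# Subset of the YAML 1.1 boolean spellings (assumption: real loaders vary
# in which of these they accept; this only mimics the resolution rule).
_YAML11_BOOL = re.compile(
    r"^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
_TRUTHY = {"yes", "true", "on"}


def resolve_scalar(text):
    """Hypothetical helper: resolve one scalar the way a YAML 1.1 loader might."""
    # Quoted scalars are always kept as strings.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]
    # Unquoted (bareword) scalars that look like booleans become booleans.
    if _YAML11_BOOL.match(text):
        return text.lower() in _TRUTHY
    return text


# A list of country codes written as barewords: Norway silently turns into False.
country_codes = ["se", "no", "dk"]
resolved = [resolve_scalar(code) for code in country_codes]
print(resolved)

# Quoting is the usual workaround.
print(resolve_scalar('"no"'))
```

The catch is that nothing errors out: the data just changes type, and you only find out downstream.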
I bet googling would find a lot of other problems with YAML.
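On the canonical-form point, here's a small sketch using JSON from the Python standard library (the same issue applies to YAML). The keys `subject` and `tr` are made up for illustration, and the `canonical` helper is just one ad hoc convention, not a standard: it doesn't settle things like float formatting or Unicode normalization.

```python
import hashlib
import json


def digest(text):
    """SHA-256 hex digest of a text document."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two serializations of the same data: different key order and whitespace.
a = '{"subject": "sub-01", "tr": 2.0}'
b = '{ "tr": 2.0,\n  "subject": "sub-01" }'

same_data = json.loads(a) == json.loads(b)   # the parsed content is identical
same_hash = digest(a) == digest(b)           # but the raw bytes are not


def canonical(text):
    """Ad hoc canonicalization (assumption): sorted keys, no extra whitespace."""
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))


same_canonical_hash = digest(canonical(a)) == digest(canonical(b))

print(same_data, same_hash, same_canonical_hash)
```

So hashing only works if everyone agrees on a normalization step first, and the format itself doesn't give you one.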
Its one big advantage over JSON is that it allows comments. But for long-term scientific data? I dunno.
https://github.com/matthew-brett/czi-nibabel got a grant to make a new neuroimaging data format. Or maybe just spec out existing ones.
They were thinking about HDF5 but have since developed reservations about it.
They are organizing a journal club to talk about it, starting with this paper on ASDF.
We should get involved.
Tagging @naga-karthik @uzaymacar @andreanne-lemay @charleygros @sandrinebedard @alexfoias @dpapp86 @taowa