psychoinformatics-de / datalad-hirni

DataLad extension for (semi-)automated, reproducible processing of (medical/neuro)imaging data
http://datalad.org

feature request: studyspec.json does not depend on formatting #136

Open pvavra opened 4 years ago

pvavra commented 4 years ago

Esp. since the webapp interface isn't working currently, "manually" editing the studyspec.json is currently obligatory. However, the import depends on the specific "no whitespace" formatting of the studyspec.json.

That is, making the JSON more human-readable breaks the conversion step via datalad hirni-spec2bids.

The relevant error message was:

spec2bids(ok): /home/pvavra/scratch/bids/sourcedata/studyspec.json
[ERROR  ] Failed to load content from '{\n' with args=() kwargs={} 
[ERROR  ] Expecting property name enclosed in double quotes or '}': line 2 column 1 (char 2) [decoder.py:raw_decode:400] (JSONDecodeError) 

Given the formatting & merging thoughts mentioned previously in #14 (which I don't follow 100%..), one convenient option would be to have helper functions which can format the .json file into a more human-readable form and then minimize it for the commits again (I guess this makes it also related to #44).

An alternative is to make it into valid JSON, and then editor plugins/command-line tools (like python -m json.tool < sourcedata/studyspec.json) could be used for that. Currently there are multiple root elements, so simply putting them all into an array [{},..,{}] should be enough.
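
As a rough illustration of the array idea, here is a minimal sketch using only the Python standard library. It assumes the file holds one compact JSON object per line, which is what the error above suggests; the helper names and the field names in the sample data are hypothetical and not part of hirni.

```python
import json

def stream_to_array(stream_text):
    """Parse one JSON object per non-empty line and return them as a list."""
    return [json.loads(line) for line in stream_text.splitlines() if line.strip()]

def array_to_stream(records):
    """Serialize each record compactly, one object per line."""
    return "".join(json.dumps(rec, separators=(",", ":")) + "\n" for rec in records)

# Hypothetical two-snippet spec; the field names are illustrative only.
compact = (
    '{"type":"dicomseries:all","location":"."}\n'
    '{"type":"dicomseries","uid":"1.2.3"}\n'
)

records = stream_to_array(compact)
pretty = json.dumps(records, indent=2)            # valid JSON array, easy to edit by hand
roundtrip = array_to_stream(json.loads(pretty))   # back to one object per line

# Once pretty-printed, a single line is no longer a complete JSON document,
# so a line-by-line reader fails on it.
try:
    json.loads(pretty.splitlines()[1])
    line_parse_ok = True
except json.JSONDecodeError:
    line_parse_ok = False
```

The pretty-printed intermediate is only for editing; it would need to be converted back before running the conversion.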

bpoldrack commented 4 years ago

Re the actual request: Absolutely agree. More tools are needed to assess and access specs.

That it's not an actual single JSON object and therefore can be parsed and evaluated on a "per-snippet" basis is intentional.

One hint for now, though: import datalad.support.json_py gives you load_stream and dump2stream helpers for that format, so you can deal with the entire thing as a list of dicts within Python.
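
A minimal sketch of that workflow follows. It tries to import the two helpers and otherwise falls back to standard-library stand-ins, which are an assumption about the format (one JSON object per line) and not hirni's own code. The sample snippets and the "comment" field are hypothetical.

```python
import json
import os
import tempfile

try:
    # The helpers named above; this needs DataLad to be installed.
    from datalad.support.json_py import load_stream, dump2stream
except ImportError:
    # Stand-ins, assuming the format is one JSON object per line.
    def load_stream(fname):
        with open(fname, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)

    def dump2stream(obj, fname):
        with open(fname, "w", encoding="utf-8") as f:
            for o in obj:
                f.write(json.dumps(o, separators=(",", ":"), sort_keys=True) + "\n")

# Hypothetical miniature spec, written to a temporary file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "studyspec.json")
dump2stream([{"type": "dicomseries:all"}, {"type": "dicomseries", "comment": ""}], path)

specs = list(load_stream(path))               # the whole file as a list of dicts
for spec in specs:
    if spec["type"] == "dicomseries":
        spec["comment"] = "edited in Python"  # change any field programmatically
dump2stream(specs, path)                      # write back, one object per line

with open(path, encoding="utf-8") as f:
    n_lines = sum(1 for _ in f)
```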

pvavra commented 4 years ago

I wrote two little scripts based on my (limited) understanding of how to handle a list of dicts. At least for the default studyspec.json file created by datalad run-procedure cfg_bids, it is reversible: it converts it into valid JSON and then back again to a proper studyspec format.

bpoldrack commented 4 years ago

@pvavra : Looks right at a quick glance. However, that's mostly what the above-mentioned load_stream and dump2stream would do. Using those would be somewhat more "future safe", since this is what hirni uses.

One piece is missing, though. Sorting such a spec before writing. This is relevant for several reasons:

  1. Human digestion of git-history. Even little changes in the actual values could lead to a huge and hard-to-read diff as returned by git diff if the order of things changes. So, keeping it stable by whatever sorting criteria is beneficial for the user (or your future self). That's not opposing the use of such scripts, as they don't destroy that per se. Just something to be aware of when using that intermediate file.
  2. Conversion iterates over that list. Therefore, in elaborate cases, the order can matter: a conversion procedure may depend on what was done before it.
  3. Finally, the current default is to put the dict with type dicomseries:all before all the dicomseries dicts. Again, this is functionally relevant only if one goes quite elaborate with what the specification mechanism allows, but it is something to keep in mind. Unless there's a particular reason not to, it might be wise to just stick with that default to minimize future surprise ;-)
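
The ordering described in the three points above could be sketched roughly as follows. This is not hirni's actual sorting logic: the function name, the choice of "uid" as the sort key, and the plain string values are assumptions made for illustration only.

```python
def sort_spec(snippets, key="uid"):
    """Return snippets with 'dicomseries:all' entries first, the rest ordered by `key`.

    `key` is a hypothetical field name; any stable criterion serves the same purpose.
    """
    def rank(snippet):
        is_all = snippet.get("type") == "dicomseries:all"
        return (0 if is_all else 1, str(snippet.get(key, "")))
    # sorted() is stable, so snippets with equal rank keep their relative order.
    return sorted(snippets, key=rank)

# Hypothetical snippets in an arbitrary order.
snippets = [
    {"type": "dicomseries", "uid": "1.2.9"},
    {"type": "dicomseries:all"},
    {"type": "dicomseries", "uid": "1.2.3"},
]
ordered = sort_spec(snippets)
```

Applying such a function right before writing keeps the order independent of how the file was edited, which is what keeps git diff output small.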

pvavra commented 4 years ago

> Using those would be somewhat more "future safe", since this is what hirni uses.

That's a good point. Also, the sorting gets handled automatically (and any other "special assumptions", like the dicomseries:all being first in the list, can be centralized there).

I've "moved" the updated scripts to a dedicated repo, where I will also be uploading more scripts as needed for our current study.

bpoldrack commented 4 years ago

Nice. That's a good approach, I think. You guys can figure out there what you need and I can eventually integrate here piece by piece as it fits in what I'm working on. Please keep filing issues here and whenever you solve/address something you need in that repo over there - put a reference here, so I can have a closer look at it when I'm working on something similar anyway.