spine-generic / data-multi-subject

Multi-subject data for the Spine Generic project
Creative Commons Attribution 4.0 International
22 stars 15 forks source link

Lint line endings #87

Closed kousu closed 3 years ago

kousu commented 3 years ago

This standardizes us on unix line endings (\n aka 0x0a). It prevents https://github.com/spine-generic/data-multi-subject/pull/57#issuecomment-703746602.

Here's an example of how this looks when it triggers a warning: https://github.com/spine-generic/data-multi-subject/pull/87/checks?check_run_id=2374745131#step:4:69

Screenshot from 2021-04-18 11-21-07

Of course, to make this work, I had to edit the current state of some of the the metadata. It looks like the balgrist parts were edited on Windows, and most lines in participants.tsv too.

(citing http://dataprotocols.org/linear-tsv/

It is permitted but discouraged to separate records with carriage-return-newline (0x0d and 0x0a)

so I just got rid of the 0x0ds)

kousu commented 3 years ago

What is the diff in all the JSON files? is it motivated by a BIDS-related rule? If there are issues in the way JSON files are produced by dcm2niix then we should probably raise an issue on their repos (otherwise this issue will keep happening)

Well that's why I guessed

It looks like the balgrist parts were edited on Windows,

I don't really know why they're like that. Most of the json files were unix-ended. It was just that set that I needed to bring into compliance.

It looks like all of them were added by you, in f246510c4e4f432f195709e1130ee417606eba44:

$ git log origin/master -- sub-balgrist*/**/*.json
commit f246510c4e4f432f195709e1130ee417606eba44
Author: Julien Cohen-Adad <jcohen@polymtl.ca>
Date:   Tue Jun 30 16:31:02 2020 -0400

    Added sub-amu -> sub-barcelona

So... I don't know. It was probably a fluke. Lots of other files that you also edited were unix-ended just fine.

The linter will catch it if it happens again!

kousu commented 3 years ago

You can run this to check that the diff made no change to the data:

(find . -name "*.json" |
  while read json; do
    if ! diff -u <(git show origin/master:$json | python -m json.tool) <(git show ng/lint-newlines:$json | python -m json.tool); then
      echo $json;
    fi;
  done) | less

it parses all the JSON through python and pretty-prints it back out again (which is another method of canonicalization) for both branches, and then diffs it. It came out empty for me, so I think it's good.

kousu commented 3 years ago

It doesn't seem like BIDS specifies line endings; the most it says is use UTF-8 JSON. Which is too bad, because it means the format isn't canonical. It doesn't impose ordering on the keys, or pretty-printing or conversely minifying.

But I'm saying we should specify it.