neurolabusc / dcm_qa

DICOM to NIfTI/BIDS validation script
BSD 2-Clause "Simplified" License
11 stars 2 forks source link

Check file integrity, not only filenames? #1

Closed dangom closed 7 years ago

dangom commented 7 years ago

Thanks a lot for this!

I was thinking about a validator today, as I noticed that different versions of dcm2niix applied to the same DICOMs produce different NIfTI files.

One thing that changed, for example, is that a string "imgComment" used to be appended to the NIfTI header and the string dissapeared in recent versions. As a consequence, hashes of NIfTIs converted in earlier versions may not match hashes of NIfTIs converted with HEAD.

I came across this issue while looking at an inconsistency within my BIDS participants.tsv. Trying to solve the problem I was surprised that I couldn't reproduce the conversion of some of the DICOMs. =/

I don't know the cleanest way to add such a check, but maybe something in the lines of:

find $refdir -type f -exec sha256sum {} \; | sort > /tmp/refdir_hashes.txt
find $outdir -type f -exec sha256sum {} \; | sort > /tmp/outdir_hashes.txt
cmp --silent /tmp/refdir_hashes.txt /tmp/outdir_hashes.txt || echo "REGRESSION."

Though this solution wouldn't tell us which file is the offending one.

Now that I mentioned this, are there any discussions within the BIDS community to add a DICOM identifier to the BIDS sidecar? The fact that NIfTI outputs are "sort of" anonymous makes it hard to trace where the files were generated from (or could I be missing something very simple?).

neurolabusc commented 7 years ago

The current implementation is very simple, and reports ANY changes found using DIFF. For example, here what you get from two virtually identical outputs where a different compiler was used to create dcm2niix:

--- /Users/rorden/dcm_qa/Out/sag_int_36sl_21.json   2017-06-21 12:36:48.000000000 -0400
+++ /Users/rorden/dcm_qa/Ref/sag_int_36sl_21.json   2017-06-21 09:11:20.000000000 -0400
@@ -26,5 +26,5 @@
    "TrueEchoSpacing": 0.000559996,
    "PhaseEncodingDirection": "i-",
    "ConversionSoftware": "dcm2niix",
-   "ConversionSoftwareVersion": "v1.0.20170621 GCC6.1.0"
+   "ConversionSoftwareVersion": "v1.0.20170621 Clang7.3.0"
 }

In other words, the current solution is exceptionally verbose, but it does provide a starting point to detect regressions.

If you want to develop a more sophisticated validation scheme (hopefully using a nicer language that shell scripts) you can submit a pull request.

As a few comments