pvavra opened this issue 3 years ago
You can overwrite. This requires a few things: call `datalad hirni-spec4anything` with `--replace`; `type`, `location` and `id` need to match.

The problem is that `hirni-spec4anything` is intended not to create an entire spec, but a single entry (not exactly like `dicom2spec`). So, if you have a specfile with multiple entries already, this command needs to somehow figure out when to append a new entry and when to overwrite (and which one to overwrite).
> I guess that a simple search by `{{location}}` should be enough to identify whether an entry already exists in the specs or not, but am not sure about that..
I don't think `location` is sufficient. You may want to feed one location into multiple conversion routines: `location` could be a directory that is the expected input parameter for existing conversion routines but actually contains several things that are to be converted by different (but similarly working) converters. For the same reason, adding `type` might be insufficient. So I went with `id` in addition. Note that, in principle, you are free to choose `id` values for your entries. Filling it with the SeriesNumber of DICOM image series via the default rules is just that: a default.
An alternative could be to change the `--replace` switch to take a value that indicates how to identify the entries to be overwritten, like `datalad hirni-spec4anything ... --replace "field, names, tomatch"`.

What do you think, @pvavra, would something like this make sense to you (and be useful)?
Edit:
In addition, with such a change the default value for that `--replace` could be the current `type`, `location` and `id`, so that by default `spec4anything` would overwrite whenever those three match an existing entry.
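To make the proposal concrete, here is a minimal, hypothetical sketch of such field-based matching (plain dicts only; real hirni spec snippets wrap many values in `{"approved": ..., "value": ...}` dicts, which this ignores):

```python
# Hypothetical sketch: overwrite a matching spec entry or append a new one.
# "fields" plays the role of the proposed --replace "field, names, tomatch" value.
def upsert_entry(spec, new_entry, fields=("type", "location", "id")):
    for i, entry in enumerate(spec):
        if all(entry.get(f) == new_entry.get(f) for f in fields):
            spec[i] = new_entry  # all match fields agree -> overwrite
            return spec
    spec.append(new_entry)       # no match -> append as a new entry
    return spec

spec = [{"type": "events_file", "location": "somefile", "id": 1, "note": "old"}]
upsert_entry(spec, {"type": "events_file", "location": "somefile", "id": 1, "note": "new"})
# repeated calls with the same type/location/id no longer create duplicates
```

With the proposed default, the three current fields would be matched unless the user names a different set.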
I missed the flag `--replace`. I'll look into whether it solves the issue for me.
However, I don't think the current (implicit) design is a good idea, despite seeing where it's coming from. Specifically, for the following reason:
In the current version, hirni does not seem to have an intended workflow for sessions in which no imaging data were collected. This might, for example, happen in multi-session studies where MRI data are acquired only on some days, while additional behavioral data get collected on other days (I'm actively involved in at least one such project atm).
Also for those kinds of projects, I would want to first dump all the raw data into `sourcedata` and keep each acquisition (session) independent of the others by using a new `acquisition` directory. As a user, this is the way I organize my thinking about the data, namely chronologically, based on acquisition sessions.
However, since no `dicom-import` can be used to create the skeleton specfile, which would be amended by `spec4anything`, this does not work (well), as far as I can tell, for sessions without MRI data. As a workaround, we have previously simply added the behavioral files (in a zipfile) to a later MRI-acquisition session (in addition to that session's behavioral logfiles). Then our converter was able to tell, based on the zipfile (its name or its content), which session the data actually came from, and reformat it according to BIDS.
But this goes against the "add each acquisition/session into one folder" mentality, which I implicitly assumed when populating sourcedata. My hunch is that `session` is also the dominant organizational level at which other researchers tend to think about their (raw) data. Indeed, for the sessions with DICOMs, the implied workflow is to dump all non-DICOM files inside the acquisition folder, parallel to the DICOMs, and use one specfile for everything.
Now, I understand that one file/location might need to be processed by multiple converters. However, from what I can see, no such default converters currently exist in hirni. That is, none are able to selectively process a portion of a `location` (but I haven't looked closely), except for the obvious DICOM converter. For all custom converters I write, I can simply create a wrapper which I hook up in the specfile. That wrapper could then simply call the different converters I need in turn.
[In a sense, this is what I already do for the behavioral-logfile converter in my other two-session project: I have a zipfile for both sessions, and my script checks its content and triggers one or the other converter of the Matlab logfiles, depending on what content the zipfile has (e.g. 10 runs on day 1, but only 4 on day 2). But I call only one converter from the specs.]
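Such a dispatching wrapper can stay small. Here is a sketch (the converter functions and the "10 runs on day 1" heuristic are made up for illustration) that peeks into a zipfile and picks a converter based on its content:

```python
import io
import zipfile

# Hypothetical per-session converters; real ones would reformat the logfiles.
def convert_day1(logs):
    return "day1:%d" % len(logs)

def convert_day2(logs):
    return "day2:%d" % len(logs)

def dispatch(zip_source):
    # Peek at the archive content without extracting it
    with zipfile.ZipFile(zip_source) as zf:
        logs = [n for n in zf.namelist() if n.endswith(".mat")]
    # e.g. 10 runs on day 1, but only 4 on day 2:
    return convert_day1(logs) if len(logs) >= 10 else convert_day2(logs)

# Quick demo with an in-memory zip holding 4 Matlab logfiles
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for i in range(4):
        zf.writestr("run%02d.mat" % i, "dummy")
result = dispatch(buf)  # picks the day-2 converter
```

The specfile then only needs to reference this one wrapper as the converter.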
Or have I misunderstood something fundamentally here?
Related to the intended workflow: is anyone actually using `spec4anything` with multiple converters for one location?
In any case, since the `import-dicom` call does replace all specs, maybe `spec4anything` should also default to actively replacing, and add a flag for not replacing instead..
> However, since no dicom-import can be used to create the skeleton specfile, which would be amended by spec4anything, this does not work (well), as far as I can tell, for sessions without MRI data.
It is not clear to me what isn't working (well) for you in such a case. Trying with just some dummy data doesn't show any issue to me:

```
(hirni-dev) ben@tree in ~
❱ datalad create -c hirni some
[INFO ] Creating a new annex repo at /home/ben/some
[INFO ] Scanning for unlocked files (this may take some time)
[INFO ] Running procedure cfg_hirni
[INFO ] == Command start (output follows) =====
[WARNING] Found no aggregated metadata info file /home/ben/some/.datalad/metadata/aggregate_v1.json. You will likely need to either update the dataset from its original location or reaggregate metadata locally.
[WARNING] Dataset at . contains no aggregated metadata on this path [metadata(/home/ben/some)]
[WARNING] Could not obtain dataset metadata, proceeding without
[INFO ] Scanning for unlocked files (this may take some time)
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
[INFO ] == Command exit (modification check follows) =====
Created dataset at /home/ben/some.
datalad create -c hirni some 3.89s user 2.41s system 68% cpu 9.243 total
(hirni-dev) ben@tree in ~
❱ cd some
(hirni-dev) ben@tree in ~/some on git:master
❱ ll
total 40
drwxr-xr-x  5 ben ben 4096 Mar 17 10:36 .
drwxr-xr-x 55 ben ben 4096 Mar 17 10:36 ..
drwxr-xr-x  3 ben ben 4096 Mar 17 10:36 code
drwxr-xr-x  2 ben ben 4096 Mar 17 10:36 .datalad
-rw-r--r--  1 ben ben  225 Mar 17 10:36 dataset_description.json
drwxr-xr-x  9 ben ben 4096 Mar 17 10:36 .git
-rw-r--r--  1 ben ben  242 Mar 17 10:36 .gitattributes
-rw-r--r--  1 ben ben  249 Mar 17 10:36 .gitmodules
-rw-r--r--  1 ben ben  249 Mar 17 10:36 README
-rw-r--r--  1 ben ben  830 Mar 17 10:36 studyspec.json
(hirni-dev) ben@tree in ~/some on git:master
❱ mkdir no-dicom-acq
(hirni-dev) ben@tree in ~/some on git:master
❱ touch no-dicom-acq/somefile
(hirni-dev) ben@tree in ~/some on git:master
❱ datalad hirni-spec4anything --properties '{"type": "behavioral", "id": 1}' no-dicom-acq/somefile
(hirni-dev) ben@tree in ~/some on git:master
❱ cd no-dicom-acq
(hirni-dev) ben@tree in ~/some/no-dicom-acq on git:master
❱ ll
total 12
drwxr-xr-x 2 ben ben 4096 Mar 17 10:38 .
drwxr-xr-x 6 ben ben 4096 Mar 17 10:37 ..
-rw-r--r-- 1 ben ben    0 Mar 17 10:37 somefile
-rw-r--r-- 1 ben ben  277 Mar 17 10:38 studyspec.json
(hirni-dev) ben@tree in ~/some/no-dicom-acq on git:master
❱ cat studyspec.json
{"comment":{"approved":false,"value":""},"dataset-id":"21be1eef-79cb-4314-ad76-4da505049c81","dataset-refcommit":"b69e6edf0a13693697a15fba26d1f91d69095d52","id":{"approved":true,"value":1},"location":"somefile","procedures":{"approved":false,"value":null},"type":"behavioral"}
(hirni-dev) ben@tree in ~/some/no-dicom-acq on git:master
❱
```
> In any case, since the import-dicom call does replace all specs, maybe spec4anything should also default to actively replacing, and add a flag for not replacing instead..
Need to double-check, but to me that sounds more like `import-dicom` shouldn't replace everything. After all, one could have valid specs for other data modalities in the acquisition already, and `import-dicom` shouldn't kill them.
> Also for those kind of projects, I would want to [...] keep each acquisition (session) independent of the other ones by using a new acquisition directory - as a user, this is the way I organize my thinking about the data, namely chronologically based on acquisition sessions.
Agree - this is the intended way of thinking about it.
Overall it seems to me that the trouble arises from the (perceived?) need to create a skeleton specfile. There may well be issues with `spec4anything` and/or `import-dicom`/`dicom2spec`, but it's not quite clear to me yet. There should be no need for some "skeleton": a specfile is just a collection of dicts, and there's no need for it to contain skeleton pieces.
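For illustration (mirroring the `studyspec.json` dump earlier): each line of a specfile is an independent JSON dict, so appending an entry needs no pre-existing skeleton. A sketch using the stdlib `json` module (hirni itself uses `json_py.dump2stream`; the path is illustrative):

```python
import json

# A spec "snippet" is just a dict; this one mimics the entry created above.
entry = {
    "type": "behavioral",
    "location": "somefile",
    "id": {"approved": True, "value": 1},
}
line = json.dumps(entry)
# Appending it to a specfile would then be a one-liner:
# open("no-dicom-acq/studyspec.json", "a").write(line + "\n")
```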
Hence, understanding your trouble with non-DICOM acquisitions seems key to me.
Ok, I see your "skeleton spec" point. That was a misconception on my side.
I guess the issue I have stems from the following: dicom-tarfiles and any other raw files are treated by hirni completely differently. The dicom-tarfiles get imported, where their content is unpacked, processed and added as a subdataset. Importantly, adding the dict of the events file inherits some parts (e.g. subject) from the preceding entries in the specs.
However, no CLI arguments of `spec4anything` exist to add those for a "standalone" file. I know that I can forward it with the `--properties` argument, but I already use that to specify the behavioral converter. From what I understood, I cannot specify multiple JSONs/dicts via that argument (i.e. one constant for all subjects, one mutable containing the subject id); I would need to stitch that together beforehand.
Now, an anything-import function analogous to the import-dicoms one could take such additional settings via the `--properties` field. And, returning to the original issue of replacing entries, it could replace them like `import-dicom` does (i.e. in a smart way). Then `spec4anything` could keep doing what it is doing, i.e. not replacing the spec entries, but appending them. I see the potential use for adding multiple entries for one folder, even though I personally would not use it.
Note that in my case there could be a simple check whether the proposed new entry matches any existing one (the lines are identical in my case, I think, or at least any deterministic part of them is), and it would then simply not be added.
Ok - I see your convenience points and the relation to what `import-dicoms` does. I would probably go for an additional command (like `import-anything`) instead of fundamentally changing `spec4anything`, but I need to think that through.
> From what I understood, I cannot specify multiple jsons/dicts via that argument (i.e. one constant for all subjects, one mutable containing the subject id). I would need to stitch that beforehand together.
Yes, you would need to put the entire JSON dict together instead of specifying single properties separately. I agree that this is not particularly convenient via the command line (although/because it is more flexible).
Note, however, that you can generate that dict by any means you have and write it to a file to be passed to `--properties`, which might be a lot more readable than explicitly passing a JSON structure, with all the required quoting, via the command line.
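A sketch of that route (all field names and the filename are illustrative): merge the constant part with the per-subject part in Python, dump it to a file, and hand that file to `--properties`:

```python
import json

# Constant part shared by all subjects (field names chosen for illustration)
common = {"type": "behavioral", "procedures": "my-behav-converter"}

def write_properties(subject_id, path):
    # Merge the constant dict with the per-subject part
    props = dict(common, subject=subject_id, id=subject_id)
    with open(path, "w") as f:
        json.dump(props, f)
    return props

props = write_properties("01", "props_sub-01.json")
# then e.g.: datalad hirni-spec4anything --properties props_sub-01.json <path>
```

This sidesteps the "one JSON on the command line" limitation entirely, since the stitching happens in the script.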
Also: you don't need to go through `spec4anything`. You can use anything you want to create/modify such a JSON dict and append it to the specfile. For example, I tend to use Python scripts for editing existing specs.
Here I want to edit specfiles in order not to convert image series that aren't motion-corrected (for studyforrest):
`$ cat ignore_non_moco.py`

```python
#!/usr/bin/env python
from datalad.support import json_py
import posixpath
from datalad_hirni.support.spec_helpers import (
    has_specval,
    get_specval
)
import sys


def mark_ignored(path):
    spec = [r for r in json_py.load_stream(path)] if posixpath.exists(path) else list()
    # ignore non-MoCo series (except angiography => ToF):
    for snippet in spec:
        if snippet["type"] == "dicomseries" and \
                has_specval(snippet, 'description') and \
                "MoCoSeries" not in get_specval(snippet, 'description') and \
                "ToF" not in get_specval(snippet, 'description'):
            print("ignore %s" % get_specval(snippet, 'description'))
            if 'tags' in snippet.keys():
                snippet['tags'].append('hirni-dicom-converter-ignore')
            else:
                snippet['tags'] = ['hirni-dicom-converter-ignore']
    json_py.dump2stream(spec, path)


if __name__ == '__main__':
    mark_ignored(sys.argv[1])
```
This is then called from within a shell script (that does more than that, of course) running in the study dataset:
```sh
for spec in */studyspec.json; do
    ignore_non_moco.py ${spec}
done
datalad save */studyspec.json -m "Don't convert non-motion corrected dicomseries and aborted ones"
```
So, if you use `spec4anything` just to create base snippets specifying `location`, `type`, and maybe an `id`, that kind of further editing may be easier to write, read and change.
However, as I said above: I see the point, and I'll have a closer look at how to change or add a command that covers what you outlined.
As intended, calling `datalad hirni-spec4anything` to add a new file (behavioral logfile) to an existing spec creates a new entry of type `events_file`. However, simply repeating the call creates another entry with the same values, i.e. a duplicate. Instead, it should overwrite any settings, like a repeated call to `dicom2spec` does. I guess that a simple search by `{{location}}` should be enough to identify whether an entry already exists in the specs or not, but am not sure about that..