adam2392 opened this issue 3 years ago
My take would be a list of tuples:
new_vals = [('row_val_1', 'col_val_1', val_1),
            ('row_val_n', 'col_val_n', val_n)]

To select all rows or columns, we could use None:

new_vals = [(None, 'col_val_1', val_1),        # all rows
            ('row_val_n', None, val_n)]        # all cols
We need a way to specify which column shall serve as the index, so specifying a row value for indexing actually makes sense.
For example, to select all bad channels and change their status_description, we'd do:

new_vals = [('status', 'bad', 'status_description', 'super bad')]
but one can already see how this is going to be horrible to use.
I believe what we actually want is pandas-style indexing (i.e., DataFrame.loc).
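For illustration only, the "all bad channels" example above expressed with plain pandas indexing might look like the following (the file path is made up, and this is not an existing mne-bids helper):

import pandas as pd

# read the sidecar TSV (path is purely illustrative)
df = pd.read_csv('sub-01_channels.tsv', sep='\t')
# pandas-style selection: all rows whose status is 'bad'
df.loc[df['status'] == 'bad', 'status_description'] = 'super bad'
# write back using BIDS conventions (tab-separated, 'n/a' for missing values)
df.to_csv('sub-01_channels.tsv', sep='\t', index=False, na_rep='n/a')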
let's not reinvent the wheel here.
if we have a way to read the events as a dataframe, we can let users do what they want, and then we offer them a way to write it back doing some schema checks.
my 2c
Totally agree. I was just trying to think within the existing framework of the JSON updates.
But I would very much prefer using pandas here too.
We should not expose these implementation details to the users though, so let's discuss API :) I don't want to end up with one update function for JSON and another for TSV files, that's all
let's not reinvent the wheel here. if we have a way to read the events as a dataframe, we can let users do what they want, and then we offer them a way to write it back doing some schema checks. my 2c
Agreed, internally this should definitely use a pandas DataFrame. The API would just be for consistency's sake, since I can see that there's a lot of use for updating a list of channels with some updated column value in channels.tsv/electrodes.tsv.
new_vals = [('row_val_1', 'col_val_1', val_1),
            ('row_val_n', 'col_val_n', val_n)]

To select all rows or columns, we could use None:

new_vals = [(None, 'col_val_1', val_1),        # all rows
            ('row_val_n', None, val_n)]        # all cols
I like this approach.
We need a way to specify which column shall serve as the index, so specifying a row value for indexing actually makes sense.
I think for files such as channels.tsv/electrodes.tsv we can safely assume that the index is the name column? That way, if you want to update the status and status_description, you'd do:

new_vals = [(None, 'status', 'bad'),
            (None, 'status_description', 'super bad')]
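Spelled out against a DataFrame indexed by name, the None entries would simply mean assigning the whole column (a rough sketch, not an existing API; the file path is illustrative):

import pandas as pd

df = pd.read_csv('sub-01_channels.tsv', sep='\t').set_index('name')
# (None, 'status', 'bad') -> all rows
df['status'] = 'bad'
# (None, 'status_description', 'super bad') -> all rows
df['status_description'] = 'super bad'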
can you give me a concrete usage example for an event or channel tsv file?
I'm also not yet totally convinced what exactly we're trying to achieve and how to get there :)
This would be a change to electrodes/channels.tsv. Say I need to change the description of a channel, or a set of channels, to resected, or say left temporal lobe:

new_vals = [
    ('C1', 'description', 'resected'),
    ('C2', 'description', 'resected'),
    ('C40', 'description', 'left temporal lobe'),
]
# the index column is assumed to be 'name', because this is a 'channels.tsv' file
update_sidecar_tsv(bids_path, new_vals)
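Such a helper could internally be a thin pandas wrapper; a minimal sketch under those assumptions (resolving the BIDSPath to the actual file path, and any validation, is glossed over here):

import pandas as pd

def update_sidecar_tsv(fname, new_vals, index_col='name'):
    """Sketch: apply (index, column, value) updates to a BIDS TSV file."""
    df = pd.read_csv(fname, sep='\t').set_index(index_col)
    for row, col, val in new_vals:
        if row is None:
            # None selects all rows, per the convention proposed above
            df[col] = val
        else:
            df.at[row, col] = val
    df.reset_index().to_csv(fname, sep='\t', index=False, na_rep='n/a')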
I acknowledge this can potentially be messy down the road, but minimally I feel like it would be useful to have a tsv reader/writer that might abstract away the pandas DataFrame reading?
but minimally I feel like it would be useful to have a tsv reader/writer that might abstract away the pandas DataFrame reading?
I see the biggest advantage in the fact that a BIDSPath can essentially match more than just one TSV file (an entire study), which could be useful.
to have a tsv reader/writer that might abstract away the pandas DataFrame reading?
is it more than 1 line of code to read pandas? We will add a dependency with no benefit. Can't you achieve the same with BIDSPath.match() and pd.read_csv(...)?
is it more than 1 line of code to read pandas? We will add a dependency with no benefit. Can't you achieve the same with BIDSPath.match() and pd.read_csv(...)?
Yes, and it'd be more flexible too, I suppose.
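For reference, the BIDSPath.match() + pd.read_csv() route could look roughly like this (the root path and the edit applied are made up for illustration):

import pandas as pd
from mne_bids import BIDSPath

bids_path = BIDSPath(root='/data/my_bids_dataset', suffix='channels', extension='.tsv')
for match in bids_path.match():
    df = pd.read_csv(match.fpath, sep='\t')
    # example edit: mark two channels as resected
    df.loc[df['name'].isin(['C1', 'C2']), 'description'] = 'resected'
    df.to_csv(match.fpath, sep='\t', index=False, na_rep='n/a')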
But this raises the question: why do we offer JSON sidecar updating and not TSV sidecar updating?
cause pandas does not exist for JSON :)
what you describe is:
df = read_sidecar_tsv(bids_path)
new_vals = [
    ('C1', 'description', 'resected'),
    ('C2', 'description', 'resected'),
    ('C40', 'description', 'left temporal lobe'),
]
for val in new_vals:
    df.at[val[0], val[1]] = val[2]
if we stick to such updates then yes maybe pandas is not necessary
and this could be done for all matching files.
cause pandas does not exist for JSON :)
Well.
# %%
import json_tricks
import pandas as pd

x = {
    'foo': 'bar',
    'baz': {
        'xxx': 1,
        'yyy': False
    }
}

x_json = json_tricks.dump(x, '/tmp/foo.json')
df = pd.read_json('/tmp/foo.json')
print(df)
     foo  baz
xxx  bar    1
yyy  bar    0
:)
can you try on a json as specified by bids?
more seriously let's stick to TSV here
Describe the problem
With https://github.com/mne-tools/mne-bids/pull/601, an API for "updating" BIDS datasets was introduced in mne-bids. However, this currently only works for JSON files, which are easier to handle because they can be represented as a dictionary and modified via update(). I have many datasets, though, where additional information is appended at the channel level, which is encoded in the channels.tsv and electrodes.tsv files. It would be awesome to have a similarly structured API for updating the corresponding .tsv files inside a BIDS dataset.
Describe your solution
I don't have one heh, but was hoping the brilliant minds here would have an idea on how to update a tsv file robustly and efficiently.
Describe possible alternatives
A naive way of solving the above problem would be to just have a nested dictionary as the update. That is, an update dictionary for a channels.tsv file would look something like the sketch below.
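A hypothetical shape for such a nested update dictionary (the keys match the channels and columns mentioned below; the values are purely illustrative placeholders):

new_chs = {
    'C1': {'status_description': '...', 'type': '...'},
    'C2': {'status_description': '...', 'type': '...'},
}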
Then the corresponding status_description and type are updated for channel rows "C1" and "C2", and everything else is left as is.
Additional context