Closed oruebel closed 2 years ago
During discussion of this issue with @mavaylon1 @rly @Michael-Wulf @oruebel we refined the discussion to the following 2 main options:

Option 1: Store each parameter as a separate dataset in `Task.task_parameter`. This would have the advantage that the data type of each dataset can vary for each parameter. The disadvantage is that this potentially creates many small datasets.

Option 2: Parse the Beadl XML for the argument definitions and store them (specifically the values as strings) in a table with columns `[name, value, output_type, type]`, which would be under `Tasks`. Currently, Beadl doesn't store the actual type, but rather the intended output type. To help us and to help Beadl have a more structured/rigorous definition of types, they would need to add the actual value type to the definition. Example: `<BeadlArgument name="ValveTime" expression="GetValveTime(RewardSize, ValveTime)" type="numeric" />`. The value would be stored as a string and so the `type` would be string but the `output_type` would be `numeric`. Having this distinction would allow us to have a validator that checks that the value can indeed be converted to the intended type. So in this example, even though the `output_type` is intended as `numeric`, the validator won't check whether the value is convertible to it. On the other hand, if the user says `value='a'` but `type=int`, then the validator would catch that error because `'a'` can't be converted to an int. An idea that may or may not be useful would be to adjust `to_dataframe` to convert the value so that it reflects the type.

Both options would have the data stored as individual argument columns on the `TrialsTable`.
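The validator idea from Option 2 could look roughly like the sketch below. It is only an illustration: the `TaskArgument` class, the `CONVERTERS` mapping, and the type names `"int"`, `"numeric"`, and `"string"` are assumptions made for this example, not part of Beadl or ndx-beadl.

```python
from dataclasses import dataclass

# Hypothetical mapping from type names to Python converters (an assumption).
CONVERTERS = {
    "int": int,
    "numeric": float,
    "string": str,
}


@dataclass
class TaskArgument:
    name: str
    value: str        # always stored as a string
    type: str         # type of the stored value/expression itself
    output_type: str  # intended type of the values the expression generates


def validate(arg: TaskArgument) -> bool:
    """Return True if arg.value can be converted to arg.type.

    The output_type is deliberately not checked, since an expression such as
    a function call cannot be converted without executing the task program.
    """
    converter = CONVERTERS.get(arg.type)
    if converter is None:
        raise ValueError(f"unknown type {arg.type!r} for argument {arg.name!r}")
    try:
        converter(arg.value)
    except ValueError:
        return False
    return True
```

With these assumptions, an argument whose value is `'a'` and whose `type` is `int` fails validation, while the `ValveTime` expression passes because its `type` is string and only its `output_type` is numeric.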
I think the important realization here is that these are not actual values but rather expressions that are used during the execution of the task program to define/create the actual values. As such, these expressions can take many forms, from simple constants to complex functions and programs that generate the values. Because of this, as described above, the type of the expression is often not the same as the type of the values the expression generates.
The second key part then is that the actual values are stored in the `TrialsTable`. For constant expressions this may seem redundant (since this would create columns with constant values), but it is explicit and easy to use. Also, a user can still choose which arguments to record in the `TrialsTable`, and so it is possible to omit constant columns in the `TrialsTable` and create the value from the definition that is stored in the `TaskArgumentTable` instead.
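As a rough illustration of re-creating an omitted constant column from its stored definition, here is a minimal sketch. The function name `constant_column` and the type names `"numeric"` and `"string"` are hypothetical, and it only handles expressions that are plain constants.

```python
def constant_column(expression: str, output_type: str, n_trials: int) -> list:
    """Expand a constant expression (stored as a string) into per-trial values.

    Only plain constants are handled here; expressions that are functions or
    programs would need the task program itself to be evaluated.
    """
    # Hypothetical converters for the output types (an assumption).
    converters = {"numeric": float, "string": str}
    value = converters[output_type](expression)
    return [value] * n_trials
```

For example, `constant_column("0.5", "numeric", 3)` yields one value of `0.5` for each of the three trials.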
> The value would be stored as a string and so the type would be string but the `output_type` would be `numeric`.
Based on the realization that these are expressions (rather than actual values), the notion to store these expressions as strings along with information about the `type` of the expression and the `output_type` of the values seems appropriate. With this in mind, Option 2 seems to be a good option.
> Currently, Beadl doesn't store the actual type, but rather the intended output type. To help us and to help Beadl have a more structured/rigorous definition of types, they would need to add on the actual value type of the definition.
I agree, that would be a very clear and logical approach.
> in a Table with columns `[name, value, output_type, type]`, which would be under `Tasks`

- Instead of `value`, the term `definition` or `expression` is probably more appropriate. I think either term is fine, and since Beadl seems to already use the term `expression` for this, I think we can probably just keep it consistent and use the term `expression`.
- Rename `type` to `expression_type` to make it explicit that the value refers to the `expression` column.
- The `neurodata_type` of the table should be something like `TaskArgumentsTable` (or maybe `TaskParametersTable`).
- Add `output_unit` (stored as a string?). In cases where there is no physical unit, the `output_unit` could be either an empty string or be set to the same value as `output_type`.

> Both options would have the data stored as individual argument columns on the `TrialsTable`.
- Add a `unit` attribute on each column, since many arguments will have physical units (e.g., time delays in seconds or reward amount in milliliters or grams).
- Possibly use `AlignedDynamicTable` to make it easy to distinguish between columns that hold: i) arguments of the trial, ii) outcomes of the trial, and iii) definitions of the trials, etc. This may not be strictly necessary, but I wanted to mention it because I think it could be useful.

Should the `TaskArgumentsTable` be a compound `Table` (i.e., row-based table with fixed columns) or a `DynamicTable` (i.e., column-based table with support for dynamic addition of columns)? `Table` has the advantage that it enforces a strict structure and requires fewer datasets, but it will require an extension if one wants to add columns. `DynamicTable` requires a more complex schema and more datasets, but it will allow users to add columns without requiring extensions. I personally don't have a strong preference for either. The main question, I think, is how likely it is that users will need to store additional columns.
@rly @mavaylon1
@mavaylon1 can this issue be closed?
Motivation
This issue has come up as part of discussions during the NWB User Days 2022 in relation to a use case by @gitelian.
Problem
Behavioral experiments often involve additional parameters, e.g., the size of the reward, the time delays for rewards and other actions, etc. These parameters are often important metadata to facilitate analysis, query, and interpretation of the data. In practice the parameters can be either static (i.e., defined before the experiment) or change dynamically (typically on a per-trial basis).
BEADL
After talking with @Michael-Wulf, and if I understand the BEADL XML correctly, these parameters are defined in the XML of the task program as part of the `<BeadlArguments>`. The parameters appear to be defined as strings in the XML file, but I believe they typically take the form of either numeric values, text, or more complex programmatic logic to modify the parameters automatically between trials.

https://github.com/rly/ndx-beadl/blob/69fa67f3d0021327b41861359851969be2b4444a/docs/tutorial_nwb_userdays_2022/LightChasingTask.xml#L4-L10

The per-trial values of these arguments are then also recorded in the MATLAB file.
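To make the XML side concrete, here is a minimal sketch of reading argument definitions with Python's standard library. The `ValveTime` element is the example quoted in this thread. The `RewardSize` element, the `parse_arguments` function, and the mapping of the XML `type` attribute to an `output_type` key are assumptions made for illustration, and the real task file may be structured differently.

```python
import xml.etree.ElementTree as ET

# Illustrative snippet only; the RewardSize entry is a made-up example.
EXAMPLE_XML = """
<BeadlArguments>
  <BeadlArgument name="ValveTime" expression="GetValveTime(RewardSize, ValveTime)" type="numeric" />
  <BeadlArgument name="RewardSize" expression="5" type="numeric" />
</BeadlArguments>
"""


def parse_arguments(xml_text: str) -> list:
    """Collect each BeadlArgument as a dict, keeping the expression as a string."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() also finds the elements when <BeadlArguments> is nested deeper.
    for elem in root.iter("BeadlArgument"):
        rows.append({
            "name": elem.get("name"),
            "expression": elem.get("expression"),
            # Assumption: the XML "type" attribute states the intended output type.
            "output_type": elem.get("type"),
        })
    return rows
```

Each resulting dict corresponds to one row of a parameter table, with the expression kept as a string regardless of what it evaluates to.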
Suggested Change
It would be useful to support storage of task parameters as part of the `Task` type. The number and names of the parameters will depend on the particular task. We could parse the definition of the parameters from the XML file. I could see a few different options to describe this:

- Option 1: Create a `TaskParametersTable` with the columns `name` and `value`. I'm not sure what data type the `value` needs to be. `text` should work, but it would be nice if we could also represent `numeric` parameters and possibly other data types.
- Option 2: Store each parameter as a separate dataset in `Task.task_parameter`. This would have the advantage that the data type of each dataset can vary for each parameter. The disadvantage is that this potentially creates many small datasets, but I would assume that the number of parameters should be reasonably small.
- Option 3: Store the parameters as attributes on the `Task` type. This would also allow us to express arbitrary data types and at the same time avoid creating lots of small datasets. Unfortunately, I think we cannot currently express this with the schema language, since attributes have fixed names and no `neurodata_type` (if I'm not mistaken), so we can't have arbitrary user-defined attributes.

I think either Option 1 or Option 2 would work. Option 1 has the disadvantage that we would be limited to `text` parameters, but it would keep things concise in a table. Option 2, on the other hand, would allow us to support arbitrary parameters but has the disadvantage that it potentially results in lots of small datasets and the user could store essentially anything they want as a parameter.

The per-trial values of the parameters can be stored as columns of the `Trials` table. I.e., I believe we probably don't need to extend the schema to store those values, but we should update the parser for the MATLAB file to add those columns to the `Trials` table.