rly / ndx-structured-behavior

An NWB extension for storing structured behavior programs and data, such as from BAABL/BEADL
BSD 3-Clause "New" or "Revised" License

Add support for task parameters / attributes #15

Closed oruebel closed 2 years ago

oruebel commented 2 years ago

Motivation

This issue has come up as part of discussions during the NWB User Days 2022 in relation to a use case by @gitelian.

Problem

Behavioral experiments often involve additional parameters, e.g., the size of the reward, the time delays for rewards and other actions, etc. These parameters are often important metadata to facilitate analysis, query, and interpretation of the data. In practice, the parameters can be either static (i.e., defined before the experiment) or dynamic (typically changing on a per-trial basis).

BEADL

After talking with @Michael-Wulf, and if I understand the BEADL XML correctly, these parameters are defined in the XML of the task program as part of the <BeadlArguments>. The parameters appear to be defined as strings in the XML file, but I believe they typically take the form of either numeric values, text, or more complex programmatic logic to modify the parameters automatically between trials.

https://github.com/rly/ndx-beadl/blob/69fa67f3d0021327b41861359851969be2b4444a/docs/tutorial_nwb_userdays_2022/LightChasingTask.xml#L4-L10

The per-trial values of these arguments are then also recorded in the MATLAB file.

Suggested Change

  1. It would be useful to support storage of task parameters as part of the Task type. The number and names of the parameters will depend on the particular task. We could parse the definition of the parameters from the XML file. I could see a few different options to describe this:

    1. Option 1 would be to store these parameters as a TaskParametersTable with the columns name and value. I'm not sure what data type the value needs to be. text should work, but it would be nice if we could also represent numeric parameters and possibly other data types.
    2. Option 2 would be to store the parameters as datasets in subgroup Task.task_parameter. This would have the advantage that the data type of each dataset can vary for each parameter. The disadvantage is that this potentially creates many small datasets, but I would assume that the number of parameters should be reasonably small.
    3. Option 3 would be to store the parameters as attributes of the Task type. This would also allow us to express arbitrary data types and at the same time avoid creating lots of small datasets. Unfortunately, I think we cannot currently express this with the schema language, since attributes have fixed names and no neurodata_type (if I'm not mistaken), so we can't have arbitrary user-defined attributes.

I think either Option 1 or Option 2 would work. Option 1 has the disadvantage that we would be limited to text parameters but it would keep things concise in a table. Option 2 on the other hand would allow us to support arbitrary parameters but has the disadvantage that it potentially results in lots of small datasets and the user could store essentially anything they want as a parameter.

  2. To store the per-trial values of these parameters, I believe we can just store those as user-defined columns on the Trials table. I.e., I believe we probably don't need to extend the schema to store those values, but we should update the parser for the MATLAB file to add those columns to the Trials table.
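As a rough illustration of the trade-off between Option 1 and Option 2, and of the per-trial columns from item 2, here is a minimal plain-Python sketch. It does not use the NWB or HDMF APIs; the parameter names and values, the dictionary layouts, and the `add_trial_column` helper are all hypothetical and only meant to make the discussion concrete.

```python
# Hypothetical static task parameters (names and values are made up).
task_parameters = {"reward_size_ul": 5.0, "reward_delay_s": 0.25, "cue_type": "light"}

# Option 1: one name/value table in which every value is stored as text.
# Concise, but the original data types are lost.
parameters_table = {
    "name": list(task_parameters),
    "value": [str(v) for v in task_parameters.values()],
}

# Option 2: one entry per parameter (standing in for one dataset each),
# so every parameter keeps its own data type.
parameter_datasets = dict(task_parameters)

# Item 2: per-trial values as user-defined columns on a trials table.
trials = {"start_time": [0.0, 2.0], "stop_time": [1.5, 3.5]}


def add_trial_column(table, name, values):
    """Add a column with one value per existing trial (hypothetical helper)."""
    if len(values) != len(table["start_time"]):
        raise ValueError("need exactly one value per trial")
    table[name] = list(values)


add_trial_column(trials, "reward_size_ul", [5.0, 7.5])
```

In this sketch, `parameters_table["value"]` holds only strings, while `parameter_datasets` still distinguishes numeric from text parameters, which is the difference between the two options described above.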
oruebel commented 2 years ago

During discussion of this issue with @mavaylon1 @rly @Michael-Wulf @oruebel we refined the discussion to the following 2 main options:

Both options would have the data stored as individual argument columns on the TrialsTable.

oruebel commented 2 years ago

parse the Beadl xml for the argument definitions and store them (specifically the values as strings)

I think the important realization here is that these are indeed not actual values but rather expressions that are used during the execution of the task program to then define/create the actual values. As such, these expressions can take many forms, from simple constants to complex functions and programs that generate the values. As described above, because of this, the type of the expression is often not the same as the type of the values an expression generates.

The second key part then is that the actual values are stored in the TrialsTable. For constant expressions this may seem redundant (since this would create columns with constant values), but it is explicit and easy to use. Also, a user can still choose which arguments to record in the TrialsTable, and so it is possible to omit constant columns in the TrialsTable and create the value from the definition that is stored in the TaskArgumentTable instead.

The value would be stored as a string and so the type would be string but the output_type would be numeric.

Based on the realization that these are expressions (rather than actual values), the notion to store these expressions as strings along with information about the type of the expression and the output_type of the values seems appropriate. With this in mind, Option 2 seems to be a good option.

Currently, Beadl doesn't store the actual type, but rather the intended output type. To help us and to help Beadl have a more structured/rigorous definition of types, they would need to add on the actual value type of the definition.

I agree, that would be a very clear and logical approach.

in a Table with columns [name, value, output_type, type], which would be under Tasks

Both options would have the data stored as individual argument columns on the TrialsTable.
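To make the [name, value, output_type, type] layout concrete, here is a minimal plain-Python sketch of such a table and of how a constant column omitted from the TrialsTable could be recreated from the stored definition. This is only an illustration under assumptions: the `TaskArgument` class, the `CASTS` mapping, the `constant_value` helper, and the example argument are hypothetical and are not part of the extension or of BEADL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskArgument:
    """One row of a hypothetical task arguments table."""
    name: str
    value: str        # the expression, always stored as text
    output_type: str  # type of the values the expression generates
    type: str         # type of the stored expression itself


# Assumed mapping from output_type labels to Python casts (illustrative only).
CASTS = {"numeric": float, "string": str}


def constant_value(arg):
    """Recover the value of a constant expression from its definition.

    Non-constant expressions (functions, programs) cannot be recovered this
    way; for a numeric output_type the cast raises ValueError in that case.
    """
    return CASTS[arg.output_type](arg.value)


task_arguments = [TaskArgument("reward_size", "5", "numeric", "string")]

# Recreate a constant per-trial column for three trials from the definition.
n_trials = 3
reward_size_column = [constant_value(task_arguments[0])] * n_trials
```

Here the stored value is the text "5" (type string), while the recreated per-trial values are numeric, matching the string/numeric distinction described above.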

oruebel commented 2 years ago

Should the TaskArgumentsTable be a compound Table (i.e., row-based table with fixed columns) or a DynamicTable (i.e., column-based table with support for dynamic addition of columns)? Table has the advantage that it enforces a strict structure and requires fewer datasets but will require extension if one wants to add columns. DynamicTable requires a more complex schema and more datasets but will allow users to add columns without requiring extensions. I personally don't have a strong preference for either. The main question I think is how likely we think it is that users will need to store additional columns.
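The difference can be sketched in plain Python, without the actual HDMF classes. The `TaskArgumentRow` type, the `add_column` helper, and the example `units` column below are hypothetical and only illustrate the row-based versus column-based trade-off.

```python
from typing import NamedTuple


class TaskArgumentRow(NamedTuple):
    """Row-based layout: the set of fields is fixed by the type definition."""
    name: str
    value: str
    output_type: str
    type: str


# Row-based table: a list of fixed-structure rows (a single dataset).
row_table = [TaskArgumentRow("reward_size", "5", "numeric", "string")]

# Column-based table: one list per column (one dataset per column).
column_table = {
    "name": ["reward_size"],
    "value": ["5"],
    "output_type": ["numeric"],
    "type": ["string"],
}


def add_column(table, name, values):
    """Add a user-defined column; needs one value per existing row."""
    n_rows = len(next(iter(table.values())))
    if len(values) != n_rows:
        raise ValueError("need exactly one value per row")
    table[name] = list(values)


# Adding a column only touches the data in the column-based layout; the
# row-based layout would need a new row type (i.e., a schema extension).
add_column(column_table, "units", ["uL"])
```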

@rly @mavaylon1

oruebel commented 2 years ago

@mavaylon1 can this issue be closed?