juliencarponcy / trialexp

MIT License
2 stars 4 forks source link

compute_success and condition_list in the new pipeline #25

Closed kouichi-c-nakamura closed 1 year ago

kouichi-c-nakamura commented 1 year ago

Description

We need clearer definitions of the trial types as well as success for the analyses.

Currently, much of the pipeline relies on compute_success()

reach_time_before_reward = self.df_events.loc[:,['spout_trial_time','US_end_timer_trial_time']].apply(
        lambda x: find_last_time_before_list(x['spout_trial_time'], x['US_end_timer_trial_time']), axis=1)    

However, the definition of success there is if water (US_end_timer_trial_time) is preceded by spout (spout_trial_time). Successes include water_by_bar_off and button_press. We can argue about this, but for most of the analysis, this is probably not good enough.

To compensate for this, at the time of analysis, we defined a list of dictionaries to be more specific in the old pipeline.

# Defime each trial type as a dictionary of conditions to be met

conditions_dict1 = {'trigger': 'hold_for_water', 'success': True, 'button_press': False, 'break_after_abort': False, 'water by spout': True}
conditions_dict2 = {'trigger': 'hold_for_water', 'success': False, 'button_press': False, 'break_after_abort': False}
conditions_dict3 = {'trigger': 'hold_for_water', 'success': False, 'button_press': False, 'break_after_abort': True}
# conditions_dict2 = {'trigger': 'hold_for_water', 'spout':False, 'valid': True, 'busy_win_timer': False, 'button_press': False}

# Aggregate all condition dictionaries in a list
condition_list = [conditions_dict1, conditions_dict2, conditions_dict3]

# Aliases for conditions
cond_aliases = ['hit','miss','abort']

How they are treated: https://github.com/juliencarponcy/trialexp/blob/d42df8479ff7e4544c4e75a9d799198a61c4648a/trialexp/process/data_import.py#L1130

How about this Dataframe?

On the other hand, workflow\scripts\01_process_pycontrol.py (dev branch) creates and save df_condtions.pkl, which looks like this:

2023-05-02 17 34 26

df_condtions.pkl has the following keys/columns:

What we need

I think we should be able to define different types of categories. Each with a timestamp of trial onset.

  1. Success (water_by_spout)
    • water_by_spout should be good
    • You can also check if water is delivered after spout as in compute_success(), but this is not good enough
  2. water_by_bar_off (water_by_bar_off)
  3. water_by_button (button_press )
    • Need to be careful with this one. If a button_press happened after trial onset before busy_window, that should be counted as water_by_button or something
  4. abort ( break_after_abort)
  5. Miss (none of above)

They can be used as a Coordinate in xArray

Questions to you!

@juliencarponcy @teristam @Rashasoliman80

  1. Can we agree on the 5 categories of the trials above? Can you think of more? Or are these too many???
  2. At what level should this be covered??? This probably takes task-specific code and Snakemake pipeline is meant to be more generic. So, it's a bit difficult to find the best place.
    1. Do this outside of the pipeline (the same idea as the old pipeline)? --- can be error-prone
    2. Do this by replacing compute_success() in the pipeline? --- This can break many things
    3. Embed the definitions somehow in a CSV file? --- Probably we need Python code
juliencarponcy commented 1 year ago

At the architecture level, I believe that it should be flexible and based on: https://github.com/juliencarponcy/trialexp/blob/master/params/tasks_params.csv

This to allow any arbitrary number of arbitrary combination of the declared events or condition for a task.

This is more in: https://github.com/juliencarponcy/trialexp/blob/d42df8479ff7e4544c4e75a9d799198a61c4648a/trialexp/process/data_import.py#L1239-L1313 Trial type computation (arbitrary like: hit abort miss) could possibly be added on the already computed xarray since all events and conditions are already presents there. Ths would create a new coordinate ("trial_type"?) which would be a categorical string variable, and of the same length as the trial_nb coordinates.

The above cited function could possibly be adapted to the xarray.

Me to write (chatgpt helped) documentation to help

I personally think that 3 trial types to start with in our current analysis is enough, as I discussed with @kouichi-c-nakamura.

In the category you describe:

Same for

Maybe so that function should take The following and consider adding 2 more categories trial_tyope like button_pressed and hit_bar_off ?


# Defime each trial type as a dictionary of conditions to be met

conditions_dict1 = {'trigger': 'hold_for_water', 'success': True, 'button_press': False, 'break_after_abort': False, 'water by spout': True}
conditions_dict2 = {'trigger': 'hold_for_water', 'success': False, 'button_press': False, 'break_after_abort': False}
conditions_dict3 = {'trigger': 'hold_for_water', 'success': False, 'button_press': False, 'break_after_abort': True}
# conditions_dict2 = {'trigger': 'hold_for_water', 'spout':False, 'valid': True, 'busy_win_timer': False, 'button_press': False}

# Aggregate all condition dictionaries in a list
condition_list = [conditions_dict1, conditions_dict2, conditions_dict3]

# Aliases for conditions
cond_aliases = ['hit','miss','abort']

Displaying hit, abort and miss as selectively defined above should be enough for plot clarity?

Open to all the above. I hope that could be of help.

kouichi-c-nakamura commented 1 year ago

Thanks, I'm kind of in the surgery, so I can't answer in detail, but for now I still maintain that the 3 categories of trials you mentioned (hit, miss, abort) are not good enough to compute success. My concern is that including button press and bar_off in success can mislead ourselves. If you can assure me and others that it's not the case (i.e. we don't have to worry about that), that would be great. Right now I don't fully understand how comute_success() affects other things in the pipeline.

teristam commented 1 year ago

At the architecture level, I believe that it should be flexible and based on: https://github.com/juliencarponcy/trialexp/blob/master/params/tasks_params.csv

This to allow any arbitrary number of arbitrary combination of the declared events or condition for a task.

This is more in:

https://github.com/juliencarponcy/trialexp/blob/d42df8479ff7e4544c4e75a9d799198a61c4648a/trialexp/process/data_import.py#L1239-L1313

Trial type computation (arbitrary like: hit abort miss) could possibly be added on the already computed xarray since all events and conditions are already presents there. Ths would create a new coordinate ("trial_type"?) which would be a categorical string variable, and of the same length as the trial_nb coordinates. The above cited function could possibly be adapted to the xarray.

Me to write (chatgpt helped) documentation to help

I personally think that 3 trial types to start with in our current analysis is enough, as I discussed with @kouichi-c-nakamura.

In the category you describe:

  • 2 by bar_off could be either hit, aborted, or a miss so ambiguous or need to be even more specified and hence too many trial types and too few trials

Same for

  • 3 button press which could be also hit miss etc.

Maybe so that function should take The following and consider adding 2 more categories trial_tyope like button_pressed and hit_bar_off ?


# Defime each trial type as a dictionary of conditions to be met

conditions_dict1 = {'trigger': 'hold_for_water', 'success': True, 'button_press': False, 'break_after_abort': False, 'water by spout': True}
conditions_dict2 = {'trigger': 'hold_for_water', 'success': False, 'button_press': False, 'break_after_abort': False}
conditions_dict3 = {'trigger': 'hold_for_water', 'success': False, 'button_press': False, 'break_after_abort': True}
# conditions_dict2 = {'trigger': 'hold_for_water', 'spout':False, 'valid': True, 'busy_win_timer': False, 'button_press': False}

# Aggregate all condition dictionaries in a list
condition_list = [conditions_dict1, conditions_dict2, conditions_dict3]

# Aliases for conditions
cond_aliases = ['hit','miss','abort']

Displaying hit, abort and miss as selectively defined above should be enough for plot clarity?

Open to all the above. I hope that could be of help.

Yes, all the conditions are already in the xarray. We just need a way to map those conditions to a string representing trial outcome. Perhaps we can organize the trial outcome definition in a pandas dataframe/csv and store it somewhere else? I doubt hard-coding them will be a good solution because those definitions will depend on the task the animal is doing (e.g. in some task there is no hold_for_water event)

juliencarponcy commented 1 year ago

Thanks, I'm kind of in the surgery, so I can't answer in detail, but for now I still maintain that the 3 categories of trials you mentioned (hit, miss, abort) are not good enough to compute success. My concern is that including button press and bar_off in success can mislead ourselves. If you can assure me and others that it's not the case (i.e. we don't have to worry about that), that would be great. Right now I don't fully understand how comute_success() affects other things in the pipeline.

I think you are mislead a bit by the distinction between hit (arbitrary set of conditions) and success defined by compute_success(): compute_success() is just computing whether the spout was touched before the reward was delivered, but then my dictionary precise the 'hit' category as:

hit = _should include only successful trials where button_press was not pushed and the reward was delivered by 'water_byspout'

So specifically, compute_success disregard any complex logic and just say if the spout was touched prior to reward (for the ensemble of tasks where events and triggers are similar enough [see compute_success]) another important characteristic of compute_success() is that is a time-based logic.

On the other hand, a dictionary defining a trial type specify non-time-based logic and allow to further refine trial type as hit = (success Union water_by_bar_off Union non-button-pressed)

Don't know if this helps

kouichi-c-nakamura commented 1 year ago

Thanks, @juliencarponcy . I see that hit and success are defined differently in different places. They are different but similar to some extent. It sounds a bit like "double standard" to me. Couldn't this cause a problem? Like the trial numbers do not add up? Or put it another way, why do we need two different rules?

kouichi-c-nakamura commented 1 year ago

I forgot to include free_reward in the description above.

  1. Success (water_by_spout)
    • water_by_spout should be good
    • You can also check if water is delivered after spout as in compute_success(), but this is not good enough
  2. water_by_bar_off (water_by_bar_off)
    • controlled by v.consec_fail_for_bar_off_water
  3. free_reward
    • controlled by v.consec_fail_for_free_water
  4. water_by_button (button_press )
    • Need to be careful with this one. If a button_press happened after trial onset before busy_window, that should be counted as water_by_button or something
  5. abort ( break_after_abort)
  6. no reaching (none of above)
  7. human intervention
    • Cannot determine with pycontrol data
    • Needs DLC or red channel analysis

@teristam

teristam commented 1 year ago

I have implemented this trial_outcome separation in my branch. Now working on the plotting part. However, one thing I noticed during the implementation is that all these conditions must be mutually exclusive (there can be only 1 outcome type per trial). And in the case where there can be more 1 type in a trial (e.g. water by spout and free water), we must determine their relative priority.

There is also one more outcome: late_reach, the animal did not reach out during the reward window but only during the inter-trial interval.

teristam commented 1 year ago

Fixed in #45