sgoldenlab / simba

SimBA (Simple Behavioral Analysis), a pipeline and GUI for developing supervised behavioral classifiers
https://simba-uw-tf-dev.readthedocs.io/
GNU General Public License v3.0
286 stars 139 forks

Fix for "Scaled_movement_M1_M2" errors in single animal projects #73

Open seanpaulbradley opened 3 years ago

seanpaulbradley commented 3 years ago

I'm putting this here as an issue rather than a pull request because I know at least one lab member has a part of this fix on the dev wheel.

There are three instances I can spot where a single animal project would run into problems due to the use of . I didn't look exhaustively so it might pop up elsewhere.

The first and most important is in . Only 16 and 14 bp projects - which both assume two animals - are given size-scaled movement values. A fix at line 130 would be something like:

            mousesize = (statistics.mean(outputDf['Mouse_nose_to_tail']))
            mouse1Max = mousesize * 8
            outputDf['Scaled_movement'] = (outputDf['Total_movement_all_bodyparts'] / (mouse1Max)).round(decimals=2)

The solution becomes more elegant if you also replace line 129 so outputs from single and multi animal projects are the same and can be more simply called elsewhere: outputDf['Scaled_movement'] = outputDf['Scaled_movement_M1_M2'].round(decimals=2) so you only have 'Scaled_movement' to work with, but how that column is built in the machine output depends on the BP schema. The downside is you have to chase down all the and change them to .
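A minimal sketch of the unified-column idea, assuming the column names quoted in this thread and the same "body length times 8" ceiling. `add_scaled_movement` and `SIZE_MULTIPLIER` are hypothetical names for illustration and are not part of SimBA. The two-animal branch averages the per-animal values, and pandas' own `.mean()` is used in place of `statistics.mean`.

```python
import pandas as pd

# Assumption: the movement ceiling is mean body length times 8,
# as in the snippets quoted in this thread.
SIZE_MULTIPLIER = 8


def add_scaled_movement(output_df: pd.DataFrame, no_animals: int) -> pd.DataFrame:
    """Hypothetical helper: write a single 'Scaled_movement' column for any project."""
    df = output_df.copy()
    if no_animals == 1:
        max_move = df["Mouse_nose_to_tail"].mean() * SIZE_MULTIPLIER
        scaled = df["Total_movement_all_bodyparts"] / max_move
    else:
        per_animal = []
        for i in (1, 2):
            max_move = df[f"Mouse_{i}_nose_to_tail"].mean() * SIZE_MULTIPLIER
            per_animal.append(df[f"Total_movement_all_bodyparts_M{i}"] / max_move)
        # Average the two animals so the column has the same meaning in both cases.
        scaled = (per_animal[0] + per_animal[1]) / 2
    df["Scaled_movement"] = scaled.round(decimals=2)
    return df
```

With this shape, downstream code only ever reads `Scaled_movement`, regardless of how many animals the project has.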

More easily solved are having the processes that use . I spotted it at process_severity ("Analyze Severity" in the GUI) and in path_plot. I know the dev wheel has a fix (although it is currently right now, and I've called it simply

These could be fixed by having be output the same for both multianimal and single animal projects (as above)

So, if you want to keep the single and multianimal output separate:

            csv_df_combined[severityTarget] = csv_df[severityTarget].values
            if noAnimals == 2:
                csv_df_combined['Scaled_movement_M1_M2'] = csv_df['Scaled_movement_M1_M2'].values
            else:
                csv_df_combined['Scaled_movement'] = csv_df['Scaled_movement'].values
            columnNames = list(csv_df_combined)

If you want to merge them to make handling these data cleaner, just omit the conditional and use .
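Another option, sketched here under assumptions, is to have the downstream step look up whichever scaled-movement column the file actually contains instead of branching on `noAnimals`. The helper names and the lookup order are hypothetical. Only the two column names come from this thread.

```python
import pandas as pd

# Column names taken from this thread; the lookup order is an assumption.
SCALED_MOVEMENT_COLUMNS = ("Scaled_movement", "Scaled_movement_M1_M2")


def find_scaled_movement_column(csv_df: pd.DataFrame) -> str:
    """Hypothetical helper: return whichever scaled-movement column is present."""
    for name in SCALED_MOVEMENT_COLUMNS:
        if name in csv_df.columns:
            return name
    raise KeyError(f"None of {SCALED_MOVEMENT_COLUMNS} found in the input file")


def copy_scaled_movement(csv_df: pd.DataFrame, csv_df_combined: pd.DataFrame) -> str:
    """Copy the scaled-movement values across and report the column used."""
    name = find_scaled_movement_column(csv_df)
    csv_df_combined[name] = csv_df[name].values
    return name
```

This keeps old two-animal output files readable while still accepting the single-animal column.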

Please do check me for mistakes.

sgoldenlab commented 3 years ago

Thanks @NIHRBC - very helpful. I spotted the issue and updated the path_plot.py and rf_model_run.py files in the PyPI package.

The issue is that "severity", as we designed it, is a way of checking how much the animals are moving during interactions, and it relies on the user employing one of the built-in body-part configurations. As you say, though, there should be nothing stopping users with one animal from employing it (this is what I have fixed), although the way it is coded, it would be too much work for now to get it going with user-defined body-part configurations.

I have to ensure that the user has one of the built-in body-part configurations before calculating the scaled movement for either two animals or one animal. It's not elegant, I admit, but my hope is that it works:

# Single-animal built-in configurations
if pose_estimation_body_parts in ('4', '7', '8'):
    mouse1size = statistics.mean(outputDf['Mouse_1_nose_to_tail'])
    mouse1Max = mouse1size * 8
    outputDf['Scaled_movement_M1'] = outputDf['Total_movement_all_bodyparts_M1'] / mouse1Max
# Two-animal built-in configurations
if pose_estimation_body_parts in ('14', '16'):
    mouse1size = statistics.mean(outputDf['Mouse_1_nose_to_tail'])
    mouse2size = statistics.mean(outputDf['Mouse_2_nose_to_tail'])
    mouse1Max = mouse1size * 8
    mouse2Max = mouse2size * 8
    outputDf['Scaled_movement_M1'] = outputDf['Total_movement_all_bodyparts_M1'] / mouse1Max
    outputDf['Scaled_movement_M2'] = outputDf['Total_movement_all_bodyparts_M2'] / mouse2Max
    outputDf['Scaled_movement_M1_M2'] = (outputDf['Scaled_movement_M1'] + outputDf['Scaled_movement_M2']) / 2
    .....
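One way to express the configuration check is with set membership, sketched below. This sidesteps a common Python pitfall: an expression such as `x == '4' or '7'` is always truthy, because the bare string `'7'` is evaluated on its own. `animals_for_config` is a hypothetical helper, the configuration labels are the ones named in the snippet above, and raising on anything else is an assumption about how unsupported (user-defined) configurations might be handled.

```python
SINGLE_ANIMAL_CONFIGS = {"4", "7", "8"}  # configurations named in the snippet above
TWO_ANIMAL_CONFIGS = {"14", "16"}


def animals_for_config(pose_estimation_body_parts: str) -> int:
    """Hypothetical helper: number of animals implied by a built-in configuration."""
    if pose_estimation_body_parts in SINGLE_ANIMAL_CONFIGS:
        return 1
    if pose_estimation_body_parts in TWO_ANIMAL_CONFIGS:
        return 2
    # Assumption: anything else (e.g. a user-defined configuration) is rejected.
    raise ValueError(
        f"Unsupported body-part configuration: {pose_estimation_body_parts!r}"
    )
```

The returned count can then drive which scaled-movement columns are computed.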

seanpaulbradley commented 3 years ago

A few issues (again, not putting up a pull because it is on -dev):

In run_RF_model.py, line 136, edit to mouse1size = (statistics.mean(outputDf['Mouse_nose_to_tail'])) to match extract_features_*bp.py for single-animal projects.

In path_plot.py, in the chunk beginning at line 140,

1) Since you won't have attack values for a single-animal project, I'd nest that under the conditional for noAnimals == 2

2) For the noAnimals == 1 condition, we need to define the parts we need from m1tuple, i.e.

                if noAnimals == 1:
                    midpoints = list(zip(np.linspace(m1tuple[0], 3), np.linspace(m1tuple[1], 3)))
                    locationEventX, locationEventY = midpoints[1]
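A hedged sketch of how the event location might be picked for both cases, assuming `m1tuple` and `m2tuple` are `(x, y)` pairs. `event_location` is a hypothetical helper, not a SimBA function, and placing the marker on the animal itself in the single-animal case is an assumption about the intended behavior. Note that `np.linspace` takes `(start, stop, num)`, so a call with only two arguments treats the second as the stop value and uses the default of 50 samples.

```python
import numpy as np


def event_location(m1tuple, m2tuple=None):
    """Hypothetical helper: (x, y) at which to draw an event marker."""
    if m2tuple is None:
        # Single animal: assume the marker sits on the animal itself.
        return float(m1tuple[0]), float(m1tuple[1])
    # Two animals: three evenly spaced points from animal 1 to animal 2;
    # index 1 is the midpoint.
    xs = np.linspace(m1tuple[0], m2tuple[0], 3)
    ys = np.linspace(m1tuple[1], m2tuple[1], 3)
    return float(xs[1]), float(ys[1])
```

Calling `event_location(m1tuple)` covers the single-animal project, and `event_location(m1tuple, m2tuple)` covers the two-animal one.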

Just from what I've seen, I fully agree that trying to get this to work for custom configurations would be a drag.

What I might do for my own purposes is try to plug in a method for plotting multiple classifications directly rather than severity by printing on different color overlays to represent different behaviors. I'm not sure when (or if) I'll have the time to do it, but I'll share it with you when I'm done if you like.