neurogears / vestibular-vr

Closed-loop VR setup for Rancz Lab

H2 crash during recording #63

Closed: ederancz closed this issue 4 months ago

ederancz commented 4 months ago

On every block using the main workflow, H2 seems to crash after ~50 seconds. The minimal proportional control workflow does not have this issue. Below are a few notes from the logs.

- H1 logs for the appropriate time (block length) + whenever logging was started and the workflow stopped.
- H2 logs analogue input (39) and encoder (38) for only ~50 seconds (until the motor stops), but 42 (pulse interval) is logged for the whole block.
- The platform can be rotated after this by hand (i.e. the motor is not clamped).
- Closed loop visual stimulation keeps running for the block time.
- There is an error message in the console (see 2 separate examples below).

Not able to parse a Harp Data Frame (03:17:59 PM)!
Raw Harp Data Frame: 03:0C:27:FF:92:29:55:19:00:87:4F:20:4E:04
Not able to parse a Harp Data Frame (03:23:38 PM)!
Raw Harp Data Frame: 03:0C:27:FF:92:20:4E:8E:02:0C:2A:FF:92:90

Note: Possibly cosmetic, H2 does not show up in console when starting Bonsai. H1 is fine.

RoboDoig commented 4 months ago

Just to confirm, is this happening when run from the 'main' branch in the git repo? Or from a different feature branch? Also is this a full crash (i.e. the whole workflow stops running)?

From the error message it sounds like there may be a mistake in one of the HARP message filters, which causes Bonsai to try to parse a data payload of the wrong type. I will investigate.
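For reference, here is a minimal sketch of how the first raw frame quoted above can be split into its header fields. The field layout (message type, length, address, port, payload type, then timestamp, payload and a trailing checksum equal to the 8-bit sum of the preceding bytes) is my understanding of the Harp binary protocol and should be treated as an assumption; `split_harp_frame` is a hypothetical helper, not part of Bonsai or the Harp libraries.

```python
def split_harp_frame(hex_frame):
    """Split a colon-separated hex dump of a Harp frame into header fields.

    Assumed layout: [type][length][address][port][payload type]...[checksum].
    """
    raw = bytes.fromhex(hex_frame.replace(":", ""))
    return {
        "message_type": raw[0],                   # assumed: 3 = event
        "length": raw[1],                         # bytes following the length field
        "address": raw[2],                        # register address
        "port": raw[3],
        "payload_type": raw[4],
        "has_timestamp": bool(raw[4] & 0x10),     # assumed timestamp flag bit
        "length_ok": raw[1] == len(raw) - 2,      # declared vs. actual length
        "checksum_ok": (sum(raw[:-1]) & 0xFF) == raw[-1],  # assumed checksum rule
    }


frame = split_harp_frame("03:0C:27:FF:92:29:55:19:00:87:4F:20:4E:04")
for name, value in frame.items():
    print(name, value)
```

Under these assumptions the address byte (0x27) corresponds to register 39, the analogue input mentioned in the original report, so a check like `checksum_ok` may help tell a corrupted frame apart from a frame that was merely routed to the wrong parser.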

EleonoraAmbrad commented 4 months ago

We were running the workflow from my branch (Nora-dev). You have a point, I will try to run it from the Main branch and see. No, it's not a full crash: the rest of the streams are logged and the workflow keeps going, but the motor does not.

EleonoraAmbrad commented 4 months ago

I tested the workflow again today using the Main experiment workflow in the screen sync quad branch. I got the same error in the console as mentioned above. After the error, the block logic keeps going, but the VR is not updated. Basically, the block after the DrumClosedLoop will start, but neither the halt protocol nor the flowYtovisualgain is applied. This is the yml file I used for testing this behavior.

RoboDoig commented 4 months ago

As far as I know I haven't been able to reproduce this bug yet; at least I don't see those error messages in the console. I think H2/H1 not showing up in the console is cosmetic, as they appear to be producing events and receiving commands correctly. Will continue testing on other branches.

EleonoraAmbrad commented 4 months ago

Update: the motor stops being controlled by H2 even if the error message does not appear.

RoboDoig commented 4 months ago

A couple of initial notes from testing this:

RoboDoig commented 4 months ago

OK, I think I've found the culprit for this. The basic reason is that we are trying to write commands to the H2 device too fast, which causes a firmware crash.

The source of this is the CalculateMotorProcessing workflow. We are taking separate sources (flow and playback) to drive the motor and using a CombineLatest to add them all up to calculate the desired motor movement delta. Because these sources can operate at different sampling rates, the CombineLatest will sometimes receive samples from flow and playback in very quick succession. Even if the gain of one of these is at 0, the CombineLatest will still fire. In some cases this will cause Bonsai to attempt to write to H2 in short bursts of >1000 Hz, which can overload the device firmware.
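To illustrate the burst behaviour described above, here is a small plain-Python simulation of when a CombineLatest over two sources would emit. The timings are hypothetical (two 100 Hz sources offset by 0.5 ms), and `combine_latest_times` is an illustrative stand-in, not the Bonsai operator itself.

```python
def combine_latest_times(times_a, times_b):
    """Emission times of a CombineLatest over two sources.

    It fires on every event from either source once both sources
    have produced at least one value.
    """
    events = sorted([(t, "a") for t in times_a] + [(t, "b") for t in times_b])
    seen = set()
    out = []
    for t, src in events:
        seen.add(src)
        if len(seen) == 2:
            out.append(t)
    return out


# Hypothetical event times in microseconds over 100 ms.
flow_times = list(range(0, 100_000, 10_000))        # 100 Hz
playback_times = list(range(500, 100_000, 10_000))  # 100 Hz, 0.5 ms later

out = combine_latest_times(flow_times, playback_times)
intervals = [b - a for a, b in zip(out, out[1:])]
min_interval_us = min(intervals)
peak_rate_hz = 1_000_000 / min_interval_us

print(len(out), "emissions; shortest gap", min_interval_us, "us")
print("instantaneous peak rate:", peak_rate_hz, "Hz")
```

Although each source only runs at 100 Hz in this sketch, consecutive emissions can land 0.5 ms apart, so a device written to on every emission would see short bursts well above the average rate.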

I'll implement a different solution for calculating the motor procession which should fix this, though we do need to test to double check.
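One possible way to bound the write rate, shown purely as a sketch and not necessarily the approach taken in the actual fix, is to sample the latest value of each source on a fixed clock and issue at most one summed command per tick. The function name, the event format and the 5 ms tick are all assumptions made for illustration.

```python
def sample_latest_sum(events, tick_us, end_us):
    """Emit one summed command per clock tick.

    events: list of (time_us, source, value), sorted by time.
    Returns a list of (tick_time_us, sum of latest value per source).
    """
    latest = {}
    commands = []
    i = 0
    for tick in range(tick_us, end_us + 1, tick_us):
        # Absorb every event that arrived up to and including this tick.
        while i < len(events) and events[i][0] <= tick:
            _, src, val = events[i]
            latest[src] = val
            i += 1
        if latest:
            commands.append((tick, sum(latest.values())))
    return commands


# Hypothetical events: flow and playback arriving 0.5 ms apart.
events = [
    (0, "flow", 1.0),
    (500, "playback", 2.0),
    (10_000, "flow", 3.0),
    (10_500, "playback", 4.0),
]
commands = sample_latest_sum(events, tick_us=5_000, end_us=15_000)
print(commands)
```

With this scheme the interval between commands is fixed by the tick period regardless of how closely the source samples arrive, at the cost of up to one tick of added latency.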

Finally, I think the reason the motor can stop even without the error message is that if we crash the firmware there is no guarantee that the H2 device will send us an error. In some cases it may have time to send the response, and in others it crashes completely before sending the error message out.

RoboDoig commented 4 months ago

Should be fixed by https://github.com/neurogears/vestibular-vr/pull/64, pending testing.

RoboDoig commented 4 months ago

Confirmed fixed with testing.