pez-globo / pufferfish-software

All software for the Pufferfish ventilator.
Apache License 2.0

When frontend refreshes from the quickstart screen, it fails to reestablish its connection #386

Closed ethanjli closed 3 years ago

ethanjli commented 3 years ago

When the system is in ventilating mode and the frontend closes and is reopened on the landing page (which is where systemd should reopen the frontend if the frontend gets killed from freezing or dies from crashing), the frontend will automatically take the user to the ventilating dashboard. This is desirable.

When the frontend closes and is reopened (or is merely refreshed, like what yarn start will do when a file in the frontend is changed) on the quickstart screen, regardless of whether the system is in ventilating mode or standby mode, the frontend will fail to connect to the backend, and the backend does not receive any messages from the frontend (so it wants to kill the frontend). In the chrome devtools console, I see a protobuf error message that the remaining field of AlarmMuteRequest is undefined and thus cannot be decoded, and this unhandled exception seems to break the frontend's saga for talking to the backend. I saw this problem previously in https://github.com/pez-globo/pufferfish-software/pull/370#issuecomment-857274561 and I thought I had fixed it in #377, but apparently that was only a partial fix. When I check the redux console, I see that store.controller.alarmMute.request only has active defined, while remaining is not defined. I'm not sure why this issue only appears when the frontend is opened from the quickstart screen, rather than from the landing page.

This behavior can be observed on commit 2953546, and I don't remember seeing this behavior in the past; however, I have not tried running previous commits to see where the regression came from (though I strongly suspect it's from #370). This issue doesn't break the frontend for deployment, but to me it suggests some deeper problem with the frontend's and/or backend's initialization behavior for AlarmMuteRequest.

ethanjli commented 3 years ago

@rohanpurohit Can you see if you can reproduce what I've observed? If so, we'll need to figure out whether something is missing in the backend for initializing AlarmMuteRequest in the frontend, or whether something is missing in the frontend for initialization of AlarmMuteRequest - e.g. if the frontend is overwriting the store's copy of AlarmMuteRequest with {active: true} or {...store.controller.alarmMute.request, active: true}, which doesn't define remaining unless the store already has store.controller.alarmMuteRequest !== undefined; in which case the solution is just to do {...store.controller.alarmMute.request, active: true, remaining: 120} (though 120 is a magic number and thus should be defined as a file-level constant somewhere).
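A minimal TypeScript sketch of the suspected failure mode (the interface shape, function names, and the `ALARM_MUTE_DURATION_S` constant here are hypothetical illustrations, not the actual frontend code):

```typescript
// Hypothetical shape of the frontend's AlarmMuteRequest state.
interface AlarmMuteRequest {
  active: boolean;
  remaining?: number;
}

// The "120" magic number, hoisted to a named constant as suggested above.
const ALARM_MUTE_DURATION_S = 120;

// Problematic update: if the store's copy is still undefined, the spread
// contributes no keys, so `remaining` stays undefined.
function muteUnsafe(stored: AlarmMuteRequest | undefined): AlarmMuteRequest {
  return { ...stored, active: true };
}

// Safe update: always supply `remaining` explicitly.
function muteSafe(stored: AlarmMuteRequest | undefined): AlarmMuteRequest {
  return { ...stored, active: true, remaining: stored?.remaining ?? ALARM_MUTE_DURATION_S };
}

console.log(muteUnsafe(undefined).remaining); // undefined
console.log(muteSafe(undefined).remaining); // 120
```

The second form also sidesteps the magic-number concern by giving the 120-second default a name.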

rohanpurohit commented 3 years ago

Yes! I have faced something similar where the frontend crashes too soon and cannot connect to the backend. It's mainly while working on the frontend while it's running: I make a change, yarn start refreshes and rebuilds, and if there's no input on the screen the backend just kills the frontend; if we send some input in that small window, the backend doesn't kill it. Also, on rare occasions I have seen this message while running the backend, but it's not consistent. I think I saw it after #377, but it could be because of #370 too; since one came after the other, I'm not sure when I first saw the message.

Note: this is while running the simulator on my local pc

2021-06-10 14:35:32,522 ventserver.integration._trio INFO     Initializing from file: ParametersRequest
2021-06-10 14:35:32,524 ventserver.integration._trio INFO     Initializing from file: AlarmLimitsRequest
2021-06-10 14:35:32,527 ventserver.integration._trio INFO     Initializing from file: AlarmMuteRequest
2021-06-10 14:35:32,530 ventserver.integration._trio INFO     Initializing from file: SystemSettingRequest
2021-06-10 14:35:32,537 ventserver.protocols.devices.file.ReceiveFilter ERROR    MessageSender:
Traceback (most recent call last):
  File "/home/pi/pufferfish-software/backend/ventserver/protocols/transport/messages.py", line 65, in parse
    self.payload = message_object.parse(body_payload)
  File "/home/pi/.pyenv/versions/3.7.7/envs/ventserver/lib/python3.7/site-packages/betterproto/__init__.py", line 760, in parse
    parsed.wire_type, meta, field, parsed.value
  File "/home/pi/.pyenv/versions/3.7.7/envs/ventserver/lib/python3.7/site-packages/betterproto/__init__.py", line 701, in _postprocess_single
    fmt = _pack_fmt(meta.proto_type)
  File "/home/pi/.pyenv/versions/3.7.7/envs/ventserver/lib/python3.7/site-packages/betterproto/__init__.py", line 288, in _pack_fmt
    }[proto_type]
KeyError: 'uint64'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/pi/pufferfish-software/backend/ventserver/protocols/devices/file.py", line 90, in output
    message = self._message_receiver.output()
  File "/home/pi/pufferfish-software/backend/ventserver/protocols/transport/messages.py", line 118, in output
    message.parse(body, self.message_classes)
  File "/home/pi/pufferfish-software/backend/ventserver/protocols/transport/messages.py", line 70, in parse
    ) from exc
ventserver.protocols.exceptions.ProtocolDataError: Unparseable payload: b'\x15\x00\x00\xf0B'
2021-06-10 14:35:32,539 ventserver.protocols.backend.server.ReceiveFilter ERROR    The state type NoneType data in the file does not match the filename AlarmMuteRequest.

Maybe there's some discrepancy in the remaining field after I changed the protobuf message type for remaining from float to uint64; I did do that through the generate_all script.
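If so, that would explain the unparseable payload. As a sketch (assuming remaining is field 2 of AlarmMuteRequest, which the tag byte suggests), decoding b'\x15\x00\x00\xf0B' by hand shows the stale statestore file still holds the old float encoding, while the regenerated schema expects a uint64 varint:

```typescript
// Decode the "Unparseable payload" b'\x15\x00\x00\xf0B' by hand.
const payload = Uint8Array.from([0x15, 0x00, 0x00, 0xf0, 0x42]);

// A protobuf tag byte is (field_number << 3) | wire_type.
const fieldNumber = payload[0] >> 3; // 2
const wireType = payload[0] & 0x07; // 5 = 32-bit fixed (the old float encoding)

// The remaining four bytes are a little-endian float32.
const value = new DataView(payload.buffer, 1).getFloat32(0, true);
console.log(fieldNumber, wireType, value); // 2 5 120
```

A uint64 field would instead be encoded with wire type 0 (varint), which is presumably why betterproto's fixed-width unpacking path raises KeyError: 'uint64' on the old bytes.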

ethanjli commented 3 years ago

Hmm interesting, I haven't seen that error message before in the backend - I think that's a separate issue. Maybe your protobuf file for AlarmMuteRequest is out-of-date compared to the message schema. Can you try deleting the backend/statestore directory to see if you still get that exception in the backend?

ethanjli commented 3 years ago

What I see in EventAlerts.tsx is that muteAlarmState and a useEffect for setting alarmMuteRequest.active to false are both potential places for what I described in https://github.com/pez-globo/pufferfish-software/issues/386#issuecomment-861267358. Since muteAlarmState is just an onClick handler, the useEffect is probably the source of this problem.

rohanpurohit commented 3 years ago

> @rohanpurohit Can you see if you can reproduce what I've observed? If so, we'll need to figure out whether something is missing in the backend for initializing AlarmMuteRequest in the frontend, or whether something is missing in the frontend for initialization of AlarmMuteRequest - e.g. if the frontend is overwriting the store's copy of AlarmMuteRequest with {active: true} or {...store.controller.alarmMute.request, active: true}, which doesn't define remaining unless the store already has store.controller.alarmMuteRequest !== undefined; in which case the solution is just to do {...store.controller.alarmMute.request, active: true, remaining: 120} (though 120 is a magic number and thus should be defined as a file-level constant somewhere).

Yes, you're right: passing in the remaining field to the object seems to solve the issue. Changing this to:

```typescript
dispatch(
  commitRequest<AlarmMuteRequest>(MessageType.AlarmMuteRequest, {
    active: false,
    remaining: 120,
  }),
);
```

I think I'll add this fix in #383 along with the other changes there. Although as soon as I did that, I saw the log I posted in #386 (comment) in the terminal (maybe we can file this as a separate issue and fix it later if we can reproduce it).

> Hmm interesting, I haven't seen that error message before in the backend - I think that's a separate issue. Maybe your protobuf file for AlarmMuteRequest is out-of-date compared to the message schema. Can you try deleting the backend/statestore directory to see if you still get that exception in the backend?

So I did this and started the backend again, and everything was fine. Then I made the above change and caught the error just as the frontend refreshed (yarn start was running), but I haven't been able to reproduce it again (one more time, maybe?). Probably something minor, then.

rohanpurohit commented 3 years ago

> (though 120 is a magic number and thus should be defined as a file-level constant somewhere).

Also, just a quick note: do we really care about magic numbers on the frontend? I can see that we are using values up to 200 without adding a constant somewhere.

ethanjli commented 3 years ago

> (maybe we can file this as a separate issue and fix that later if we can reproduce it)

That would be very helpful, thanks - I think we need to dig deeper into this, once we can figure out how to reproduce it. When this exception occurred, what behavior did you observe in the backend and the frontend? e.g. did it prevent the application from running? That would be ideal, because it means that the bug will be easier for us to catch if it's still there.

> (though 120 is a magic number and thus should be defined as a file-level constant somewhere).

> Also, just a quick note: do we really care about magic numbers on the frontend? I can see that we are using values up to 200 without adding a constant somewhere.

I think in general we should define constants for magic numbers, we haven't been looking at that because there are so many other things to refactor in the frontend - but for new code let's try to be more rigorous about that.