Open lawik opened 1 week ago
@jjcarstens thoughts / feedback?
Personally, I don't think it's needed. If we're marking a successful update, we should have the metadata and uuid. If we don't, we should blow up cause that's totally against expectations. It seems more detrimental to have these events with a nil firmware uuid and the reason it is missing is totally hidden. I think this was just the result of the host test setup
I had this code path randomly blowing up entirely without doing any updates. Just my "host" device joining and not having fully populated nerves runtime kv. I fixed it by setting the UUID. But the telemetry event crashing the DeviceChannel connection seemed bad. I couldn't keep the device on so I wouldn't have been able to ship a fix I think.
@jjcarstens you are entirely right that this was weird for a firmware update success.
It was also weird that a nil firmware would trigger this behavior. That's a thing we will see with onboarding via shared secrets.
Turns out the current code just interpreted a non-matching firmware metadata structure as the firmware having changed. This should be a better fix.
That doesn't sound quite right. Firmware tracking is integral to the firmware update server. Even if it is shared secret connection, the firmware metadata should still be reported with the connection. This still seems like we're making an affordance for a situational that should be exceptional
Okay. I see your point. I'm not sure I agree that starting from unregistered firmware is exceptional. It can happen whenever someone connects a device running firmware that wasn't also uploaded.
So the initial fix removed the uuid requirement for approving a firmware update. But that I recognize was the wrong place.
This changes so that it doesn't break a device's ability to join if the firmware it reports is missing in the database.
I can see why it would be desirable to be strict on the production side but I think it is a problem if that's a requirement when someone is getting started. This means that we can't set people up with generic firmware as a starting point for then getting their device onto a NervesHub instance. And I've already had a user of NervesHub say that they wanted to do essentially that. Ship a starting firmware that can later be changed.
The current behavior is that the device gets registered but doesn't stay online because the join crashes. So the device enters the system with firmware_metadata: nil. If this situation comes up it is not possible to send firmware to the device because it can't maintain a connection.
Happy to find another angle to ensure we don't lose strictness for production use but I think it will be quite limiting for onboarding if firmware needs to be known up front.
Josh and I did just discuss having some kind of flag that distinguishes between dev/lab Products and serious production products. This might be one of the behaviors that would depend on a setting like that.
I will work through setting up the required bits, might tell me something about how I could make it convenient without trading off any rigeur.
I think I see the misunderstanding here. There is no coupling for the firmware UUID needing to exist in the database. A device can report any firmware uuid in the metadata.
The initial situation of the PR was that no metadata was reported because it was a mocked situation running on host. Missing the required metadata entirely is what I consider the exceptional case and don't think we should make any affordances for it. Maybe we can adjust the error handling of it to be easier to deduce, but I think we need to prevent the connection and not silently allow it through with missing crucial details
Ah yes. I will try to verify with a smaller case that I'm running into a problem where firmware_metadata missing from the db is preventing my join.
This might not be worth doing. I was doing a bad job mocking firmware information on MIX_TARGET=host.