sparky8512 / starlink-grpc-tools

Random scripts and other bits for interacting with the SpaceX Starlink user terminal hardware
The Unlicense
482 stars 64 forks

Make robust against field removal in grpc protocol #66

Closed: sparky8512 closed this issue 1 year ago

sparky8512 commented 2 years ago

(These are mostly notes to self, for anyone wondering why I'm being so long-winded about this...)

Several times in the past, SpaceX has removed fields from the dish's grpc protocol that the scripts in this project were using, resulting in a script crash on attempt to read any data from the same category (status, history stats, bulk history, or location). While loss of that data is unavoidable unless it can be reconstructed from some other field, it would be much better if it didn't cause a crash. I mentioned in issue #65 that I would put some thought into how to accomplish that. After all, one of the intentions of the core module (starlink_grpc.py) is to insulate the calling scripts from the details of the grpc protocol, even though it's mostly just passing the data along as-is in dictionary form instead of protobuf structure.

The problem here is a result of using grpc reflection to pull the protobuf message structure definitions instead of pre-compiling them using protoc and delivering them along with this project. It's pretty clear from the protobuf documentation that the intended usage is to pre-compile the protocol definitions, but there's a specific reason why I don't do it that way: SpaceX has not published those protocol definitions other than by leaving the reflection service enabled, and thus has not published them under a license that would allow for redistribution without violating copyright. Whether or not they care is a different story, but I'd prefer not to get their legal team thinking about why reflection is even enabled.

Before I built use of reflection into the scripts (by way of yagrc), I avoided the copyright question by making users generate the protocol modules themselves. This complicated installation, caused some other problems, and ultimately didn't actually fix this problem; it just moved it earlier, since the protocol definitions were still being pulled via reflection.

So, other than use of pre-compiled protocol definitions, I can think of 2 main options:

1) Wrap all (or at least most) access to message structure attributes inside a try clause that catches AttributeError or otherwise access them in a way that won't break if the attribute doesn't exist. This could get messy fast, since I don't want a single missing attribute to cause total failure, but would probably be manageable. It also really fights against the design intent of how rigidly these structures are defined by the protoc compiler, but I find myself not really caring about that, as it's exactly this rigidity that is causing the problem here.

2) Stop using the protocol message structures entirely and switch to pulling fields by protobuf ID instead. This would make the guts of the code less readable, but would insulate this project against the possibility that SpaceX ever decides to disable the reflection service. I have no reason to believe they would do so, but I also have little reason to believe they wouldn't. I'm not sure how difficult this would be, though, as the low-level protobuf Python module is not meant to be used in this way, as far as I can tell. Also, this would still cause problems if SpaceX were to reuse field IDs from obsoleted fields that have been removed, but they haven't been doing that (and it's bad practice in general, as it can break their own client software), as far as I can tell. If I were writing this from scratch, knowing what I do now, I would probably try to do it this way.
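As a rough sketch of what option 1 could look like (this is not the project's actual code; `FakeStatus` and its field names are hypothetical stand-ins for a reflected protobuf message such as the dish status response):

```python
# Sketch of option 1: tolerate missing message attributes instead of crashing.
# FakeStatus and its fields are made-up stand-ins for a reflected protobuf
# message; the real code would be reading the dish grpc responses.

def get_field(message, name, default=None):
    """Return the named field, or default if it was removed from the protocol."""
    try:
        return getattr(message, name)
    except AttributeError:
        return default

class FakeStatus:
    uptime_s = 12345  # a field that still exists in the protocol

status = FakeStatus()
print(get_field(status, "uptime_s"))  # 12345
print(get_field(status, "snr"))       # None: field removed, but no crash
```

The point of the sentinel default is that a single removed field only blanks that one item, rather than killing data collection for the whole category.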

However it's done, I should note that this will make obsoleted fields less obvious. The removed data will just stop being reported by the tools. I have been using these failures as an opportunity to update the documentation on which items are obsolete, but I suspect this will still get noticed, and I'd rather have the documentation lag behind a bit than have the scripts break.

sparky8512 commented 1 year ago

Before I forget again:

I did look into option 2 noted above, and just as the last several times I looked into it, I concluded that it'd be technically possible, but not worth it at this point. As far as I can tell, there is code that could be called for serializing and deserializing raw protobuf messages, but it is in an internal package, so directly using it would be subject to breakage as that package gets updated. On top of that, that code is in the pure Python implementation, so using it would mean losing the performance benefit of the native code versions. As an alternative, it should be possible to build descriptors for the messages dynamically, and then just run those through the same process as yagrc uses to create the message classes. There's another problem here, though: there are a couple cases where the strings used for some fields are taken from the reflected protocol data, because SpaceX periodically adds things, and I really don't want to go back to having to update the starlink_grpc module to support every new alert type or outage reason.
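For reference, a minimal sketch of what pulling fields by ID from the raw protobuf wire format involves (this is illustrative only, not code from this project or from the protobuf library; a real decoder would also need to handle the remaining wire types, repeated fields, and type-aware value decoding):

```python
# Sketch of decoding raw protobuf wire data into a {field ID: value} mapping.
# Each field on the wire is a varint tag (field_number << 3 | wire_type),
# followed by a value whose layout depends on the wire type.

def read_varint(data, pos):
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7

def decode_fields(data):
    """Map field ID -> raw value; handles only varint and length-delimited."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_id, wire_type = tag >> 3, tag & 7
        if wire_type == 0:    # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited (string/bytes/submessage)
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
        fields[field_id] = value
    return fields

# 0x08 0x96 0x01 encodes field 1 (varint) = 150, the protobuf docs example
print(decode_fields(bytes([0x08, 0x96, 0x01])))  # {1: 150}
```

A removed field simply never shows up in the resulting dictionary, so lookups by ID can fall back to a default instead of crashing.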

Meanwhile, I believe option 1 will not be as messy as I was thinking, so I guess I'll just stop whining about it and go with that option.

sparky8512 commented 1 year ago

Those changes should be sufficient for anything that uses the core module (starlink_grpc).

The only significant script that does not use that module is dish_control.py. I may switch it over to use the core module now that the module actually adds some (minimal) value for those grpc requests, but if the protocol changes for those requests, the related functionality is just going to stop working anyway.