teslamate-org / teslamate

A self-hosted data logger for your Tesla 🚘
https://docs.teslamate.org
MIT License
5.77k stars 720 forks source link

teslamate deletes mqtt retained messages if empty #1336

Closed brianmay closed 2 years ago

brianmay commented 3 years ago

Describe the bug

When teslamate publishes a geofence when the car is not in a geofence, it publishes an empty string "" to mqtt.

But on my mqtt server (mosquitto), publishing an empty string results in the retained object being deleted (arrgh!). I believe this is part of the mqtt spec. Google search seems to confirm.

If a client connects, it will not get the message because it is no longer retained, and cannot determine if the car is outside the geofence or just hasn't received the message yet.

If this is not clear enough, I can provide more references, but getting a bit distracted by external events right now :-)

Expected behavior

The geofence position message should always be retained by the mqtt server.

How to reproduce it (as minimally and precisely as possible):

Move car from inside geofence to outside geofence. Note that message gets sent with "geofence" = "". Disconnect client and reconnect. Notice that client doesn't receive the message, because it is no longer retained.

Notes

I finally correctly diagnosed this issue because I had the exact same problem with another unrelated application I am responsible. Not sure what the best solution is though. I think we can't publish an empty string and expect it to be retained.

In my case I hard coded a magic value to indicate that there was a special condition. Although this solution makes me feel a bit dirty.

Another possibility is to send a json object for geofence. Not sure if I like this or not. It would be sufficient though.

Environment

adriankumpf commented 3 years ago

Good catch. I guess the same problem exists with some other topics, namely those for which values are always published, even if they are nil.

https://github.com/adriankumpf/teslamate/blob/188eb036725708b209940162ee02c697f0f09ac8/lib/teslamate/mqtt/pubsub/vehicle_subscriber.ex#L47-L49

I also can't think of any workaround other than publishing a magic value. Perhaps a space instead of an empty string?

brianmay commented 3 years ago

I have been thinking about this some more.

The main problem I have with the magic value, is that it isn't possible to distinguish the magic string and, say if somebody really does have a location called NONE for example.

This isn't going to a problem if the magic string is an invalid value. For example, if the published data is normally an integer, and we publish NONE it is pretty obvious that there will be no conflict. Or if we add validation rules to to say that any string starting with a : character is now considered an invalid geolocation name.

Some options I have been playing around with:

  1. Add quotes before/after location. e.g. "Home" or NONE. This should work even if user entered quotes for value (why?), e.g. ""Home"".

  2. Add validation rules (if not already present) to make certain geofence names invalid. e.g. disallow entering space for a geofence. Or maybe disallow any geofence that starts with a space. Or maybe pick another special character, e.g. :. That way you could publish Home, :NONE, or even maybe :ERROR_42 (if you required that).

The advantage of 1 is that you don't have to add extra validation rules. But might make it harder to parse. Especially say from node-red, without writing JavaScript code.

The advantage of 2 is that the API remains mostly unchanged. Client code needs to be careful not to get confused/crash with the new magic values (e.g. if it blindly tries to interpret as an integer), but I think no way around this. Pick any character.. %'might also be a good choice. I chose : because they look like atoms is the elixir world (but of course normally these are lowercase).

I tend to favor option 2 at present. You might have problems doing this if the data really is free form text that comes from Tesla, that could start with ':' but looking at the list you pointed to above, most of these values already are restricted in scope (at quick glance many look like numbers only), or (for the case of geofence) can be restricted.

LelandSindt commented 3 years ago

@brianmay why is it important that you receive a retained message for geoFence why not just default to empty and let a received message overwrite?

For example... https://github.com/gaussmeter/retractor/blob/ee0c31afc4ca4b25a50023cc2ca0e901bc36e7d6/main.go#L25

https://github.com/gaussmeter/retractor/blob/ee0c31afc4ca4b25a50023cc2ca0e901bc36e7d6/main.go#L59

brianmay commented 3 years ago

@LelandSindt Like I said in the bug report, sending an empty string doesn't just mean "send this message". It tells the mqtt server "send this message and delete the retained message".

I believe this is the standard way for deleting retained messages on mqtt.

If a new client subscribes to that topic, it will not receive any messages, because the retained message was deleted, and can't tell what the state is. Until a new message is sent. Then when it does receive a message. My experience is that teslamate will not send any more messages after geofence: "" until the car enters a geofence again.

Make sense now?

If you want retained messages to work properly, then all state messages need to be retained, even if the state is "this variable doesn't have a valid value right now".

LelandSindt commented 3 years ago

@brianmay I understand the behaviours. I could have been more clear with my question...

To me it seems reasonable to send a blank and remove the retained value for geoFence when the vehicle leaves a geoFence.

If you have an active subscription to geoFence when the car leaves a geoFence you will receive a blank message.

If you create a new subscription to geoFence when the car is outside a geoFence you will receive no message until the car enters a geoFence. It is your responsibility to initialize local variables to blank.

... or... is that your point? In the latter case you have to assume geoFence is blank/unset rather receive a message for geoFence that confirms the car is outside a geoFence.

brianmay commented 3 years ago

OK, so lets say you initialize the values to blank when your application starts. The you start the application. But you don't get any messages. Is this because the mqtt server was restarted and there is no retained messages? Or is it because the car was not in a geofence? It is not possible to tell. In this case when you initialize the value to blank you are not saying "not in any geofence", you are saying "unknown", and we can't determine a known state until we receive a message. Which could be some time. If the "not in any geolocation" message is retained then we can prevent this confusion from occurring for the most likely case of the car not being in any geofence.

Then when you receive another message, you can be sure that that car just entered the geofence. For example.

LelandSindt commented 3 years ago

For what its worth, I am not arguing against an empty, null or none retained value...

However, I think that the difference between unknown and assumed blank is semantics in this case...

brianmay commented 3 years ago

I see what you are saying, and I agree. I think my argument may have been getting too complicated. I realize we are all in agreement that this needs to be fixed, but regardless, will simplify my argument, so I can try to better make sense of it :-) :

At the moment when I start my application, it gets no geofence. So this means that the state is unknown. If the application later gets a geofence message, there is no way of telling if it only just arrived in this geofence, or it if always was in that geofence.

If we changed it though so that it send a retained empty value, and assuming that the mqtt server has correctly retained this empty value, when I start my application it would immediately receive this retained empty value. Then if the application later receives a geofence message, we can be reasonably confident that the car only just entered the geofence.

I ended up rewriting my code multiple times to try to work around this and getting my logic confused every-time. So hopefully I have it right now :-)

At the moment, I am more interested in solving #1221.

rbswift commented 3 years ago

At the moment when I start my application, it gets no geofence. So this means that the state is unknown. If the application later gets a geofence message, there is no way of telling if it only just arrived in this geofence, or it if always was in that geofence.

I'm not sure this is quite correct. If you start your application (for example mosquitto_sub -t 'teslamate/cars/1/geofence' -v) and get no geofence then that means your car is not inside a geofence. If it was inside a geofence you should get a retained message naming that geofence. If you are concerned that Teslamate may not be active then that is a somewhat seperate concern and can be managed using a LWT topic. Retained messages are retained by mosquitto through server restarts

brianmay commented 3 years ago

Thinking about some more, I think only geofence has the problem. For the other values "" means that the value is unknown, and it makes probably makes good sense that the retained value gets deleted if the value is unknown. For geofence, "" doesn't always mean unknown. It could mean no value. (see #1885 also).

Still not entirely sure I communicated the problem clearly.

Lets say you have an app. You start app. It receives a geofence value immediately. Now the app knows what the geofence is.

But lets say you start the app, and it doesn't receive a geofence. Is this because it hasn't received the geofence message yet? Should it wait longer? Or maybe the geofence retained value was deleted because the car isn't in a geofence? Or maybe it was never set in the first place, i.e. it really it unknown? We only get the benefit of using retained mqtt messages if the car is in a geofence, and we lose any benefit as soon as the car leaves the geofence.

Perhaps a more compelling argument, what happens if there are multiple overlapping geofence? mqtt will only publish one geofence. Perhaps we should keep geofence as is, and add another value - maybe geofences - that outputs in json format a list of all geofences (or [] for no geofences or "" for unknown geofences). This would solve both problems.