olliw42 / mLRS

2.4 GHz & 915/868 MHz & 433 MHz/70 cm LoRa based telemetry and radio link for remote controlled vehicles
GNU General Public License v3.0
295 stars 67 forks source link

rx,tx, mavlinkx #93

Closed olliw42 closed 1 year ago

olliw42 commented 1 year ago

this adds mavlinkx hardens the parser cababilities, ensures no valid mavlink message is missed no compression yet however

@jlpoltrack @brad112358 @4O6FPV (@4O6FPV: added you since you indicated somewhere you woukld want test the mavlinkx branch) I consider adding this to main, any concerns or cons or counter arguments?

4O6FPV commented 1 year ago

Sorry I am not the most savvy person when it comes to code and stuff like this. Can someone explain in common terms the difference between this branch and the main master. Does this not include any of the RC stuff and is simply a mavlink radio? Again I apologize for the lack of knowledge!!!

olliw42 commented 1 year ago

this is a very valid question and there is nothing to appologize for

this code works EXACLY like that in main, except that it offers a new option "mavlinkx" for the RxSerLinkMode option. When this option is selected, the mavlink communication over the air is modified, in that a different format is used which allows a more robust parsing. The original MAVLink protocol is good at rejecting invalid frames (thx to a crc) but is terrible at finding all valid packets in a data stream. That is, in the presence of transmission interference, it happens that received valid packets are ignored even though they are valid! And this may be not just few packets but more than you think.That's clearly not what one wants, and the mavlinkx protocol solves that issue. All this affects only the transmission over the air, the mavlink data on the serial ports is still normal mavlink. So, to a user all should be exactly as before, except that the link is more reliable in the sense that you get more of the packets. Since this mechanism however is somewhat involved, the code needs careful testing. I think I tested it for quite a number of cases, but I may have not caught all cases. So, in short, with mavlinkx enabled,, if everythings works well you simply should not note anything.

jlpoltrack commented 1 year ago

I've flow a bunch on this branch previously and had no issues related to MAVLink so I think okay to merge.

For my own curiosity, are there any statistics around how much the extra MAVLinkX header decreases the link capacity?

olliw42 commented 1 year ago

I've flow a bunch on this branch previously and had no issues related to MAVLink so I think okay to merge.

ah, I failed to be clear on this. I have quite changed the format and also made changes to prepare for future additions (like compression). So, sadly, it kind of needs to be retested. Sorry for not having been clear on this.

The link capacity is not affected for the majority of messages. I don't have solid numbers, but the effect is on the order of few percent. If you measure in terms of valid packets and not transmitted bytes it actuall will effectively increase the capacity, since you get more good packets from the "same" data stream, i.e. same number of lost lora frames.

It is also ready now for adding compression, and can give further benefits (e.g. allow routers to route with much less latency). So, at the end it should give only advantges on all accounts.

EDIT: I could derive such statistics from the two logs you once provided me ... maybe I will do that :)

jlpoltrack commented 1 year ago

@olliw42 Just had a 40 minute fly on this branch (without ground station) and didn't notice any bad behavior. Could provide the FC or EdgeTx logs if you're interested. 2.4 GHz | 50 Hz | 10 mW

olliw42 commented 1 year ago

@jlpoltrack MANY THX!

I must admit I would wish that also Brad would have a chance to test this, to have a second independent pair of eyes. He was concerned with such matters before and probably could provide a particular critcal view on various aspects.

Maybe I let it sit for another while

brad112358 commented 1 year ago

@olliw42

I'm sorry I have not found time to test this yet, but I did start looking at the code today. But I don't yet even understand the over-air format. I see the table in comments in fmav_mavlinkx.h, but I'm not sure I understand the purpose of all the fields. There also seems to be more additional bytes in the header than I expected. Is this exactly the same as you proposed to the MAVLink folks? If so, perhaps include the link for reference. If not, more comments on theory of operation and the purpose of would be helpful in understanding the code.

Thanks!

olliw42 commented 1 year ago

hey Brad nice to see you and very nice you are looking at it.

no, the header used here is not the same as what I proposed to the mavlink folks, a bit to the contrary. The header proposed to them was intended to achieve a number of goals in a most compatible way. This here however is designed to achieve the very same goals plus some additions which help us; the main difference is that I - since my proposal was downed somewhat rudely anyway - took the liberty to not trying to stay most compatible with the v2 header anymore but to do it as one may do it without that constraint. Note, since this is all inside mLRS (and at most also inside MAVLink4OpenTx) we can use whatever protocols and formats we want. The idea of using a mavlink-ish protocol instead of a "whatever protocol" is that conversion to and from mavlink is simple, since only the header is affected.

I may have missed one or two points, but I hope this let's you get the idea of the header structure, namely to achieve the needs for robust parsing, especially reliable packet detection (at which the original mavlink is so bad at) as well as to be compact when possible.

That's the header. The parser implementation is a bit more complex than one finds in usual codes, since it does backtracking for the header. That is, if it finds a start marker, it reads the header and checks the header by its crc8, and if this fails the header is scanned for if it contains another start marker, and the process restarts.

There is no backtracking of also the payload, as this would be in fact a somewhat nasty and resource hungry code to write (backtracking of the header is much easier since there can't be multiple messages in these few bytes, but a data junk of up to 255 bytes can contain many messages). This piece however is not really needed as it can be ackomplished by a parser reset upon lost lora frames (which was an original suggestion of you).

olliw42 commented 1 year ago

concerning payload compression, as you once asked if I would still consider it. At least with my limited naive knowledge, I found it very difficult to achieve reasonable compression rates. Without knowing nothing I was hoping for 20%, 25%, 30%, which I would have found quite worthwhile. Unfortunately, the distribution of bytes in the payload doesn't really allow that, and for a long while I struggled to find a scheme which would be better than even only 10%. I recently however found a scheme which does better than this 10% (for the two logs mentioned before, which are my main data source, it gives ca 13%), su that's what I'm now trying to implement in the code. Better than 10% is not very compelling, but, well, that's what the data allows.

brad112358 commented 1 year ago

OK. I think I understand the header and your strategy now. I still haven't examined the code closely, but I have some suggestions and comments already:

I suggest adding more comments near the beginning of fmav_mavlinkx.h to explain what's going on for future contributors.

Your approach seems reasonable except it results in a lot of code and complexity with not much benefit in it's current form. Do you anticipate actually using the currently unused fields to improve routing in the mLRS product? I have to admit I'm not entirely clear on the use case for routing in the radio or how this helps. Can you elaborate?

Does this currently provide any advantages besides near perfect identification of message start in the face of radio packet loss? I believe the strategy we discussed earlier provides the same reliability. To remind, I had proposed 1) Discard MAVLink parser state when a radio packet is lost (which I think you are doing now) and, 2) Marking the start of the first MAVLink header in every radio message. Since we can assume the serial links on either end of the radio are reliable, this provides the same reliable location of every MAVLink message which actually makes it across the radio link as this code does. There would be 1 unused bit in the start marker byte which could be used to indicate compression is used, so compression could also be implemented on top of this basic strategy.

Since the crcextra is apparently currently always used, isn't the minimum header size the same as MAVLink V2?

Just some preliminary thoughts...

Thanks!

olliw42 commented 1 year ago

many thx for your comments and thoughts.

There is a missunderstanding however. The topic of interest here is not to repeat and restart an intense discussion of the past. Sure, there are plenty of ways to achieve a thing. And sure, if a list of goals is cut down to only one then a different way may look more appropriate. But that's not the topic.

The point of interest (to me) is whether the code does do the job of not losing any valid packets in the presence of frame lost, in terms of the implementation being correct as well as in the actual practical real world, i.e., if it just works as intended.

There would be 1 unused bit in the start marker byte which could be used to indicate compression is used, so compression could also be implemented on top of this basic strategy.

nope. Compression is per message.

Since the crcextra is apparently currently always used, isn't the minimum header size the same as MAVLink V2?

correct. Note though that the a priori goal is not to have a shorter header than v2.

Also, not supporting unknown messages seems like a problem which needs to be fixed.

this is a mavlink fact and reality, and not anything for us to fix. To the contrary, it's a bad deficiency of mavlink that one needs to know each message for properly handling it.

brad112358 commented 1 year ago

hardens the parser cababilities, ensures no valid mavlink message is missed no compression yet however

Sure, there are plenty of ways to achieve a thing. And sure, if a list of goals is cut down to only one then a different way may look more appropriate. But that's not the topic.

But you indicated only one goal for this PR and mentioned only one additional possibility. What are the other future goals/problems this approach helps solve, if any?

The point of interest (to me) is whether the code does do the job of not losing any valid packets in the presence of frame lost, in terms of the implementation being correct as well as in the actual practical real world, i.e., if it just works as intended.

I hope and believe that design approach is always in scope when discussing a proposed solution to a problem and not just correctness of the code.

I apologize for rehashing our debate. I also prefer not to repeat a (rather difficult) discussion from the past. But, back then, you mentioned that MAVLinkX would solve other problems too. I thought that when I saw specific details of your approach, I would better understand your goals and thinking. But, I'm sorry to say, I don't yet see this contributing to solving any other problem in mLRS. Just like your previous proposal to the MAVLink community, I can see how MAVLinkX has advantages over the current MAVLink protocol if everyone was using it and I'm sorry that proposal didn't get traction. But I don't yet see any specific advantage for over-the-air use in mLRS which couldn't be achieved by the much simpler approach I proposed in our previous discussion.

I prefer to understand where this is all heading before I spend much time testing this or reviewing the code.

I asked some specific questions above which I thought might help me understand, but you didn't answer them.

Can you elaborate on the use case for routing in mLRS and how this helps?

Also, what other problems might this approach help solve besides framing recovery in the face of radio packet loss?

How will the unused fields you have defined eventually be used in mLRS?

nope. Compression is per message.

I think you missed my point or, perhaps, I'm not understanding your point. You mentioned compression earlier, but didn't specify if/how the new over-the-air header might help. Does the new header help with compression? If so, how?

Also, not supporting unknown messages seems like a problem which needs to be fixed.

this is a mavlink fact and reality, and not anything for us to fix. To the contrary, it's a bad deficiency of mavlink that one needs to know each message for properly handling it.

I understand that the endpoints need to understand messages, but we are not an end point. In what way are unknown (new) messages not supported by mLRS when MAVLinkX is enabled? What happens when an unknown message is received by mLRS via the serial link on either end?

The strategy I proposed to make framing recovery reliable in the face of lost packets (indicating out-of-band the start of the first MAVLink frame in each radio packet) handles unknown MAVLink messages just fine. This is because the serial links are reliable and, once initially synchronized, we can reliably identify the start of every MAVLink frame received on the serial links.

I'm trying to understand what happens in the MAVLinkX case. Are unknown message types dropped? Or, do they reduce the reliability of frame recovery?

Thanks in advance for your patience in helping me better understand the intent of this code.

olliw42 commented 1 year ago

many thx for the response, and taking your precious time.

While it is reasonable, the problem with the approach you are taking is that it substantially disgresses from the main question, producing side discussion (and text) not relevant to the main question. In at least 3 points: alternative schemes, further goals, general issues with mavlink.

You raised once the very relevant point that upon lora frame losses lots of valid mavlink messages in the received frames are potentially and unnessecarily lost. You proposed a possible solution. I here propose another solution. You may not like it, I get this, but yet it is IMHO a valid question to ask if the solution proposed here does do this. I would like to focus on this aspect of the story.

I think the alternative schemes has beend discussed in extend, and the further goals were also mentioned way back a couple of times. I do get that you are not convinced. Accepted. Will not change my course though. The general issues with mavlink could be entitled "extra crc issue" and "target issue", as exposed in some of my topics in the mavlink repo. For further discussion on any of these topics, I would kindly ask to open a respective discussion and to not do it here.

the much simpler approach I proposed

your proposed approach is only much simpler in terms of overall code complexity, and maybe concept, but it is in fact significantly more complex concerning the main code (ie exclusive fmav lib), and also scales poorly. This was a main concern already then. Truth is also, there is no code demonstration which would have worked out the edge cases. The available incomplete code hardly can proof the claim. So it's rather an apple to orange comparison: To the left we have a claim and a not fully worked out code and to the right there is a fully worked out code.

nope. Compression is per message.

I think you missed my point or, perhaps, I'm not understanding your point.

Compression would be per mavlink message, hence the info needs to be stored for each mavlink message and hence can't be stored in the lora frame header.

olliw42 commented 1 year ago

serial links are reliable

btw, just for the record: We should not rely too much on assuming that the serial links are perfect. Hardware-wise they may be but this does not imply that the data streams transported on them are necessarily totally flawless mavlink streams. On the receiver side I would agree this assumption may be ok, besides that e.g. some routers appear to produce dangling zeros, some systems can have DMA contention issues, etc., spoiling the data stream. However, on the transmitter side we should not rely too much on it. The communication may not be perfect, like we had between tx module and radio, and this can be hardware dependent. Also, the data may go via yet another wireless link, like an ESP. And so on. Not saying this is the dominat error mode, but we don't want the system to fall appart just because our code doesn't allow for any hickup. Just wanted to have mentioned it.

brad112358 commented 1 year ago

Olli, as the owner of your project, you have the right to solve a problem as you wish and you have made it quite clear that you don't want my input in this case. I will have to accept that.

But, in the process, you made a couple of statements to which I feel obliged to respond.

and also scales poorly

I` don't see any dimension in which my approach does not scale as well as yours. If you disagree, I'd appreciate clarification as I may have missed something.

Truth is also, there is no code demonstration which would have worked out the edge cases. The available incomplete code hardly can proof the claim. So it's rather an apple to orange comparison: To the left we have a claim and a not fully worked out code and to the right there is a fully worked out code.

This hardly seems kind or fair, since the reason I didn't fully implement my strategy is because you made it clear you would not accept it.

btw, just for the record: We should not rely too much on assuming that the serial links are perfect. Hardware-wise they may be but this does not imply that the data streams transported on them are necessarily totally flawless mavlink streams. On the receiver side I would agree this assumption may be ok, besides that e.g. some routers appear to produce dangling zeros, some systems can have DMA contention issues, etc., spoiling the data stream. However, on the transmitter side we should not rely too much on it. The communication may not be perfect, like we had between tx module and radio, and this can be hardware dependent. Also, the data may go via yet another wireless link, like an ESP. And so on. Not saying this is the dominat error mode, but we don't want the system to fall appart just because our code doesn't allow for any hickup. Just wanted to have mentioned it.

This is generally true, but... My approach doesn't really fall apart if there are errors on the serial link. Actually, I believe your strategy suffers exactly the same loss due to any errors in the serial or WiFi or Bluetooth links at the two ends of the mLRS link. If these links are not reliable, messages will be lost, possibly including one or more following messages which actually arrived intact. This is equally true for your approach and for mine. As always, please correct me if I have missed something.

The only way to avoid this extra loss is to implement full backtracking in every MAVLink decode. Or, of course, to convince the world to use MAVLinkX everywhere (which I would support).

Compression would be per mavlink message, hence the info needs to be stored for each mavlink message and hence can't be stored in the lora frame header.

OK. It was me who missed the point. Sorry.

I hope you don't consider it out of scope to answer this specific question which I've twice asked about a comment you have in the code which sounds like a feature regression:

In what way are unknown (new) messages not supported by mLRS when MAVLinkX is enabled? What happens when an unknown message is received by mLRS via the serial link on either end?

In other words, does enabling this implementation of MAVLinkX interfere with the ability of mLRS to transmit MAVLink messages which are unknown to it?

Thanks!

olliw42 commented 1 year ago

you have made it quite clear that you don't want my input in this case.

this doesn't reflect how I see it. I'm eager to know if you find it to work or if you find flaws. The one point which had been settled though, which is true, is the path.

This hardly seems kind or fair, since the reason I didn't fully implement my strategy is because you made it clear you would not accept it.

fair or not, it's was a statement of a fact. Facts are not always fair. It's also fact that the example code does not have the mavlink message marker idea implemented, I can't imagine how this would make it cleaner. Also, this obviously wasn't the argument I wanted to make; my argument was that there is no solid support for "much simpler". In fact, architecture-wise the mavlinkX approach is much simpler. A good measure for this type of complexity is how many libs&functions&strcuts need to be affected and modified.

I don't see any dimension in which my approach does not scale as well as yours. If you disagree, I'd appreciate clarification as I may have missed something.

since it isn't as separated out or clean in the said sense it is way more vulnerable to any future changes in the over-the-air handling. I shared this argument btw before.

This is generally true, but...

I obviously have not meant it in any other way than general. No explicit nor implied claim that one or the other method would be better or worse, especially not that specifically your method would fall appart. Was just saying that - since this argument was/is made repeatedly (yes, by me too) - it has its loopholes, especially on the Tx side.

I hope you don't consider it out of scope to answer this specific question which I've twice asked about a comment you have in the code which sounds like a feature regression: In what way are unknown (new) messages not supported by mLRS when MAVLinkX is enabled? ... In other words, does enabling this implementation of MAVLinkX interfere with the ability of mLRS to transmit MAVLink messages which are unknown to it?

I had given an answer to this in the above. May not have transmitted. It has nothing to do with mavlinkX, totally independent on it, not a regression brought in by mavlinkX or this specific implementation at all. It was there before, and will be there after.

:)

olliw42 commented 1 year ago

addeds as MLRS_FEATURE in v0.3.36. hence closing.