Refactoring: Use char arrays instead of byte strings.

jfjlaros commented 3 years ago

In 803e3245851446ee74e7aca32a1bdfe5df77c70d we added support for arbitrary C arrays. These arrays can be also used to communicate strings, where now zero terminated byte strings are used. Dropping the old implementation will however require an update of the protocol.

@chrisflesher do you have any opinions on this?

chrisflesher commented 3 years ago

No, not really. I don't have a super-great grasp of the protocol yet.

jfjlaros commented 3 years ago

On the other hand, this is the only C array that can be returned because we can determine its size. So it will remain an exception at least in the write.tcc module. Perhaps it is not worth the effort.

I will give it some more thought.

chrisflesher commented 3 years ago

Will try to look at the protocol more later tonight.

jfjlaros commented 3 years ago

It is not very important, but if you need any pointers, please let me know.

chrisflesher commented 3 years ago

Can you provide an example of what stopped working after the recent changes? Maybe that would make it easier to understand.

I tried changing the cVector function to this and found the returned array has length zero, is this the problem?

int* cVector(int* v) {
  int r[4];

  for (size_t i = 0; i < 4; i++) {
    r[i] = float(v[i]) + 0.4;
  }

  return r;
}

Is the problem because of rpcRead the line io.readBytes((char*)data, sizeof(T)); will now not read the last \0? I didn't think the rpcWrite function changed after the last PR?

chrisflesher commented 3 years ago

I see that returning char* works but byte* or int* doesn't in demo.ino.

jfjlaros commented 3 years ago

Nothing stopped working, I was just wondering whether you had an opinion on changing the protocol to simplify the Arduino code a bit.

Right now, we support reading all kinds of C arrays. The client (PC) first sends the size of the array and then the content. There is no support for writing however, because we can not see the difference between a pointer and a C array. The only exception is a C string because this is a zero terminated char*.

Sorry for the confusion.

chrisflesher commented 3 years ago

Let me make sure I understand the question correctly first. You are wondering whether or not to drop support for writing C arrays because it is not possible to determine size? If so, how would dropping this feature simplify the protocol? Because the signature for vectors and c-style vectors is the same (using brackets)?

The char* return seems useful for returning strings. I wonder if you could just make char* a special case, instead of a template for returning every possible type of c-style array just only allow a char* return type as the only choice with signature s to denote a string for python unpacking?

jfjlaros commented 3 years ago

No, I am not proposing to drop any functionality.

Perhaps a little backstory is in order here. In the first version there was only support for basic C types and C strings. C strings were encoded as zero terminated byte strings. To support more complex structures, Vectors, Tuples and Objects were introduced and the latest addition is support for C arrays (e.g., int*, float** etc.). Vectors and C arrays are serialised by first specifying the length, followed by their content.

For example, the following string is encoded as a zero terminated byte string.

char* "abc" -> 0x97, 0x98, 0x99, 0x00

An array however is encoded in a different way.

unsigned char* {0x97, 0x98, 0x99} -> 0x03, 0x97, 0x98, 0x99

What I was proposing is to drop the zero terminated string encoding and to use the array approach instead. This will make reading C strings and arrays of other types exactly the same. Writing (returning) arrays however can not be done in general because there is no way to determine its size beforehand, with the only exception of C strings. So writing char* and char const* strings will remain an exception.

Anyway, having given this idea some more thought, it is perhaps not worth the effort.

chrisflesher commented 3 years ago

Ok, now I think I understand the question now. Is it worth the effort to modify the protocol to so char arrays use the same protocol as byte arrays? Since it provides the same result either way, it sounds like you think it's not worth the effort now. That sounds fine to me.

chrisflesher commented 3 years ago

This is not super related to this question but the serial protocol we use at our company was designed for RS485, it is very similar to the Modbus protocol:

 - byte 0:          sync byte
 - byte 1:          address (for RS485 multidrop)
 - byte 2:          command
 - byte 3:          data length (n)
 - bytes 4 to n+3:  data
 - byte n+4:        checksum hi
 - byte n+5:        checksum lo

The protocol reads one byte at a time and processes the new byte using a state machine. This allows it to recover very quickly from noise on the comms lines by finding the next sync byte after any kind of failure (e.g. checksum failure).

jfjlaros commented 3 years ago

That looks like a lower level protocol that could be implemented as a plugin perhaps?

chrisflesher commented 3 years ago

Hmm... that might be a good idea for future work, will think about it.

jfjlaros / simpleRPC

Refactoring: Use char arrays instead of byte strings. #15