jeelabs / esp-link

esp8266 wifi-serial bridge, outbound TCP, and arduino/AVR/LPC/NXP programmer
Other
2.86k stars 726 forks source link

SLIP MQTT protocol definition for FORTH client implementation #430

Open wolfgangr opened 5 years ago

wolfgangr commented 5 years ago

Inspired by this hack https://hackaday.com/2017/02/13/hacking-on-the-weirdest-esp-module/#comment-5820116 I tried something similiar: MQTT from an STM32 Bluepill running mecrisp FORTH, connected to ESP-Link.

However, when I try to issue test strings, the communication between ESP-LINK and the STM32 breaks down. I have to reset both the µC at http://192.168.1.88/console.html and ESP-Link at http://192.168.1.88/log.html using the HTML-Buttons at those pages.

I suspect that some wrong command string confuses ESP-LINK and renders it in a non-responsive state.

I assume that the communication ESP-LINK <-> µC is neither pure MQTT nor SLIP but a dedicated protocol, correct?

Unfortunately, there is no MQTT in https://embello.jeelabs.org/flib/ yet.

I have repeatedly searched the issues and all docs for some spec, but the only source I found is the https://github.com/jeelabs/el-client/blob/master/ELClient/examples/mqtt/mqtt.ino However, assembling test strings from reverse engineering this very error prone, too.

It would be great to find a couple of preassembled test strings to ensure that the whole setup is working. So I could work down to more complicated functions by small changes. Just some "hello" appearing on my mqtt would be great.

Details from my setup:

I'd plan to rework the whole thing using string buffer (hope to find such in mecrisp) instead of blowing the stack. Would be glad to share, of course.

wolfgangr commented 5 years ago

Maybe I have a pointer alignment problem - among many others. This is no surprise when apps are ported from 8 bit to 32 bit platform. And obviously, the align-word in my mecrisp is dysfunct.

This produced different hard to debug effects, among them the infamous Unhandled Interrupt 00000003 ! crashes (see also here https://jeelabs.org/article/1619a/ ) But I also had obviusly simple definitions that did not work, and irrespossive connections between ESP-Link and mecrisp.

grepping through my source trees, I found flib/spi/rf69ook.fs:113:OOK.RSSI.QLEN 2 rshift 1+ 2 lshift buffer: ook.rssi.q \ don't trust align. So it seems to be a known problem.

aligned seems to work, calign obvioulsy not. My current workaround is

: myalign here dup aligned swap - allot ; 
: align myalign ;

and after possibly uneven c, sequences, I call

calign
myalign

I'll try to figure out whether it is a problem rooted in mecrisp or in the emmbello/flib I include, and will rise an issue in the proper place then.

I hope this finding brings me a bit closer to a succesful FORTH-MQTT-link. Nevertheless, test sequences and/ or an API documentation for the SLIP-MQTT interface would still be appreciated. I'd propose to write some nice text, when I get the proper informations / assistance to do so.

wolfgangr commented 5 years ago

just replaced my reset-wire by a diode as outlined here https://jeelabs.org/article/1619a/ and can do proper soft resets again. Nice :-)

This is not the primary cause, of the issue at hand, just a aggravation of symptoms, but may help to track down the cause of unplanned soft resets - at least I can read the last console printings just before a crash.

wolfgangr commented 5 years ago

hacked a simple string lib to proceed would prefer to test and polish it before I publish it

wolfgangr commented 5 years ago

Making progress - hitting visible walls already.

I manage it to get a message parsed by ESP-Link as supposed-to-be slip. At least I can see a related message in the web log. Maybe the mqtt stuff is not yet OK, but before I have to get the CRC right. This my sending string buffer as visible in FORTH The first two bytes resemble the counter, the next two the max length of the allocated string. So, the data start at ....085C

SLIP-message stringbuf-dump  
Start at 20000858 
Last byte is 089d, containing C0 aka SLIP_END

                                   |-pos-|-len-|-data-> 

20000850   FF F7 33 FF 00 BD 00 00   42 00 80 00 00 00 00 01   ..3..... B.......
20000860   00 00 00 89 C0 00 00 00   0B 00 05 00 00 00 00 00   ........ ........
20000870   14 00 68 6F 6D 65 2F 62   61 73 65 6D 65 6E 74 2F   ..home/b asement/
20000880   77 61 73 00 00 00 02 00   6F 6E 00 00 02 00 02 00   was..... on......
20000890   00 00 01 00 00 00 00 00   01 00 00 43 F4 C0 00 00   ........ ...C....
200008A0   00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00   ........ ........

This is what I read from the ESP-Link log. The start and the count suggest that only the bytes between the C0 aka SLIP_END are counted. However, the end of the messages differ - even in size

16530> SLIP: start or end len=9 inpkt=0
916531> SLIP: start or end len=56 inpkt=1
916531> SLIP: bad CRC, crc=b6dc rcv=f443 len=56
916531> \00\00\00\0B\00\05\00\00\00\00\00\14\00home/basement/was\00\00\00\02\00on\00\00\02\00\02\00\00\00\01\00\00\00\00\00\01\00\00C\F4

How is the CRC to be generated? I compared my hackaday example code with the source and identified this stanza: https://github.com/jeelabs/esp-link/blob/fe4f565fe83f05e402cc8d8ca3ceefbc39692a1f/serial/crc16.c

unsigned short
crc16_add(unsigned char b, unsigned short acc)
  acc ^= b;
  acc  = (acc >> 8) | (acc << 8);
  acc ^= (acc & 0xff00) << 4;
  acc ^= (acc >> 8) >> 4;
  acc ^= (acc & 0xff00) >> 5;
return acc;

this is my implementation in forth:

: crc+ ( old_running_sum new_byte -- new_running_sum ) 
  xor
  dup 8 rshift swap 8 lshift or
  $ffff and
  dup $ff00 and 4 lshift xor
  $ffff and
  dup 8 rshift 4 rshift xor 
  dup $ff00 and 5 rshift xor
;

Compared to the template, I added two $ffff and masks to keep calcuation in the 16-bit-width. I suppose that unsigned short in C is supposed to be 16 bit as well, OK? What else can be wrong? I don't find any initialisation in the C-code, so I assume the accumulator will start with zero. I found a CRC-Calculator on the web, but I coulld not match their algorithm descriptions. https://crccalc.com/?crc=0+1&method=ascii

Any pointer?

wolfgangr commented 5 years ago

could it be a serial communication issue? ran the examples at 460800 switching back to 115200 , ruined my test example, so I have not exactly sam strings. but.... what puzzles me is that the trail of the strings are responded differntly in both cases I list below

01 00 00 A3   C4 C0 
 rcv=c4a3 
\01\00\00\A3\C4
Blimpyway commented 5 years ago

That crc compute could be changed by different endianness in esp vs arm? One is big other is little See https://www.reddit.com/r/esp8266/comments/4ve8b3/what_is_the_endianess_of_the_esp8266/ and https://electronics.stackexchange.com/questions/183021/stm32f103c8xx-big-or-small-endian

On Tue, Jan 29, 2019 at 10:18 AM wolfgangr notifications@github.com wrote:

could it be a serial communication issue? ran the examples at 460800 switching back to 115200 , ruined my test example, so I have not exactly sam strings. but.... what puzzles me is that the trail of the strings are responded differntly in both cases I list below

  • last 6 bytes sent
  • crc reported as bad by ESP-Link log
  • last bytes as responded by ESP-Link log But obviously, the CRC arrived correctly at the ESP-Link - this would not point t a communicatin issue

    01 00 00 43 F4 C0 rcv=f443 \01\00\00C\F4

01 00 00 A3 C4 C0 rcv=c4a3 \01\00\00\A3\C4

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jeelabs/esp-link/issues/430#issuecomment-458447678, or mute the thread https://github.com/notifications/unsubscribe-auth/Ab_a_QvEdYbwIspdL8D3I8UN08tC1ATmks5vIAPKgaJpZM4aU4pg .

wolfgangr commented 5 years ago

I'm just thinking into endianness regarding the last step - putting the CRC result to the message string. But then I might just get two bytes swapped - easy to spot.

Both the C and my FORTH code are implemented in integer math. shouldn't assembly level bare metal prob's be covered by the compilers? The links you pointed me refer both to C Compiles and problem of some endianness macro not included correctly. So either my FORTH or ESP-Link may be compiled wrong and nobody realized? I'll try to figure that ....

wolfgangr commented 5 years ago

The forth side is easy to examine. just a little loop, shifting a bit through a 32 bit number. Used the same lshift word as in the crc implementation above https://github.com/jeelabs/esp-link/issues/430#issuecomment-458353545 I think it behaves as expected - no jumping bits, bit just falls off at the left end

: shifter cr 1 40 1 DO i . 1 lshift dup hex. loop cr ; Redefine shifter.  ok.
  ok.
shifter 
1 00000002 2 00000004 3 00000008 4 00000010 5 00000020 6 00000040 7 00000080 8 00000100 
9 00000200 10 00000400 11 00000800 12 00001000 13 00002000 14 00004000 15 00008000 16 00010000 
17 00020000 18 00040000 19 00080000 20 00100000 21 00200000 22 00400000 23 00800000 24 01000000 
25 02000000 26 04000000 27 08000000 28 10000000 29 20000000 30 40000000 31 80000000 32 00000000 
33 00000000 34 00000000 35 00000000 36 00000000 37 00000000 38 00000000 39 00000000 

How can I inject a test code into ESP-Link implementation? I'd like to see some online CRC calculator like this one to compare https://crccalc.com/?crc=0+1&method=ascii

wolfgangr commented 5 years ago

Are there examples of just one crc step, so that I could cross check my incremental bytewise updater? Hm, looks like I have to dig for a C hello world an paste the code in? But then I would just explore the C-build environment of my Workstation.

I think I'll try to untwine the SLIP wrapper layer from the MQTT inside. Just a test string with a few bytes inside, easy to compare.

But for the moment, live comes into the way....

Blimpyway commented 5 years ago

If you-re not familiar with ESP development tools you can compile an arduino sketch for ESP8266 and print CRC intermediate results incrementally

On Tue, Jan 29, 2019 at 10:55 AM wolfgangr notifications@github.com wrote:

Are there examples of just one crc step, so that I could cross check my incremental bytewise updater? Hm, looks like I have to dig for a C hello world an paste the code in?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jeelabs/esp-link/issues/430#issuecomment-458458120, or mute the thread https://github.com/notifications/unsubscribe-auth/Ab_a_WpWXP9m9LI3imbC0j66QJN0dNZ8ks5vIAxkgaJpZM4aU4pg .

tve commented 5 years ago

Your forth CRC code looks correct to me. You need to make sure that the new_byte parameter really only has 8 bits. Sadly I inherited a lot of that low-level stuff from some other project, so it's not documented, grr.. In terms of slip, this is where it happens: https://github.com/jeelabs/esp-link/blob/master/serial/slip.c Each packet starts and ends with a SLIP_END (0xC0). The last two bytes are the computed CRC value, which is calculated over all the bytes except the last two. Note that any 0xC0 or 0xDB value needs to be escaped on the wire and is unescaped before calculating the checksum. A packet consists of a 16-bit command followed by a 16-bit argument count and a 32-bit "callback value" (really an opaque token). This is all little endian. Your best bet is probably to start with the SYNC command whose code is 0x0001 and which takes one arg in the value field, which must be non-zero, e.g., { 0xC0, 0x01, 0x00, 0x01, 0x00, 0x11, 0x11, 0x11, 0x11, crc-lo, crc-hi, 0xC0 } You should get a response back { 0xC0, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, crc-lo, crc-hi, 0xC0 }

wolfgangr commented 5 years ago

I found my error. Silly oversight in the caller to the CRC: What I wanted: DO I c@ crc+ LOOP What I did: DO I crc+ LOOP ... in a loop cycling over string adresses. So I calculated the CRC of the adresses, not the content.

Now I got the message:

 99215> SLIP: start or end len=9 inpkt=0
 99215> SLIP: start or end len=56 inpkt=1
 99216> cmdParsePacket: cmd=0 argc=2816 value=1280
 99216> cmdExec: Dispatching cmd=NULL

I interpret it that way that I have got the SLIP thing right and can proceed with MQTT now. To be sure, I changed the endiannes of my CRC added and got : SLIP: bad CRC, crc=b6dc rcv=dcb6 len=56 Lesson learned: screw your CRC if you need to debug the SLIP content :-O ;-)

wolfgangr commented 5 years ago

Making progress: first succesful MQTT :-) This is what I send: (fist byte at ...0934, last byte at ...0975 )

20000930   42 00 80 00 01 00 00 00   89 02 00 00 C0 0B 00 05   B....... ........
20000940   00 00 00 00 00 14 00 68   6F 6D 65 2F 62 61 73 65   .......h ome/base
20000950   6D 65 6E 74 2F 77 61 73   68 65 72 02 00 6F 6E 00   ment/was her..on.
20000960   00 02 00 02 00 00 00 01   00 00 00 00 00 01 00 00   ........ ........
20000970   00 00 00 C1 7E C0 00 00   00 00 00 00 00 00 00 00   ....~... ........

ESP-link debug log: As we see, there is another MQTT pub, starting at 860810 sending the esp-link status

855163> SLIP: start or end len=9 inpkt=0
855164> SLIP: start or end len=56 inpkt=1
855164> cmdParsePacket: cmd=11 argc=5 value=0
855164> cmdExec: Dispatching cmd=MQTT_PUB
855164> MQTT: MQTTCMD_Publish topic=home/basement/washer, data_len=2, qos=0, retain=0
855164> MQTT: Publish, topic: "home/basement/washer", length: 26
855165> MQTT: Send type=PUBLISH id=0000 len=26
860810> MQTT: Publish, topic: "esp-link", length: 45
860810> MQTT: Send type=PUBLISH id=00A9 len=45
860813> MQTT: Recv type=PUBACK id=04A9 len=4; Pend type=PUBLISH id=4A9

and this is my console running :~$ mosquitto_sub -h 192.168.X.Y -v -t '#' both my test string and the esp-link status:

home/basement/washer on
esp-link {"rssi":-60, "heap_free":20008}
wolfgangr commented 5 years ago

@tve

so it's not documented, grr..

That's a common issue in the FOSS world. I respect that if people give away their work for free, they don't feel obliged to do boring stuff ;-)

So let me add my 3 cents here. (Or should I better start a Wiki page ... ?)

A packet consists of a 16-bit command followed by a 16-bit argument count and a 32-bit "callback value" (really an opaque token). This is all little endian. Your best bet is probably to start with the SYNC command whose code is 0x0001 and which takes one arg in the value field, which must be non-zero, e.g., { 0xC0, 0x01, 0x00, 0x01, 0x00, 0x11, 0x11, 0x11, 0x11, crc-lo, crc-hi, 0xC0 }

so this breaks up as follows?

      END      0xC0, 
      SYNC command 0x0001  ~~>  in little endian = 0x01, 0x00, 
      arbitrary non-zero dummy arg  0x0001 ~~>  0x01, 0x00, 
      arbitrary dummy callback value   0x11, 0x11, 0x11, 0x11, 
      checksum         crc-lo, crc-hi, 
      END       0xC0

You should get a response back { 0xC0, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, crc-lo, crc-hi, 0xC0 }

Have not yet screwed my forth interpreter to process responses coming along the same line I use to talk to it. But should not be that difficult, I expect. What I see at the console is a which I can expand to 0xEF at my forth console Presumably from the checksum?

When I process the answer, is it ok to consider anything enclose between two C0 as SLIP (well, if not excaped...) Can I rely that any SLIP rests on one single line? Or do line breaks loose their meaning between C0 ?

Note that any 0xC0 or 0xDB value needs to be escaped on the wire and is unescaped before calculating the checksum.

Good to know. So this should go into the routine which assembles the command string. If I did it in the definition of literals, it would be in the checksum and thus screw it. I found $DB constant SLIP_ESC compiled from my inherited template. So this is the escape character? Is this the correct way to do it then?

    0xDB -> 0xDB 0xDB
    0xC0 -> 0xDB 0xC0

And I suppose I have to remove it from responses as well, OK?

I'd like to keep things simple. Can I avoid nonprintble characters alltogehter?

wolfgangr commented 5 years ago

OK, next let's dissect my first succesful washer test string.

This is how it looks in memory:

20000930   42 00 80 00 01 00 00 00   89 02 00 00 C0 0B 00 05   B....... ........
20000940   00 00 00 00 00 14 00 68   6F 6D 65 2F 62 61 73 65   .......h ome/base
20000950   6D 65 6E 74 2F 77 61 73   68 65 72 02 00 6F 6E 00   ment/was her..on.
20000960   00 02 00 02 00 00 00 01   00 00 00 00 00 01 00 00   ........ ........
20000970   00 00 00 C1 7E C0 00 00   00 00 00 00 00 00 00 00   ....~... ........

Ouh, should I push my git to github to refer to the code that did this? anyway...

My code template says 'Command to sync up the esp-link ' 01 00 00 00 89 02 00 00 and calls it 'sync' Is it the sync you referred to? It's not enclosed in "END", and there is no CRC? Or is it just bullshit an silently ignored?

C0 according to wikipedia article in SLIP stands for 'frame end'. Is this required to start the following MQTT, or to close the (maybe broken) sync before?

0B 00 05 00 00 00 00 00 is called mqtt.pramble in my template, commented as '5 arguments, no callback'. So expanding your explanation, this is acutally a ESP-link SLIP command - to switch it to MQTT mode now - not part of the MQTT standard - correct? I see 0x000B as command an 0x0005 as argument(s?). Is it an argument, or a counter of such? If an argument, what's the meaning? If a counter, what is counted then?

Next is a string literal, consisting of a 16-bit length 0x0014 = dec 20 and a 20 byte long home/basement/washer no frills ascii string. I think the slashes refer to MQTT subtopics, but the mosquitto manual is still on my to-be-read list.

14 00 
68 6F 6D 65 2F 62 61 73 65 6D 65 6E 74 2F 77 61 73 68 65 72  

The next one 02 00 6F 6E 00 00 02 00 02 00 00 00 I just copied from the template, which I paste here including the comments there:

: message.on 
    02 00  6f 6e  00 00  \ "on"
    02 00  02 00  00 00  \ len of data  
;

There is another one (not used in my first example) I see the string format similiar to above Is there 16-bit padding required by the MQTT protocol? And I see the 3rd bit in the second line increased - copy of the string len?

: message.off 
    03 00  6f 66  66 00  \ "off"
    02 00  03 00  00 00  \ len of data  
;

The last one I simply copied without any clue. Well, I can google MQTT & quos & retain, of course 01 00 00 00 00 00 01 00 00 00 00 00

: qos.and.retain 
    01 00  00 00  00 00  \ qos = 0
    01 00  00 00  00 00  \ retain = 0
;

I really would appreciate some simple manual on the standard. The standard itself in my eyes is just a 100 page collection of might and could. Where can I find answers such as regarding the assembly of those strings?

C1 7E is the CRC checsum (0x7ec1) calculated as discussed above, in little endian notation C0 is the frame end token again.

wolfgangr commented 5 years ago

Ouh, should I push my git to github to refer to the code that did this? anyway...

voila: https://github.com/wolfgangr/forthMQTT3pktControl

preliminary hack of a buffered string library https://github.com/wolfgangr/forthMQTT3pktControl/blob/master/stringbuf.fs

the constants I referred to: https://github.com/wolfgangr/forthMQTT3pktControl/blob/master/mmq-const.fs

the hot spot - work in progress while I write in this issue https://github.com/wolfgangr/forthMQTT3pktControl/blob/master/mmq-tools.fs

uzi18 commented 5 years ago

@wolfgangr nice work, but why FORTH? @tve any hints for this implementation?

wolfgangr commented 5 years ago

why FORTH?

FORTH provides - to my knowledge - the largest spread between extremes on a couple of dimensions:

So on this question you could blow every thread, and there are enough of these discussions around.

wolfgangr commented 5 years ago

back to start again? - Well, hitting a different wall, now. Wrote some words to customize contents (topic, value) Looks basicaly the same as the working one above - just different string lengts.

20000BA0   70 47 00 00 44 00 80 00   01 00 00 00 89 02 00 00   pG..D... ........
20000BB0   C0 0B 00 05 00 00 00 00   00 15 00 68 65 61 74 69   ........ ...heati
20000BC0   6E 67 2F 70 72 65 73 73   75 72 65 2F 70 75 6D 70   ng/press ure/pump
20000BD0   00 02 00 6F 6E 00 00 02   00 02 00 00 00 01 00 00   ...on... ........
20000BE0   00 00 00 01 00 00 00 00   00 DC AE C0 00 00 00 00   ........ ........
20000BF0   00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00   ........ ........

But does not work. No message received on the MQTT client. The ESP-Link console:

804117> SLIP: start or end len=8 inpkt=0
804117> SLIP: start or end len=58 inpkt=1
804117> cmdParsePacket: cmd=11 argc=5 value=0
804117> cmdExec: Dispatching cmd=MQTT_PUB
804118> E:M 28280

So obviously the command got processed by the SLIP layer and handed over to the MQTT_PUP layer. But then it threw some kind of an error E:M 28280 looks like a long night of RTFS....

wolfgangr commented 5 years ago

For reference, again the log in the case of success

855164> cmdParsePacket: cmd=11 argc=5 value=0
855164> cmdExec: Dispatching cmd=MQTT_PUB
MQTT: MQTTCMD_Publish topic=home/basement/washer, data_len=2, qos=0, retain=0

... and of current fail:

804117> cmdParsePacket: cmd=11 argc=5 value=0
804117> cmdExec: Dispatching cmd=MQTT_PUB
804118> E:M 28280

we have a https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt_cmd.c line 100:

void ICACHE_FLASH_ATTR
MQTTCMD_Publish(CmdPacket *cmd) {

line 140:

  DBG("MQTT: MQTTCMD_Publish topic=%s, data_len=%d, qos=%d, retain=%d\n",
    topic, data_len, qos, retain);

MQTT_Publish(client, (char*)topic, (char*)data, data_len, qos%3, retain&1);

So I think this stanza generates the successful debug message and just after calls the handover to the MQTT server.

However, I can't see where the fail message E:M 28280 is assembled. Looks I have to search one wrapping layer above.

wolfgangr commented 5 years ago

@tve

Sadly I inherited a lot of that low-level stuff from some other project, so it's not documented,

Is there any hope that this other project has some docu we could build upon?

I see this (C) https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt_msg.h

Copyright (c) 2014, Stephen Robinson

but only referring the outgoing calls (ESP-link -> MQTT-Server) No (C) in the serial parsers.

--- edit --- https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt.c

*  Protocol: http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/os/mqtt-v3.1.1-os.html
* Copyright (c) 2014-2015, Tuan PM <tuanpm at live dot com>
* Modified by Thorsten von Eicken to make it fully callback based

OK ? - I found the correct standard aka haystack and just heve to look for the needle...

wolfgangr commented 5 years ago

https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt_cmd.c contains some checks can't identify the current point of failure, but keep them in the log as relevant details for the whole picture:

105: if (cmdGetArgc(&req) != 5) return; So the cmd-len param has always to be 5, no need to implement variations on that

113: if (len > 128) return; / max size of topic + data + decoration, we may match this limit at the client size

120: uint8_t *data = (uint8_t*)os_zalloc(len+1); .... if (!data) may I refer to that as 'out of memory error'

my ESP-link keeps sending mqtt state messages like esp-link {"rssi":-56, "heap_free":20384} So I can't see a reason why we can' allocate 128 bytes in 20 k ram yet.

I see no premature returns after this last check and the succesful debug message. We find repeated calls to cmdPopArg( referring to some of the parameters displayed in the DBG("MQTT: MQTTCMD_Publish Is there some backdoor error handler in the cmdPopArg that might override out flow of command?

wolfgangr commented 5 years ago

Is there some backdoor error handler in the cmdPopArg that might override out flow of command?

// Copy next arg from request into the data pointer, returns 0 on success, -1 on error https://github.com/jeelabs/esp-link/blob/fe4f565fe83f05e402cc8d8ca3ceefbc39692a1f/cmd/cmd.h#L120 https://github.com/jeelabs/esp-link/blob/fe4f565fe83f05e402cc8d8ca3ceefbc39692a1f/cmd/cmd.c#L159

I don't see such backdoor. There is a os_memcpy but I suppose this cannot fail on allocated buffers. It just returns 0 (success) or -1 (not enough args?? ) , so has to be checked by the caller

wolfgangr commented 5 years ago

https://github.com/jeelabs/el-client

A prelimenary documentation for the library is available on ELClient API Doc.

404: Page not found The way back machine just shows an empty frame.

It refers to doygen. So the docu lives in here? https://github.com/jeelabs/el-client/blob/master/ELClient/ELClientMqtt.cpp

hm... a first glance tells me that is a layer beyond what I'm looking for. I don't want to call those C-functions, but mimic their behaviour (maybe a tiny subset thereof)

@tve I think this were right what I try to reinvent here. Are you aware of any other copy of this prelim API? Maybe somewhere on your personal storage?

wolfgangr commented 5 years ago

OK - hope disappointed - back to track.

who is printing E:M ? github says:

We couldn’t find any code matching 'E:M' in jeelabs/esp-link

who is calling MQTTCMD_Publish ? https://github.com/jeelabs/esp-link/blob/fe4f565fe83f05e402cc8d8ca3ceefbc39692a1f/cmd/handlers.c#L74

in Version 2.2.3 this line sits at 41: - 33 lines less Looks like a major rewrite. Does this relate to my question? Is it OK to continue my work on stable V2.2.3 ?

Thorsten, @tve could you please provide a quick answer on that?

'==============

this is a // Command dispatch table. So, who is processing that?

line 76: (in v2.2.3) : cmdGetCbByName(char* name) which I read as get Callback by Name no other hit in git seach for MQTT_PUB ... hm ... puzzling? well, you don't call by strings within C, do you?

we can search for the integer index CMD_MQTT_PUBLISH on our command table as well .... and find.... https://github.com/jeelabs/esp-link/blob/fe4f565fe83f05e402cc8d8ca3ceefbc39692a1f/cmd/cmd.h#L49

wolfgangr commented 5 years ago

https://github.com/jeelabs/esp-link/blob/v2.2.3/cmd/cmd.h distill the context:

typedef enum {
       .....
  CMD_MQTT_PUBLISH, // publish a message
       .....
} CmdName;
wolfgangr commented 5 years ago

https://github.com/jeelabs/esp-link/blob/v2.2.3/cmd/cmd.c#L83

static void ICACHE_FLASH_ATTR
cmdExec(const CmdList *scp, CmdPacket *packet) {
  // Iterate through the command table and call the appropriate function
  while (scp->sc_function != NULL) {
    if(scp->sc_name == packet->cmd) {
      DBG("cmdExec: Dispatching cmd=%s\n", scp->sc_text);
      // call command function
      scp->sc_function(packet);
      return;
    }
    scp++;
  }
  DBG("cmdExec: cmd=%d not found\n", packet->cmd);
}

Let' recall our log:

804117> cmdExec: Dispatching cmd=MQTT_PUB
804118> E:M 28280

OK, so we have found the point BEFORE the error encurred. further enquire scp->sc_function(packet); we have the function argument cmdExec(const CmdList *scp, ... defined here https://github.com/jeelabs/esp-link/blob/v2.2.3/cmd/cmd.h#L55

typedef struct {
  CmdName   sc_name;     // name as CmdName enum
  char      *sc_text;    // name as string
  cmdfunc_t sc_function; // pointer to function
} CmdList;

hm .... Looping closed and overlooked the point?

wolfgangr commented 5 years ago

Looping closed and overlooked the point?

OK, down again the rabbit hole and collect the crumbs on the way. https://github.com/jeelabs/esp-link/blob/v2.2.3/cmd/handlers.c#L41 {CMD_MQTT_PUBLISH, "MQTT_PUB", MQTTCMD_Publish},

https://github.com/jeelabs/esp-link/blob/v2.2.3/cmd/cmd.c#L89

      DBG("cmdExec: Dispatching cmd=%s\n", scp->sc_text);
      // call command function
      scp->sc_function(packet);

This resembles our last valid debug print, so we know for sure (+-) that MQTTCMD_Publish is called on our packet.

So somewhere between https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt_cmd.c#L101 MQTTCMD_Publish(CmdPacket *cmd) { and ( the debug print in success case we are missing here) https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt_cmd.c#L140 DBG("MQTT: MQTTCMD_Publish topic=%s, data_len=%d, qos=%d, retain=%d\n", our problem must occur.

uzi18 commented 5 years ago

E:M means out of memory for alloc

uzi18 commented 5 years ago

need to find some readings on forth for uC. any link?

wolfgangr commented 5 years ago

E:M means out of memory for alloc

thanks :-) so I look for alloc and double check the validity of its arguments?

readings on forth for uC. any link?

Many of them. start here: https://github.com/wolfgangr/forthMQTT3pktControl/tree/master Mecrisp I think is a good start, if your platform can handle it. Smaller µC may be limited to sth like eforth, but I haven't tried https://hackaday.com/2017/01/27/forth-the-hackers-language/ https://hackaday.io/project/16097-eforth-for-cheap-stm8s-gadgets

wolfgangr commented 5 years ago

https://github.com/jeelabs/esp-link/blob/v2.2.3/cmd/cmd.h#L27

typedef struct {
  CmdPacket *cmd;     // command packet header
  uint32_t  arg_num;  // number of args parsed
  uint8_t   *arg_ptr; // pointer to ??
} CmdRequest;

just assignemnts, no alloc: cmdRequest(&req, cmd); if (cmdGetArgc(&req) != 5) return; len = cmdArgLen(&req);

can this alloc fail?

  if (len > 128) return; // safety check
  uint8_t* topic = (uint8_t*)os_zalloc(len + 1);

Memory completely full? len is uint16_t, so there shold not be a problem with negative numbers.

cmdPopArg does not alloc either - just os_memcpy https://github.com/jeelabs/esp-link/blob/v2.2.3/cmd/cmd.c#L157

So we conlude that alloc fails on 128 bytes. Which means that maybe not our current command is the culprit, but some nasty stuff that happened earlier? Or a bug in ESP-Link? May I try a later version?

wolfgangr commented 5 years ago

OK, lets compare the debug messages again: I see in the faulty case the first len=9 instead of len=8

855163> SLIP: start or end len=9 inpkt=0
855164> SLIP: start or end len=56 inpkt=1
855164> cmdParsePacket: cmd=11 argc=5 value=0
855164> cmdExec: Dispatching cmd=MQTT_PUB
855164> MQTT: MQTTCMD_Publish topic=home/basement/washer, data_len=2, qos=0, retain=0
804117> SLIP: start or end len=8 inpkt=0
804117> SLIP: start or end len=58 inpkt=1
804117> cmdParsePacket: cmd=11 argc=5 value=0
804117> cmdExec: Dispatching cmd=MQTT_PUB
804118> E:M 28280

Does this refer to the sync part? This is weird, since i send the same 8 bytes, closed by 0xC0, in both cases.

uzi18 commented 5 years ago

why do you reffer to v2.2.3, as we have 3.2.2x available?

wolfgangr commented 5 years ago

The len issue does not appear to be reproducible.

I cant't see any trace of a memory problem in the status message:

esp-link {"rssi":-59, "heap_free":19824}
home/basement/washer off
                      [ ....  unsuccesful try with E:M message in Log  ..... ]
esp-link {"rssi":-62, "heap_free":19824}

lets compare the test strings byte by byte (top = good, bottom = bad)

 01 00 00 00 89 02 00 00 C0 0B 00 05 00 00 00 00 00 14 00 68 6F 6D 65 2F 62 61 73 65 6D 65 6E 74 2F 77 61 73 68 65 72       03 00 6F 66 66 00 02 00 03 00 00 00 01 00 00 00 00 00 01 00 00 00 00 00 8F 9E C0
 01 00 00 00 89 02 00 00 C0 0B 00 05 00 00 00 00 00 16 00 68 65 61 74 69 6E 67 2F 70 72 65 73 73 75 72 65 2F 76 61 6C 76 65 03 00 6F 66 66 00 02 00 03 00 00 00 01 00 00 00 00 00 01 00 00 00 00 00 53 5F C0

CONFUSED - go for a beer...

uzi18 commented 5 years ago

ok but first is :

home/basement/washer
heating/pressure/valve
wolfgangr commented 5 years ago

is there another string length dependent value in the prefix? whab about the funny 89 in the 5th byte?

do we have to pad the topic to 4 byte,not 2?

home/basement/washer off
home/basement/wasABCDher off

that' what it looks like...

uzi18 commented 5 years ago
typedef struct PACKED {
  uint16_t cmd;            /**< Command to execute */
  uint16_t argc;           /**< Number of arguments */
  uint32_t value;          /**< Callback to invoke, NULL if none; or response value */
  uint8_t  args[0];        /**< Arguments */
} ELClientPacket; /**< Packet structure  */

#define SLIP_END      0300    /**< Indicates end of packet */          // 0xC0
#define SLIP_ESC      0333    /**< Indicates byte stuffing */          // 0xDB
#define SLIP_ESC_END  0334    /**< ESC ESC_END means END data byte */  // 0xDC
#define SLIP_ESC_ESC 0335 /**< ESC ESC_ESC means ESC data byte */      // 0xDD
wolfgangr commented 5 years ago

@tve

uzi18 commented 5 years ago

@wolfgangr first try to use ELClient library with debug on, to dig into Mqtt implementation

uzi18 commented 5 years ago

it looks like 01 00 00 00 89 02 00 00 is not a slip pkt or malformed one

uzi18 commented 5 years ago

pkt:

C0 0B 00 05 00 00 00 00 00 14 00 
68 6F 6D 65 2F 62 61 73 65 6D 65 6E 74 2F 77 61 73 68 65 72 03 00 
6F 66 66 00 02 00 03 00 00 00 01 00 00 00 00 00 01 00 00 00 00 00 8F 9E 
C0

C0 //slip start
0B 00 //CMD_MQTT_PUBLISH
05 00 // 5 args
00 00 00 00  // callback - none
14 00 // 1st = topic length
68 6F 6D 65 2F 62 61 73 65 6D 65 6E 74 2F 77 61 73 68 65 72 03 00 //  topic = home/basemen /washer
6F 66 66 00 // 2nd off // here should be 03 00  ?????
02 00 // 
03 00 // 3rd ??
00 00 
01 00 // 4th qos
00 00 // 0
00 00 
01 00 // 5th retain
00 00 // 0 
00 00 
8F 9E //crc
C0 //end
uzi18 commented 5 years ago

something is wrong, arg are like: length, arg

wolfgangr commented 5 years ago

Changed my code to 32-bit padding of topic and can successfully send those topic strings:

home/basement/washer off
heating/pressure/valve off
home/basement/wasABCDher125 off
home/basement/wasABCDher125 off
home/basement/wasABCDher125XY off

I suspect that this zalloc https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt_cmd.c#L163 uint8_t* topic = (uint8_t*)os_zalloc(len + 1); fails if it gets arguments that are not multiples of four

uzi18 commented 5 years ago

So somewhere between https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt_cmd.c#L101 MQTTCMD_Publish(CmdPacket *cmd) { and ( the debug print in success case we are missing here) https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt_cmd.c#L140 DBG("MQTT: MQTTCMD_Publish topic=%s, data_len=%d, qos=%d, retain=%d\n", our problem must occur.

just comment this line for tests.

uzi18 commented 5 years ago

you can change it: zalloc -> alloc + memset,0

wolfgangr commented 5 years ago

So let's proceed to the rest of the MQTT message, which at the moment reads

03 00 6F 66 66 00 02 00 03 00 00 00 01 00 00 00 00 00 01 00 00 00 00 00

Educated guess recommends it to split this into 4 equal chunks: Since we have

uzi18 commented 5 years ago

for me it looks like a bug, as before argument ("off") should be data_len (3) @tve could you explain?

https://github.com/jeelabs/el-client/blob/master/ELClient/ELClientMqtt.cpp#L173

wolfgangr commented 5 years ago

.... some decent MQTT documentation.

When this is the low level of an MQTT packet http://www.steves-internet-guide.com/mqtt-protocol-messages-overview/ this does not ring many bells here.

So we have to assume that our SLIP-enclosed protocol is physically quite different from MQTT and translated in ESP-Link, implementing some kind of OSI 7 layer "MQTT-proxy"

I'd roughly guess from what I'v seen 'till now:

wolfgangr commented 5 years ago

Let's collect pointers for reference:

retained refers to as single flag that tells the server to store a message and send it to new clients immediately. I hope a simple 0 | 1 will do the job. If not, we have to RTFS.

There are three levels of quality of service:

So, here, too, let's try if a simple integer byte value works.

Ah - let's have a look at the result of the parsing endeavour: https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt_cmd.c#L143 MQTT_Publish(client, (char*)topic, (char*)data, data_len, qos%3, retain&1);

The last three values are processed by cmdPopArg