openshwprojects / OpenBK7231T_App

Open source firmware (Tasmota/Esphome replacement) for BK7231T, BK7231N, BL2028N, T34, XR809, W800/W801, W600/W601 and BL602
https://openbekeniot.github.io/webapp/devicesList.html

Enhancement: add support for W600 #222

Closed iprak closed 1 year ago

iprak commented 1 year ago

W600 is the older and slower brother of W800 from the same company Winner Micro. It bears resemblance to W800 but has its own SDK. I have got it somewhat working but have to resolve some crashes probably related to older SDK.

@openshwprojects would this be something which could be merged into this repo ? Have you received any request for W600? Would it complicate the overall build?

openshwprojects commented 1 year ago

I have W600 board, can you share your progress so I can look into it?

EDIT: W600 board from Aliexpress, the dev one.

Does the W600 also have an IDE with a built-in editor and compiler like the W800, or how does one compile for W600?

iprak commented 1 year ago

Well, since we now have 2 potential users, my effort might be of use. :-)

I dismantled a switch hoping to find BK but found W600. There is a separate older toolchain which I have been using primarily from cygwin.

I based my effort on the W800 repo. I am not certain but I think that the heapsize increase and configUSE_HEAP2 usage are not playing well, this is a much older sdk.

I can submit a PR this week once I have verified the issues. I will also double check the linux toolchain.

openshwprojects commented 1 year ago

Your efforts are of course not in vain, you have already contributed a lot to OpenBeken, you're one of our best contributors.

First of all, what do you mean by "older SDK", do you mean that there is a newer version for W600 or are you calling W600 "older" and W800 "newer"?

What is exactly crashing? I had some crash issues on BL602 platform and I suspect that all those chip toolchains might have LWIP library not prepared for threading, thus causing random crashes, but I am not sure right now. Remember that there is also a BK7231N stability issue reported by some users.

The HEAP2 should not be problematic if we are not doing a malloc too often.

One of the things I've considered doing is removing mallocs per HTTP request from here: https://github.com/openshwprojects/OpenBK7231T_App/blob/main/src/httpserver/http_tcp_server.c

reply = (char *) os_malloc( replyBufferSize );
buf = (char *) os_malloc( INCOMING_BUFFER_SIZE );

the thing is, if I try to process one client at once (without creating threads at all, in a blocking manner) BK web page is not responsive and very sluggish.

Still, there is another potential approach to this problem.

Instead of doing malloc per each HTTP request and doing free, one could do something like that:

struct buffers { char *a, *b; bool bInUse; struct buffers *next; };

  1. when an HTTP request is added: buffers *HTTP_RegisterBuffers( ); this would find empty buffers or alloc new ones
  2. when an HTTP request has been processed: HTTP_MarkBuffersAsFree(buffers*); this would just set bInUse = false so they are reused

this will remove all malloc/free calls, of course one would have to limit the total number of buffers allocated at once

I might do this change tomorrow, it's very simple to do and implement, and this might help systems that don't like frequent malloc/free
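A minimal sketch of the buffer pool proposed above, under stated assumptions: the cap `HTTP_MAX_BUFFERS` and the two buffer sizes are hypothetical values, plain `malloc` stands in for the SDK's `os_malloc`, and the function names are adapted from the proposal rather than taken from the repository. A threaded HTTP server would also need a mutex around the pool, which is left out here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define HTTP_MAX_BUFFERS   4     /* hypothetical cap on pooled buffer pairs */
#define HTTP_REPLY_SIZE    1024  /* hypothetical reply buffer size */
#define HTTP_INCOMING_SIZE 1024  /* hypothetical incoming buffer size */

typedef struct buffers {
    char *a;               /* reply buffer */
    char *b;               /* incoming request buffer */
    bool bInUse;
    struct buffers *next;
} buffers;

static buffers *g_pool = NULL;  /* singly linked list of all pairs ever allocated */
static int g_poolCount = 0;

/* Returns a free buffer pair, allocating a new one only if none is free
 * and the cap has not been reached. Returns NULL when the pool is exhausted. */
buffers *HTTP_RegisterBuffers(void) {
    buffers *cur;

    for (cur = g_pool; cur != NULL; cur = cur->next) {
        if (!cur->bInUse) {
            cur->bInUse = true;
            return cur;
        }
    }
    if (g_poolCount >= HTTP_MAX_BUFFERS) {
        return NULL;
    }
    cur = malloc(sizeof(*cur));
    if (cur == NULL) {
        return NULL;
    }
    cur->a = malloc(HTTP_REPLY_SIZE);
    cur->b = malloc(HTTP_INCOMING_SIZE);
    if (cur->a == NULL || cur->b == NULL) {
        free(cur->a);
        free(cur->b);
        free(cur);
        return NULL;
    }
    cur->bInUse = true;
    cur->next = g_pool;
    g_pool = cur;
    g_poolCount++;
    return cur;
}

/* Nothing is freed; the pair is only marked as reusable. */
void HTTP_MarkBuffersAsFree(buffers *b) {
    if (b != NULL) {
        b->bInUse = false;
    }
}
```

Once the pool has warmed up, a request costs no heap calls at all, which is the point of the proposal for allocators that fragment easily. The trade-off is that the memory stays reserved even when the server is idle.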

valeklubomir commented 1 year ago

Frequent use of malloc/free is pretty safe, unless bugs in the code write data outside of the allocated space and overwrite the control data area of allocated memory blocks. That is the cause of the crash in 99% of cases when a system is using malloc/free.

This applies especially to an HTTP server: when constructing a response, which is a text string, some reserve must be allowed for, because a printed number can vary in text length, especially a float or a 32-bit decimal. A new approach may run into the same issues unless the bugs are completely fixed.
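To illustrate the point about printed numbers varying in length, here is a small sketch of a bounded append that detects truncation instead of writing past the buffer. `ReplyBuf` and `reply_appendf` are hypothetical names for illustration, not functions from this project; the sketch assumes `cap` is at least 1.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    char *data;   /* caller-provided storage */
    size_t cap;   /* total capacity in bytes, including the terminating NUL */
    size_t len;   /* number of characters currently stored */
} ReplyBuf;

/* Appends formatted text. Returns false and leaves the buffer unchanged
 * if the formatted text would not fit. */
bool reply_appendf(ReplyBuf *r, const char *fmt, ...) {
    va_list args;
    int n;
    size_t room = r->cap - r->len;  /* space left, including the NUL */

    va_start(args, fmt);
    n = vsnprintf(r->data + r->len, room, fmt, args);
    va_end(args);

    /* vsnprintf returns the length it wanted to write; n >= room means truncation */
    if (n < 0 || (size_t)n >= room) {
        r->data[r->len] = '\0';     /* roll back the partial write */
        return false;
    }
    r->len += (size_t)n;
    return true;
}
```

With a 16-byte buffer, appending `"t=%d"` with 25 succeeds, while a following `"%f"` of 123456789.0 needs 16 characters and is rejected rather than overflowing.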

openshwprojects commented 1 year ago

@valeklubomir are you aware of the W800 (and possibly W600) limitation of an older RTOS using an older heap management (free/malloc) algorithm?

https://cdmana.com/2022/03/202203310754244532.html

The W801 SDK uses the heap_2 algorithm by default. This algorithm has no defragmentation function, so it is not suitable for applications that frequently allocate and release memory blocks of different sizes. For example, in the player made by the author, different decoding algorithms are selected for different formats and each requests a different amount of memory; in actual use it becomes unable to run due to serious memory fragmentation and reduced utilization.
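The fragmentation effect described in the quote can be shown with a toy model. This is not FreeRTOS code: `ToyHeap` and its functions are hypothetical and only track block sizes, using best-fit allocation with splitting and a free that never merges neighbouring blocks, which is the property that matters here.

```c
#include <stddef.h>

#define TOY_MAX_BLOCKS 16

typedef struct {
    size_t size;
    int isFree;
} ToyBlock;

typedef struct {
    ToyBlock blocks[TOY_MAX_BLOCKS];
    int count;
} ToyHeap;

void toy_init(ToyHeap *h, size_t total) {
    h->count = 1;
    h->blocks[0].size = total;
    h->blocks[0].isFree = 1;
}

/* Best fit: pick the smallest free block that is large enough, split off the rest.
 * Returns a block index, or -1 if no single free block is big enough. */
int toy_alloc(ToyHeap *h, size_t size) {
    int i, best = -1;

    for (i = 0; i < h->count; i++) {
        if (h->blocks[i].isFree && h->blocks[i].size >= size &&
            (best < 0 || h->blocks[i].size < h->blocks[best].size)) {
            best = i;
        }
    }
    if (best < 0) {
        return -1;
    }
    if (h->blocks[best].size > size && h->count < TOY_MAX_BLOCKS) {
        h->blocks[h->count].size = h->blocks[best].size - size;
        h->blocks[h->count].isFree = 1;
        h->count++;
        h->blocks[best].size = size;
    }
    h->blocks[best].isFree = 0;
    return best;
}

/* No coalescing: the block is marked free but never merged with its neighbours. */
void toy_free(ToyHeap *h, int index) {
    h->blocks[index].isFree = 1;
}

size_t toy_total_free(const ToyHeap *h) {
    size_t sum = 0;
    int i;
    for (i = 0; i < h->count; i++) {
        if (h->blocks[i].isFree) {
            sum += h->blocks[i].size;
        }
    }
    return sum;
}
```

Starting from 100 units, allocating 40 twice and freeing both leaves free blocks of 40, 40 and 20. All 100 units are free again, yet a request for 60 fails because no single block is large enough.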

valeklubomir commented 1 year ago

@openshwprojects I apologize, I may have overstepped a little. Sorry for that, I am really not familiar with W600 or W800, and I was not aware of this limitation.

My concern was about my current work on a BK7231N device and how these changes would impact it, because I am trying to resolve freezing issues on my device. But your idea of buffers for HTTP put me on a track: I experienced similar behavior on an ESP32 device with the ESP-IDF SDK, which also uses FreeRTOS and LWIP, where the HTTP server caused crashes that I later traced to buffer capacity problems. I resolved it with a larger buffer, and in particular with functionality that increased the size of the buffer when a risk of overflow was detected. I rejected splitting the response into multiple transmissions at the HTTP level, because then the web page was not responsive and very sluggish. The option where the buffer was sent at once and split at the TCP level kept HTTP response rates good.
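A buffer that grows when an overflow would otherwise occur could look roughly like the sketch below. It is an illustration of the idea only, not the ESP32 code referred to above: `GrowBuf`, `growbuf_appendf` and the initial capacity of 64 are hypothetical, and it relies on C99 `vsnprintf(NULL, 0, ...)` to measure the required length first.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    char *data;   /* heap storage, NULL until first use */
    size_t cap;   /* allocated bytes */
    size_t len;   /* characters stored, excluding the NUL */
} GrowBuf;

bool growbuf_appendf(GrowBuf *g, const char *fmt, ...) {
    va_list args;
    int n;
    size_t needed;

    /* First pass: measure how many characters the text needs. */
    va_start(args, fmt);
    n = vsnprintf(NULL, 0, fmt, args);
    va_end(args);
    if (n < 0) {
        return false;
    }

    needed = g->len + (size_t)n + 1;  /* +1 for the terminating NUL */
    if (needed > g->cap) {
        size_t newCap = g->cap ? g->cap : 64;  /* hypothetical initial size */
        char *p;
        while (newCap < needed) {
            newCap *= 2;
        }
        p = realloc(g->data, newCap);
        if (p == NULL) {
            return false;             /* old buffer is still valid */
        }
        g->data = p;
        g->cap = newCap;
    }

    /* Second pass: the text is now guaranteed to fit. */
    va_start(args, fmt);
    vsnprintf(g->data + g->len, g->cap - g->len, fmt, args);
    va_end(args);
    g->len += (size_t)n;
    return true;
}

void growbuf_free(GrowBuf *g) {
    free(g->data);
    g->data = NULL;
    g->cap = 0;
    g->len = 0;
}
```

On a heap without coalescing, repeated `realloc` growth is itself a source of fragmentation, so on W600/W800 a fixed buffer sized for the worst case may be the safer choice.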

openshwprojects commented 1 year ago

> Because I am trying to resolve issues with freezing on my device.

Does it happen with MQTT disabled?

I have a slight suspicion that it may be related to multithreading, but I am not sure, because it happens on the N SDK for people and not on T. Anyway, here are my two ideas for what can be done to fix N platform stability:

  1. compare the BK7231N SDK step by step to the BK7231T SDK (#define etc.). I tried to do that in the past, but no luck so far
  2. maybe, if it's really a threading issue, then one could try updating LWIP to a safer version and add mutexes there? See: https://www.nongnu.org/lwip/2_1_x/multithreading.html Our lwip is 2.0.2 if I remember correctly, so it has no LWIP_ASSERT_CORE_LOCKED. Maybe one could try updating LWIP..
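For reference, enabling the core-lock checks mentioned in the second idea is a matter of a few `lwipopts.h` entries once lwIP 2.1.x is in use. The fragment below is a sketch under assumptions: the two `sys_*` function names follow lwIP's example ports and would have to be implemented by the platform port, and nothing here has been tried on the Beken SDK.

```c
/* lwipopts.h fragment, assuming lwIP 2.1.x */

/* Protect the lwIP core with a mutex so that other threads can call into it
 * after taking the lock with LOCK_TCPIP_CORE() / UNLOCK_TCPIP_CORE(). */
#define LWIP_TCPIP_CORE_LOCKING 1

/* Port-provided check (name assumed from lwIP's example ports): should assert
 * that the calling thread currently holds the core lock. */
void sys_check_core_locking(void);
#define LWIP_ASSERT_CORE_LOCKED() sys_check_core_locking()

/* Port-provided hook (name assumed likewise): records which thread is the
 * tcpip thread so the check above can tell it apart from other callers. */
void sys_mark_tcpip_thread(void);
#define LWIP_MARK_TCPIP_THREAD() sys_mark_tcpip_thread()
```

With these in place, a raw API call made from the wrong thread without the lock trips an assertion immediately instead of corrupting state and crashing at some later point.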

@valeklubomir do you know C, would you be able to help?

valeklubomir commented 1 year ago

@openshwprojects I did not try it with MQTT disabled. I will let it run overnight to check. I have been working on it for 5 days, as a private project. At the beginning I experienced freezing directly after reboot, due to a wrong configuration setup. Since only safe mode did not freeze, I deleted the configuration and restarted the process without any more issues.

I know C and I am trying to fix it.

I can try to compare the SDKs. But it would be helpful to know the differences between BK7231N and T. Is there any datasheet available? I have had no luck finding one so far. I could compare LWIP and try upgrading to the latest version.

At the moment I am working on MQTT to improve stability and the handling of error states.

openshwprojects commented 1 year ago

@valeklubomir I remember that doing a MQTT publish every second or several publishes of MQTT every second tends to trigger the issue more often. I've tried to debug it, but no luck. I also tried increasing the size of buffers in LWIP for TCP sockets, the PBuf size, etc, but the issue still persists. I don't think it is related to the actual size of LWIP buffers, I'd rather say it's related to threading or something. Or maybe I just missed something.

No datasheet as far as I know.

The strange thing is that it happens on the N platform and not on T, while LWIP is the same on both platforms, so maybe my suspicion about threading and LWIP is wrong.

iprak commented 1 year ago

I think the discussion about MQTT and buffer might belong to a different thread. :-)

Anyway, here is my status update on W600: I am currently preparing a PR.

At one point I was getting errors like this on startup but not any more.. I did a full erase of the device using a tool which came with SDK and maybe that cleared up some setting from older firmware. I was unable to read back the stock firmware before I started this experimentation.

Current Stack [0x2002a718, 0x2002aa38) is NOT in VALID STACK range [0x20000000,0x20028000)
Please refer to APIs' manul and modify task stack position!!!

Current Stack [0x2002b710, 0x2002bbc0) is NOT in VALID STACK range [0x20000000,0x20028000)
Please refer to APIs' manul and modify task stack position!!!
Info:MAIN:Time 12844, free 30296, MQTT 1, bWifi 1, secondsWithNoPing -1, socks 2/8
Info:MAIN:Time 12845, free 30296, MQTT 1, bWifi 1, secondsWithNoPing -1, socks 2/8
Info:MAIN:Time 12846, free 30296, MQTT 1, bWifi 1, secondsWithNoPing -1, socks 2/8
Info:MAIN:Time 12847, free 30296, MQTT 1, bWifi 1, secondsWithNoPing -1, socks 2/8

Issues:

apsta_demo_net_status: sta ip: 192.168.1.111

apsta_demo_net_status: softap ip: 192.168.4.1
Info:MAIN:Time 6, free 3344, MQTT 0, bWifi 0, secondsWithNoPing 0, socks 3/8
Info:MAIN:Time 7, free 3344, MQTT 0, bWifi 0, secondsWithNoPing 0, socks 3/8

valeklubomir commented 1 year ago

I will post my finding here. https://github.com/openshwprojects/OpenBK7231T_App/issues/204

openshwprojects commented 1 year ago

@iprak great progress. I will try to find where I put my W600 dev board in the meantime, and I will help with testing when your port is released. Can you also do a detailed write-up on how to compile? Also, is there an IDE for W600, or just a Cygwin prompt?

@iprak I'm ready

[image: 20220929_164301518_iOS]

iprak commented 1 year ago

That is a nice dongle and it has a reset button.

I pushed my changes in https://github.com/openshwprojects/OpenBK7231T_App/pull/229

I was unable to get it compiling all the way in linux. The very last step of generating fls file is done through wm_tool and the sdk only contains wm-tool.exe.

I did find another sdk at https://docs.wiznet.io/Product/Wi-Fi-Module/WizFi360/Other-Resource/w600_sdk which contains python based image generation.

openshwprojects commented 1 year ago

@iprak regarding the USER_SW_VER. The default value of USER_SW_VER is set in code, right, but the correct value for online builds should be set in build scripts. Refer to already supported platforms like BK7231 to see how it's set. It also seems that the setting of USER_SW_VER is missing for BL602 as well. I will try to build for W600 tomorrow.
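The usual pattern behind "default in code, real value from the build scripts" is a guarded define. The sketch below assumes the build passes the version as a compiler define such as `-DUSER_SW_VER='"1.2.3"'`, which this thread does not confirm for every platform; the fallback string and the accessor name are hypothetical.

```c
/* Fallback used only when the build system does not supply a version.
 * The value "0.0.0-dev" is a hypothetical placeholder. */
#ifndef USER_SW_VER
#define USER_SW_VER "0.0.0-dev"
#endif

/* Hypothetical accessor; returns whatever version the build baked in. */
const char *app_version_string(void) {
    return USER_SW_VER;
}
```

A platform whose build script forgets the define then still compiles but reports the fallback, which makes the omission easy to spot on the device's web page.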

iprak commented 1 year ago

You are absolutely right. I adjusted the W600 sdk to accept the version, something like this:

make -C OpenBK7231T_App/sdk/OpenW600 TOOL_CHAIN_PATH=/workspaces/OpenBK7231T_Dev/w600-gcc-arm/bin/ APP_VERSION=1.2.3

I poked at W800 SDK and could not figure out if/how it passed down the version. I saw this in the action log but that's how far I got. I don't have a W800 device to experiment with.

2022-10-01T18:16:42.5557137Z ##[group]Run make APP_VERSION=1.12.67 APP_NAME=OpenW800 OpenW800
2022-10-01T18:16:42.5557474Z make APP_VERSION=1.12.67 APP_NAME=OpenW800 OpenW800
2022-10-01T18:16:42.5609260Z shell: /usr/bin/bash -e {0}

openshwprojects commented 1 year ago

W800 is just 4$, I can buy you one from eBay if you want. You'd just need to message me on Elektroda; of course, eBay would need to ship to your country. [image]

I will look into W600 today, remind me if I forget, right now I am adding an option to cancel repeating events, I will add this to main source tree in few hours

iprak commented 1 year ago

Thanks :-) I got one.

openshwprojects commented 1 year ago

@iprak can you temporarily look into HA discovery for RGBCW lights? I have added a valid config generation: https://github.com/openshwprojects/OpenBK7231T_App/commit/2284d0ec97c428eb5422b627211c680f81ff8392 but discovery still discovers my RGBCW bulb as 5 PWMs

iprak commented 1 year ago

That would be expected, there is currently no support for color lights. I was working on adding support for voltage and can look into that next.

openshwprojects commented 1 year ago

Didn't have time to play with W800/W600 yet. What is the current state of things, @iprak? Is there anything you need help with? In the meantime, we're fixing the N stability and some LED stuff.

iprak commented 1 year ago

I have made good progress. Through trial and error and logging, I was able to isolate the potential root causes: a storage overflow and a missing NULL check. This was in JSON generation/cleanup and was not associated with MQTT publish.

I did not use the latest lwip but did increase MQTT_OUTPUT_RINGBUF_SIZE, etc. which would have eventually caused MQTT publish to fail. I am also switching to fixed size storage array since I am suspicious about the memory business in W600. I am going to let my device run today with status broadcast every minute and then push out changes.

Also testing the HASS changes in a T device.

openshwprojects commented 1 year ago

Good job. @iprak , is w600 OTA ready? I have W600 RGBCW bulb! W600 OTA would help a lot and I'd be able to test more, especially that I need to get that one running.

iprak commented 1 year ago

Oh yes, that is what I have been using. The OTA however is only implemented for W600/W800 in the app served by the device and not by OpenBekenIOT/webapp.

openshwprojects commented 1 year ago

@iprak wow that's great, i will really try to setup the SDK for compilation and OTA of W600, if there is anything else I need to know please tell me now, I have just a single W600 bulb and I don't want to brick it. I will test with W600 dev board first.

iprak commented 1 year ago

How will you flash the bulb?

I have been working with this chip. I wired the serial pins and was able to flash it. I did have to use full erase the very first time with the UART (fls) image.

image

For OTA update, I have been using the gz.img

openshwprojects commented 1 year ago

I will most likely use this method for flashing bulb: https://www.youtube.com/watch?v=7MyfSgxLAOo&ab_channel=elektroda.pl Btw, are the english subtitles working for my video?

Very nice images, remember to post this as a teardown to Elektroda!

Look: [image] Testing begins.

iprak commented 1 year ago

Yes subtitles are excellent.

openshwprojects commented 1 year ago

I connected to this AP some time and after opening the main page I had: [image] and it froze when I clicked "Config". But now I see that this small WiFi dongle I am using (see previous post for image) is hot, so maybe it's overheating... I will give it 15 minutes to cool down and try again.

EDIT: huh, it didn't disconnect the socket at first? [image] It disconnected the socket only after a few seconds... [image]

buuut now it works ok... so maybe it was a fluke.... this USB dongle W600 Chinese module seems very cheap, and it gets untouchably hot in my notebook USB port after a minute of running.

iprak commented 1 year ago

I saw something similar at the beginning when I was playing with the SDK demo app: the t-scan test would give nothing, with no errors, but the access-point test would pass and I was able to connect to it. So I couldn't blame it on a bad radio on the chip.

Anyway, I then used the full erase option with wm_tools and everything started working.

I had the chip connected to 2 separate dongles, one for 3.3 and 2nd for serial. I am going to let it run and then check the temperature.

I have a suspicion that something is wrong but can't find evidence. I tend to make a note of how long the chip has been running, and I don't think the "Online for ..." value is correct. That timing is based on a once-per-second tick and it seems to drift over time. I have not enabled NTP yet and plan to use that to get some data. If the chip gets busy doing something and gets hot, then that would explain the drifted "Online for ...".
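One way to rule out the counting itself as the source of drift is to derive uptime from the RTOS tick counter instead of incrementing a variable in a once-per-second callback, since a delayed or skipped callback silently loses time. The sketch below is an assumption about how that could be done, not the project's code; on FreeRTOS the inputs would typically come from `xTaskGetTickCount()` and `configTICK_RATE_HZ`, if the W600 SDK exposes them.

```c
#include <stdint.h>

/* Uptime in whole seconds computed from tick counts.
 * Unsigned subtraction stays correct across a single counter wrap-around. */
uint32_t uptime_seconds(uint32_t nowTicks, uint32_t bootTicks, uint32_t tickRateHz) {
    uint32_t elapsed = nowTicks - bootTicks;
    return elapsed / tickRateHz;
}
```

For example, 5000 ticks at 500 Hz gives 10 seconds. If the tick source itself runs slow, this value drifts as well, so comparing against NTP as planned above remains the real test.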

openshwprojects commented 1 year ago

@iprak I think we might have a long-ongoing issue with LWIP. It was fixed on N SDK by one contributor, but the same problem was in BL602 and still is not fixed. I don't know about W600 yet.

This little dongle gets really, really hot. I will try to test more, but maybe I will really find some kind of radiator and just stick it with thermal paste to it so it can run longer.

Why is the total number of sockets trimmed? [image]

EDIT: changed to WiFi client from OTA and now it's 2/8 displayed correctly? Or wait, no, look: [image]

Buffer len trimming? [image]

Naming to fix

iprak commented 1 year ago

I feel there is something odd with logging. I asked about this on #240.

My W600 feels warm too (73 F). I am supplying power from an FTDI adapter and it has been consuming around 120mA.

openshwprojects commented 1 year ago

Does it still get hot?

iprak commented 1 year ago

I just measured it, it is the same as before with same power consumption.

talltechdude commented 1 year ago

I've added the W600 platform to the auto-build and release config https://github.com/openshwprojects/OpenBK7231T_App/pull/281 ready for merge.

@iprak had already done 90% of the work getting the SDK to compile nicely with a makefile!