sidoh / esp8266_milight_hub

Replacement for a Milight/LimitlessLED hub hosted on an ESP8266
MIT License
945 stars 219 forks source link

Unable to send commands to RGBW bulbs if you havent send before commands to RGB-CCT bulbs #64

Closed g-rom-info closed 7 years ago

g-rom-info commented 7 years ago

Dear Chris,

I can reproduce an issue : after a cold start, or after several minutes, i cannot control my Milight RGBW bulbs. (as same as Domoticz and Bridge WebUI) However, when I send a command to RGB+CCT channel, I can send again commands to my RGBW bulbs.

Transceiver : PL1167

Do you have the same issue ?

Regards, Jerome

sidoh commented 7 years ago

I've noticed that the LT8900 is a little flakey before it's sent a few packets, but I seem to be able to switch between RF configs without too much trouble afterwards.

g-rom-info commented 7 years ago

Yesterday evening I've replaced my original Milight Bridge by my ESP8266 (same port, same protocol v5) I let it run all night. This morning, I was able to switch on a group of RGBW light, but after that it has not working anymore. After that, I've sent a command to my RGBCCT bulb and after that I can send again commands to my RGBW bulbs.

WoodsterDK commented 7 years ago

Havent seen this, try to always initialize with type == RGB_CCT in the initialize method by hardcoding the type instead of the config type. It uses captured settings from the remote.

sidoh commented 7 years ago

Just tested it. It definitely seems spottier for RGBW bulbs with a LT8900. I'm not sure that there's any pattern other than a much higher failure rate. Seems to work fine with RGB+CCT. The same ESP works well with an NRF.

Perhaps this means some of the settings for non-RGB+CCT have suboptimal register values? That's set up here:

https://github.com/sidoh/esp8266_milight_hub/blob/master/lib/MiLight/LT8900MiLightRadio.cpp#L95

Maybe we could try setting it up with the same ones used for RGB+CCT, or sniff what the remote is using.

WoodsterDK commented 7 years ago

That was what I meant☺

g-rom-info commented 7 years ago

Moreover, I also note that the ESP8266 sometimes loses its connection with the WiFi, as if the PL1167 module remained on and disturbed the WiFi signal. I have to disconnect the VCC power supply from the PL1167 to retrieve the IP link.

I have a CCT remote control and I have soldered wires in order to sniff the SPI communication. I use a Logic Analyzer based on Arduino Mega. I think it will be better to build directly an SPI sniffer with Arduino and send any captured packets by Serial Bus to a terminal but I havent sufficient dev skill...

sidoh commented 7 years ago

Are you able to watch serial output on your ESP? Mine stays connected for weeks at a time and I use it frequently throughout the day. I have noticed that the LT8900 will occasionally prevent the ESP from booting cleanly. I think probably it's setting GPIO 15 high (which is one of the SPI pins), which would cause the ESP to try to boot from flash. It'll also occasionally cause a WDT reset.

I don't have a logic analyzer, so unfortunately wouldn't be able to sniff the values myself.

g-rom-info commented 7 years ago

Yes I'm able to watch serial output: See below when I boot up.

*WM: AutoConnect
*WM: Connecting as wifi client...
*WM: Using last saved values, should be faster
*WM: Connection result:
*WM: 3
*WM: IP Address:
*WM: 192.168.18.12

On the same time, from my PC (french terminal):

Réponse de 192.168.16.1 : Impossible de joindre l’hôte de destination.
Réponse de 192.168.18.12 : octets=32 temps=2296 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=6 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=1 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=3 ms TTL=127
Délai d’attente de la demande dépassé.
Délai d’attente de la demande dépassé.
Délai d’attente de la demande dépassé.
Délai d’attente de la demande dépassé.

Ping comes up again when I unplug the PL1167. PKT is wired to D0 (GPIO16) and RST to D3 (GPIO0). 4 other pins to SPI BUS.

WoodsterDK commented 7 years ago

Regarding the non responding, perhaps it is related to timing issues. The interface may be too fast for the bulbs. Is it combined commands, or single commands that has problems?. If combining a power on with a level change then perhaps the level change is not seen by the bulb due to it being sent to fast. Until now i have used it with rgbcct bulbs and mainly with single commands. Here no problems have been seen.

g-rom-info commented 7 years ago

I think your assumption is good. Indeed it seems that the lamps behave badly especially when the bridge is sending multiple commands in sequence.

However I confirm that I havent any issue at this time with my CCT Bulb.

I installed a dev environment on Debian 8 + Atom + PlatformIO I can test to change "on the fly" the sleep time between 2 commands. Which file I need to modify ?

FYI : The PL1167 Module That I use

WoodsterDK commented 7 years ago

The file would be milightclient, and the method

void MiLightClient::update(const JsonObject& request)

Try to find a combination of commands that fail, end then as a test insert delays between those calls in the above mentioned method.

sidoh commented 7 years ago

Might make sense to put the delays in the LT8900 radio. I don't think the same thing happens with an NRF.

Thinking that'd be here? https://github.com/sidoh/esp8266_milight_hub/blob/master/lib/MiLight/LT8900MiLightRadio.cpp#L476

WoodsterDK commented 7 years ago

For implementation yes, just guided for a quick test.

g-rom-info commented 7 years ago

I tried to force RGB_CCT initialization on the constructor, no effect.. I tried to increase the delay on "bool LT8900MiLightRadio::sendPacket(uint8_t *data, size_t packetSize, byte byChannel)" function, no effect...

I tested some modifications on the source code but I havent sufficient skill in order to find why my ESP8266 loses this wifi connection after LT8900 connection. The lose of connectivity is random and I recover connectivity when I shut down the PL1167 by unpluging the VCC wire.

I activated debug log and I see only that

Radio module running correctly...
Radio module running correctly...
Radio module running correctly...
Radio module running correctly...

Is it normal to see it 4 times ?

Moreover, as explained below, when I recover the connectivity (after about 5-10 minutes), it is impossible to send commands to my RGBW bulbs, I see commands on debug log

Sending packet: B0D24E00000200
                              Elapsed: 77
Sending packet: B0D24E00000101
                              Elapsed: 75
Sending packet: B0D24E00000202
                              Elapsed: 80
Sending packet: B0D24E00000203
                              Elapsed: 82
Sending packet: B0D24E00000204
                              Elapsed: 91
Sending packet: B0D24E00000105
                              Elapsed: 82
Sending packet: B0D24E00000106
                              Elapsed: 88

I need to switch between RGBW and an other mode, send a command, and switch back to RGBW.

Sending packet: 00DB8F6566CDBA65A9
                                  Elapsed: 225
Sending packet: B0D24E00000107
                              Elapsed: 71
Sending packet: B0D24E00000208
                              Elapsed: 68

It is very strange.... I havent this issue with nRF24L01.

How can I help you in order to do more troubleshooting ?

sidoh commented 7 years ago

Obviously you shouldn't be losing connectivity at all. This is probably happening because the firmware is stuck in some loop with a yield in it. Said loop could be buried in an internal library (like SPI). Since the ESP has a single CPU core, the code to handle client requests aren't going to get handled, meaning it'll be unreachable.

It'd be helpful to know what part of the code it's stuck in. The most straightforward way to figure this out is to insert prints in various places and see which one is getting barfed last.

I suspect the RGBW issue is unrelated. I can confirm that the radio is much less reliable with RGBW (works perfectly well with RGB+CCT). I think WoodsterDK and I both believe this is because the register settings for non-RGB+CCT are wrong. Probably the best way to solve this is to sniff the register settings from an official remote with a logic analyzer.

WoodsterDK commented 7 years ago

I havent got my computer at the moment, Try these settings for non rgbcct bulbs:

regWrite16(0, 111, 224, 7); // Recommended value by PMmicro regWrite16(1, 86, 129, 7); // Recommended value by PMmicro regWrite16(2, 102, 23, 7); // Recommended value by PMmicro regWrite16(4, 156, 201, 7); // Recommended value by PMmicro regWrite16(5, 102, 55, 7); // Recommended value by PMmicro regWrite16(7, 0, 76, 7); // PL1167's TX/RX Enable and Channel Register regWrite16(8, 108, 144, 7); // Recommended value by PMmicro regWrite16(9, 72, 0, 7); // PL1167's PA Control Register regWrite16(10, 127, 253, 7); // Recommended value by PMmicro regWrite16(11, 0, 8, 7); // PL1167's RSSI OFF Control Register -- ??? regWrite16(12, 0, 0, 7); // Recommended value by PMmicro regWrite16(13, 72, 189, 7); // Recommended value by PMmicro regWrite16(22, 0, 255, 7); // Recommended value by PMmicro
regWrite16(23, 128, 5, 7); // PL1167's VCO Calibration Enable Register regWrite16(24, 0, 103, 7); // Recommended value by PMmicro regWrite16(25, 22, 89, 7); // Recommended value by PMmicro regWrite16(26, 25, 224, 7); // Recommended value by PMmicro regWrite16(27, 19, 0, 7); // Recommended value by PMmicro regWrite16(28, 24, 0, 7); // Recommended value by PMmicro regWrite16(32, 72, 0, 7); // PL1167's Data Configure Register: LEN_PREAMBLE = 010 -> (0xAAAAAA) 3 bytes, LEN_SYNCWORD = 01 -> 32 bits, LEN_TRAILER = 000 -> (0x05) 4 bits, TYPE_PKT_DAT = 00 -> NRZ law data, TYPE_FEC = 00 -> No FEC regWrite16(33, 63, 199, 7); // PL1167's Delay Time Control Register 0 regWrite16(34, 32, 0, 7); // PL1167's Delay Time Control Register 1 regWrite16(35, 3, 0, 7); // PL1167's Power Management and Miscellaneous Register regWrite16(40, 68, 2, 7); // PL1167's FIFO and SYNCWORD Threshold Register regWrite16(41, 176, 0, 7); // PL1167's Miscellaneous Register: CRC_ON = 1 -> ON, SCR_ON = 0 -> OFF, EN_PACK_LEN = 1 -> ON, FW_TERM_TX = 1 -> ON, AUTO_ACK = 0 -> OFF, PKT_LEVEL = 0 -> PKT active high, CRC_INIT_DAT = 0 regWrite16(42, 253, 176, 7); // PL1167's SCAN RSSI Register 0 regWrite16(43, 0, 15, 7); // PL1167's SCAN RSSI Register 1 delay(200); regWrite16(128, 0, 0, 7); regWrite16(129, 255, 255, 7); regWrite16(130, 0, 0, 7); regWrite16(132, 0, 0, 7); regWrite16(133, 255, 255, 7); regWrite16(135, 255, 255, 7); regWrite16(136, 0, 0, 7); regWrite16(137, 255, 255, 7); regWrite16(138, 0, 0, 7); regWrite16(139, 255, 255, 7); regWrite16(140, 0, 0, 7); regWrite16(141, 255, 255, 7); regWrite16(150, 0, 0, 7); regWrite16(151, 255, 255, 7); regWrite16(152, 0, 0, 7); regWrite16(153, 255, 255, 7); regWrite16(154, 0, 0, 7); regWrite16(155, 255, 255, 7); regWrite16(156, 0, 0, 7); regWrite16(160, 0, 0, 7); regWrite16(161, 255, 255, 7); regWrite16(162, 0, 0, 7); regWrite16(163, 255, 255, 7); regWrite16(168, 0, 0, 7); regWrite16(169, 255, 255, 7); regWrite16(170, 0, 0, 7); regWrite16(171, 255, 255, 7); regWrite16(7, 0, 0, 7); // Disable TX/RX and set radio channel to 0

WoodsterDK commented 7 years ago

Regarding being unresponsive, a timeout could be added here: while (digitalRead(_pin_pktflag) == 0) { //do nothing. } There is no yield or sleep here, so the result would be a lockup and then a wdt reset, and perhaps the lt module then prevents the esp for a proper boot depending on which pins are used.

g-rom-info commented 7 years ago

I added serial output between this loop like this:

    //Wait until the packet is sent.
    #ifdef DEBUG_PRINTF
      Serial.println(F("Begin of wait loop... "));
    #endif
    while (digitalRead(_pin_pktflag) == 0)
    {
        //do nothing.
    }
    #ifdef DEBUG_PRINTF
      Serial.println(F("End of wait loop... "));
    #endif

Result:

Sending packet: B0D24E00400E00
                              Begin of wait loop...
End of wait loop...
Begin of wait loop...
End of wait loop...
Begin of wait loop...
End of wait loop...
Begin of wait loop...
End of wait loop...
Begin of wait loop...
End of wait loop...

As a conclusion, the loop is not infinite

WoodsterDK commented 7 years ago

Two issues: 1) lockup of the WIFI 2) No control unless type of bulb is switched around.

But debug wise the output only makes sense, if the error occurred. Let's start dealing with number 2 - can you replicate a scenario that keeps failing, or is it periodic and without any pattern?

WoodsterDK commented 7 years ago

Just noticed the following: 'I tried to force RGB_CCT initialization on the constructor, no effect.. I tried to increase the delay on "bool LT8900MiLightRadio::sendPacket(uint8_t *data, size_t packetSize, byte byChannel)" function, no effect...'

Inserting a delay here won't make any big difference I think.... it makes the timing between the transmits longer, but not the timing two different commands.

sidoh commented 7 years ago

When I was testing LT8900 with RGBW bulbs, it seemed like it was just unreliable in general. I'm not sure there was much of a pattern to it. Seemed like RGBW bulbs were just overall less responsive.

sidoh commented 7 years ago

Also, @Lstt2005 mentioned in #70 that his LT8910 worked very well in an older revision (looked like 38f8fce169edb0e1f4923ccf6d7b24b30a42e910). From what I can tell there weren't any relevant changes between master and that revision, but might be worth trying.

WoodsterDK commented 7 years ago

Is there a list of commands that causes the problem.... multiple commands?

sidoh commented 7 years ago

I just remembered this, and I'd want to test again to confirm, but it actually seemed like it was mostly turning the bulbs on. Other commands sent after the bulbs were on seemed reasonably reliable.

WoodsterDK commented 7 years ago

Could it be that the bulbs are in a low power sleep mode, when not turned on, and the current implementation sends a complete frame with retransmits too fast? Try to set DEFAULT_TIME_BETWEEN_RETRANSMISSIONS_uS to 350 as a test.

sidoh commented 7 years ago

Yeah, this was basically what I was guessing. Thanks for pointing out the constant, I'll give that a try.

sidoh commented 7 years ago

I tried RGBW bulbs with the LT8900 again last night, and had trouble reproducing. Turning on and off seemed to work just fine. :\

sidoh commented 7 years ago

Okay, after a bit more testing, I've narrowed things down a bit:

  1. With RGB+CCT bulbs, the radio doesn't seem to work right after a fresh boot. Sending 1-3 packets seems to "warm it up," and it'll afterwards work very reliably. Resetting the radio also appears to trigger this state.
  2. I'm not sure why I wasn't seeing this earlier, but @g-rom-info is 100% right -- RGBW bulbs just do not work at all unless you've first sent a RGB+CCT packet. After RGB+CCT bulbs have started responding to commands, RGBW bulbs will then respond just as frequently.

Seems like this indicates the problem is with the register settings. I'll poke around and see if I can't find the differences.

WoodsterDK commented 7 years ago

Great that you found a pattern.

WoodsterDK commented 7 years ago

If you don't change bulb type the init values are not sent. So it sounds strange if it works after a couple of rgbcct frames. Regarding the rgbw ones, try change vInitRadioModule so that it both sends the rgbcct values first and then the rgbw ones in one pass.

sidoh commented 7 years ago

Haha, I think I figured it out:

https://github.com/sidoh/esp8266_milight_hub/commit/e5b47c25f4f080cb57a358283d5fbe27e45209a1#diff-4ace3a304fd082eb555588a3e66f70f1L19

Basically the RGBW radio was set to the default radio before other radios were initialized. MiLightClient then thought RGBW had already been initialized, but its settings were overwritten with other radios' settings, so they didn't stick until one switched away and then back.

sidoh commented 7 years ago

Fixed in 1.2.2.

sidoh commented 7 years ago

@g-rom-info can you verify that it's fixed and close if it looks good?

g-rom-info commented 7 years ago

Yes it's OK the RGBW works correctly after a cold start :)

However, I still have instabilities on IP connectivity. After a cold start, the ping is high or the ESP no longer responds intermittently. The web page may not be displayed and receipt of UDP commands not processed.

Réponse de 192.168.18.12 : octets=32 temps=710 ms TTL=127
Délai d’attente de la demande dépassé.
Délai d’attente de la demande dépassé.
Délai d’attente de la demande dépassé.
Réponse de 192.168.18.12 : octets=32 temps=398 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=102 ms TTL=127
Délai d’attente de la demande dépassé.
Délai d’attente de la demande dépassé.
Délai d’attente de la demande dépassé.
Réponse de 192.168.18.12 : octets=32 temps=110 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=151 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=8 ms TTL=127
Délai d’attente de la demande dépassé.
Réponse de 192.168.18.12 : octets=32 temps=50 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=146 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=9 ms TTL=127
Délai d’attente de la demande dépassé.
Réponse de 192.168.18.12 : octets=32 temps=91 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=81 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=36 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=3 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=52 ms TTL=127

After sending the first radio packets, the ping appears to stabilize.

Réponse de 192.168.18.12 : octets=32 temps=9 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=9 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=6 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=8 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=8 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=8 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=6 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=1 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=7 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=1 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=3 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=2 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=1 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=8 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=2 ms TTL=127
Réponse de 192.168.18.12 : octets=32 temps=5 ms TTL=127

I bought a second ESP connected simultaneously to the same WiFi network but connected to my nRF24L01 and I have no problem.

Réponse de 192.168.18.13 : octets=32 temps<1ms TTL=127
Réponse de 192.168.18.13 : octets=32 temps=3 ms TTL=127
Réponse de 192.168.18.13 : octets=32 temps<1ms TTL=127
Réponse de 192.168.18.13 : octets=32 temps=4 ms TTL=127
Réponse de 192.168.18.13 : octets=32 temps=1 ms TTL=127
Réponse de 192.168.18.13 : octets=32 temps<1ms TTL=127
Réponse de 192.168.18.13 : octets=32 temps=1 ms TTL=127
Réponse de 192.168.18.13 : octets=32 temps=1 ms TTL=127
Réponse de 192.168.18.13 : octets=32 temps=13 ms TTL=127
Réponse de 192.168.18.13 : octets=32 temps=6 ms TTL=127
Réponse de 192.168.18.13 : octets=32 temps=8 ms TTL=127
Réponse de 192.168.18.13 : octets=32 temps=2 ms TTL=127
Réponse de 192.168.18.13 : octets=32 temps=1 ms TTL=127
Réponse de 192.168.18.13 : octets=32 temps=6 ms TTL=127

After a warm start (by pressing the reset button), the ESP seems to keeps his stability.

I suspect an initialization problem however I'm sorry but I havent time this weekend to investigate more (I go away with my girlfriend :-) ) so I will do more tests next monday evening.

sidoh commented 7 years ago

Interesting that there aren't problems with the other ESP. Maybe one of the traces on your first one is bad?

Have a good weekend! :)

sidoh commented 7 years ago

@g-rom-info can you confirm that this is fixed for you?

g-rom-info commented 7 years ago

@sidoh I need to make further tests. Can you wait until next sunday ?

sidoh commented 7 years ago

Sure thing. No rush :)

g-rom-info commented 7 years ago

(Out of topic - in case of you have not been notified : #65 issue still here with 1.2.3 release :-( )

sidoh commented 7 years ago

@g-rom-info were you able to verify that this is fixed?

g-rom-info commented 7 years ago

@sidoh Sorry, I was on holidays last week and had hard work last 3 weeks, as a consequence I dit not have sufficient time for my tests :( I'm available this afternoon I will make my tests and send you a feedback this evening ;-)

g-rom-info commented 7 years ago

@sidoh I compiled the last main branch from github and injected it to my ESP8266 with PL1167. I also injected the file system in order to have a fresh install. Finaly I exported and imported my production environment configuration from my ESP8266/nRF24, coupled with domoticz. First of all, the configuration is stable : no ping loss, latency seems to be OK. After some commands, it seems to be OK with my RGBW and RGBCCT bulbs. I switched my domoticz configuration in order to use this bridge. Next wednesday, I will give you my feedback after this live experimentation. ;-)

sidoh commented 7 years ago

Sounds good! Thanks :)

g-rom-info commented 7 years ago

My bridge with PL1167 works perfectly since 4 days with 1.3.0 I've proceeded to multiple cold starts : the ESP is stable after a few seconds. The ping is quite stable but less stable comparing with nRF24l01. However all UDP/HTTP commands are processed correctly. I suspect more CPU activity with PL1167/LT8900.

I don't understand why last month I have multiple connectivity loss with 1.2.2, however I think we can consider that the issue is solved.

sidoh commented 7 years ago

Thanks for testing! Glad to hear it's working better. Feel free to re-open if you notice anything else.