psieg / Lightpack

Lightpack and Prismatik open repository
GNU General Public License v3.0
1.57k stars 188 forks source link

Adalight: Wait for acknowledge before next frame #117

Closed dmadison closed 4 years ago

dmadison commented 7 years ago

This is a request for a wishlist feature.

On my Adalight setup using WS2812B LEDs, interrupts need to be disabled on the microcontroller in order to send the finished color data to the LEDs. At lower "Grab Rate" settings in Prismatik, this causes frame drops as the header information for the successive frame is dropped while the LEDs are latching. The system then wastes serial bandwidth on color information that it never uses.

Setting the grab rate to a slower interval mostly solves this problem during normal use, as it provides enough time for the LEDs to latch without receiving serial new data. However it's still an issue if you attempt to turn off the LEDs while the framerate is high. On my setup I need to wait with my mouse over the Prismatik icon for ~1 second for the framerate to subside before I try turning off the lights, otherwise the "off" frame is dropped.

A solution would be for Prismatik to wait for an acknowledgement from the controller (either a character or a word) before sending the next frame. This would be an option (or even an 'experimental') setting, as to not slow down setups that do not have an interrupt issue.

A rougher solution would be for Prismatik to wait a given period after the last sent frame (~10 ms) before sending the "off" frame, although that's a band-aid fix.

psieg commented 7 years ago

I don't have an Adalight device so this is hard for me. Apparently there is QSerialPort::FlowControl, so if this just needs to be set to hardware or something, let me know

dmadison commented 7 years ago

That's interesting. Software-based flow control looks to be easy to set up with Adalight, so long as you could add an option to toggle flow control on Prismatik.

I'm not sure hardware-based flow control is possible, although for this purpose it shouldn't be necessary.

Benik3 commented 6 years ago

In Arduino you can "flush" serial buffer after the color change, so next frame will be always received OK :) Just add to the end of the program this: while (Serial.available()) Serial.read(); Or you can use APA102 (SK9822) LEDs over SPI which don't need the disabling of interrupts...

dmadison commented 6 years ago

Flushing the buffer doesn't fix the issue, though. The problem is missing frames and wasting serial throughput, not that the data is corrupted.

You could use a different LED type sure, but they're more expensive and there's an easier software fix.

Benik3 commented 6 years ago

Is it really worth? Of course you are missing frames but you must, when you have LEDs which are not able to work with it. Software flow control will control the frames, sure, but on the other hand, you will get almost the same like when you are flushing buffer. Yeah the routine for flushing take some time, but it's still faster and for eye better then letting process the corrupt data and start again. Also refresh rate of WS2812 is not so high like APA102/SK9822 (with 100 WS2812 you can get max around 60FPS).

I choose SK9822 first for these reasons and second because they have higher PWM for colour control ;)

dmadison commented 6 years ago

I'm not sure that you understand the problem... I'm already flushing the serial buffer. That doesn't change the fact that frames are lost when interrupts are disabled. This is mostly an issue when turning off the lights (if the last frame is missed), but it's always wasting throughput on the bus.

The WS2812B LEDs work fine. Something like APA102 would probably work better, sure, but flow control would be helpful for any LED type. I'm also not sure where you pulled that framerate number from... I've used longer strings of WS2812B LEDs at much higher than 60 FPS. From my testing the current framerate limitation is actually the grab rate in Prismatik, not the LED data speed.

Benik3 commented 6 years ago

I made mistake in calculation. It's 1000LEDs on 30FPS :) Yeah you are right that it's wasted throughput. I calculated time for 100LEDs and 115200Bd. I got 52ms which is a lot. BTW can't you use HW flow control? CH340g (if you have Arduino nano and similar) can do it. But I don't know if enabling HW flow control in device manager for USB will work or it will be overwritten by Prismatik serial config... :/

psieg commented 6 years ago

@dmadison do you know these two options? What would one need to do to get flow control working properly with Adalight?

Hardware flow control (RTS/CTS). Software flow control (XON/XOFF).

dmadison commented 6 years ago

To be frank I don't know enough about how the USB virtual serial wrapper works to know how to implement hardware flow control if it's even possible. My hunch is that the varying USB to serial adapters on the different Arduinos might make implementing that difficult.

Software flow control should be easy, it's just a matter of modifying the firmware to support it. Although I haven't had the chance to build Prismatik from source to test, yet.

There's a third option, which is to wait for an XON byte after every frame. Which may be the solution if the standard software flow control isn't responsive enough.

dmadison commented 5 years ago

After doing some testing with this, it looks like Qt's software flow control just relegates the responsibility to the driver. Meaning that if the serial driver doesn't support software flow control then you're SOL. And it looks like the CH340G driver doesn't support it.

Trying to implement flow control in Prismatik itself seems to be a lost cause. For one, the software can no longer buffer an entire frame because it needs to check whether an "XOFF" byte was received after each sent byte. Iterating like this leads to small breaks in the data stream, presumably because of the scheduler. Some of these are small (~150 us), some of them not so much (~7.1 ms).

I'm also running into an issue where the read bytes are slow to arrive. Prismatik reads the "stop" byte ~2 ms after it was sent, which is enough time to trigger the next grab call and send some data to the port during the stop condition:

15:03:03:547 Debug: Flow set to true
15:03:03:573 Debug: Wrote 246 bytes
15:03:03:575 Debug: Flow set to false
15:03:03:575 Debug: Wrote 3 bytes and quit

This predictably plays havoc with the data stream and defeats the purpose of having flow control in the first place.

Software flow control would be ideal because it's backwards-compatible with all other Adalight firmware, but I don't think it's going to be reliable without a supported driver. I'll look into acknowledgement bytes next.

dmadison commented 5 years ago

It looks as though processing acknowledgements-only (byte after every completed frame) is much more promising, but there's still a significant delay (~5-20 ms) between when a byte is received and when the next frame starts sending.

On the plus side, this gets rid of problems with frames being truncated or silently dropped by the device due to interrupts being disabled when the LEDs are latching.

On the down side, the framerate doesn't seem to be significantly improved (although the serial data looks significantly cleaner). In fact at a high enough baud rate, it looks like with the delay between when a byte is sent and when it is read it's possible to send an entire extra frame. This would in effect lower your overall framerate, even though it prevents it from reaching a point where the data corrupts.

The way I'm currently doing this doesn't fix the dropped frames issue, either. The function can either silently drop a frame (if there's no ack byte ready) or report a device error. I think the function would either have to wait indefinitely until it's received a byte or if the timer triggers again for a new frame. Unfortunately dealing with the threads/timers is above my knowledge (this is my first time messing with Qt).

Benik3 commented 5 years ago

What to move from Arduino to something better where we don't have to turn off interrupts? There exist DMA or I2S libraries e.g. for ESP8266. Isn't it better way?

dmadison commented 5 years ago

That doesn't solve the problem for many other people who are using the software with a typical Arduino.

Benik3 commented 5 years ago

But that's the problem. They are using something what isn't ideal for it.
If you fix HW, you fix the SOURCE of the problem.
If you fix SW, you just fix CONSEQUENCE of the problem.

I just tested ESP8266 with NeoPixelBus library and Uart method. It's working pretty fine with WS2813 (similar to WS2812).
Still they are pretty slow. I had to set BaudRate to 921600 to get some good FPS.
Here I took video of the tested WS2813: https://photos.app.goo.gl/8nLjiyaQ5DiH2oaT7
And here is my actual setup with APA102 LED: https://photos.app.goo.gl/s2D8JKwwX8BtAGxa9
Both has 110LEDs, grab interval I have 16ms. ESP8266 can be easily bought these days and for similar price like Arduino (e.g. Wemos D1 mini).

dmadison commented 5 years ago

It's all not "ideal" for it. So long as all of the DIY projects / protocols keep using USB CDC serial they're inherently handicapped. Until you move to DMA with high speed USB and a custom driver and then implement some form of handshake protocol between Prismatik and the board it's all a band-aid on the "real" problem. But assuming that we're not replacing the hardware for everyone in the world who has a DIY setup, can we be satisfied with a no cost software fix?

To wit - changing the hardware doesn't even fix the problem, it just masks it. The problem is the lack of flow control on the bus. If you up the clock rate of the board and the LEDs you can often get ahead of it, but you have to fight your baud rate / LED count / output speed to all behave within that limit. The far better solution, simply put, is to not send data when the board isn't ready for it.

Didn't we have this conversation last year? I feel like I'm repeating myself.

Benik3 commented 5 years ago

The fix is not no cost, it cost time, which can be much more expensive then HW :)
We could use other cheap HW like Blue Pill (STM32 with native USB) and use the method which you mentioned (direct USB combined with DMA transfer to LEDs). But someone must write the code for it... Price of the HW is in that point very very low.
Well all have DIY setup with Arduino because someone started with it and they know this brand. ESP8266 can be programmed by Arduino IDE like AVR. The BluePill is also supported. If you get it work and it will work better, people will move to this solution.
Yes, it maybe masks the problem. But I must say, that for the year what I use Ambilight I never noticed anything bad.
To the handshake - on the other hand we can ask, are the data still worth when we wait for the handshake? Aren't they too old? So there should be also option to change how old the data frame can be to be "actual and usable". So you can have delayed frame or dropped frame. When you have delayed frame the delay can even grow up when there will be lot of new frames. Until the FPS drop down so the buffer will empty.

BTW with ESP we can also have wireless data send e.g. with websocket which is pretty fast, but again someone must do the "no cost" change to the SW...

If anyone want to try the ESP, here is code which I used. It contain also gamma 2.2 fix: https://github.com/Benik3/Adalight_ESP

dmadison commented 5 years ago

No cost meaning no cost to the end user. If someone modifies the software it doesn't cost the end user anything to enable the feature. Modifying the hardware requires the end user to shell out time and money.

Not sure what you're on about with "waiting for the handshake". The point of a handshake is to make sure the data is received properly, or in other words to avoid issues with delayed frames and dropped frames. Which are already present and the entire reason for this issue.

Yes, in an ideal world everyone would ditch their existing setups and move to a better hardware solution. But we don't live in that world and repeating that is not productive. If you want to buy everyone an ESP and APA102 LEDs be my guest, but I don't have a few hundred thousand dollars to put towards that cause. In the meantime, people are using Arduino Unos and Nanos and running into this problem. Not to mention that, again, swapping to that hardware doesn't solve the actual problem.

We're going in circles. A software solution is possible and that's what this issue is about. Please don't clutter the discussion by talking about unrelated hardware changes.

Benik3 commented 5 years ago

Well it solves the problem. On the ESP I measured that to receive data and call .Show() takes 7ms with 921600Bd and 110LEDs. That's enough for 140FPS... So with 16ms grabbing (60FPS) I should be able to receive every frame without any problem. I tried to turn ON and OFF the Ambilight to try to replicate your problem with dropped frame (so the LEDs are still lighting even the Prismatik is OFF) and I wasn't able to reproduce the problem no matter how hard I tried...

I don't say that everybody must change to APA102, but ESP cost 3$. Even coffee at Starbucks cost more... and it's just micro USB + two wires into LEDs...

I just wish you good luck to have the Arduino without any dropped or delayed frames and WS2812. It's more then year what is this issue opened and meanwhile you could have it solved :)

P.S. To fully test my idea I made counter of bad data in the ESP. In 3minuts of 60FPS video there wasn't single error.

dmadison commented 5 years ago

I don't say that everybody must change to APA102, but ESP cost 3$. Even coffee at Starbucks cost more...

That is your solution though. You haven't offered any constructive information other than "everyone should change to X hardware". Which, as I've said repeatedly, doesn't address the problem.

You're also making some bad assumptions about how the system works. If the entire process for receiving data and sending it to the LEDs takes 7 ms, that does not mean you won't have any issues at a 16 ms update rate. 16 ms is the max interval between frames not the minimum, meaning it's possible for Prismatik to send two frames immediately after each other - even at a 1000 ms update rate.

If a single byte is lost the system breaks. If it's lost during the header you lose the entire frame, if it's lost during color transfer you get channel mismatches. To that extent a higher baud rate is actually more prone to having issues if the timing lines up, because the per-byte time is significantly lower. And the problem and what this issue is about, simply put, is that currently there is nothing in the system to prevent this from happening or to know that this has happened.

The current way this is handled is a shotgun approach: throw out as many frames as possible and see what sticks. This works okay during "normal" operation because any lost frame is quickly replaced with a newer one. But there are a few instances - notably during shutoff and during the color wizard (see #123) where Prismatik sends a single key update frame which the device very well may not receive.

I just wish you good luck to have the Arduino without any dropped or delayed frames and WS2812. It's more then year what is this issue opened and meanwhile you could have it solved :)

You also seem to be under this strange impression that I created this issue to try and "fix" my own setup? This has nothing to do with my own setup, and everything to do with a problem in how the software deals with these types of devices. Changing to faster hardware doesn't solve the issue, it just tries to outrun it. You don't solve a race condition by winning the race.

Benik3 commented 5 years ago

You haven't offered any constructive information other than "everyone should change to X hardware".

Really? As I mentioned the library works without disabling interrupts (so the displaying works asynchronous with receiving new data). I even tested it very successfully and I provided informations above...

You're also making some bad assumptions about how the system works. If the entire process for receiving data and sending it to the LEDs takes 7 ms, that does not mean you won't have any issues at a 16 ms update rate. 16 ms is the max interval between frames not the minimum, meaning it's possible for Prismatik to send two frames immediately after each other - even at a 1000 ms update rate.

It's grab interval, and it's minimum time. As you can see from the FPS counter, because it doesn't always grab at this time but even slower. BTW I had yesterday serial monitor hooked on serial line from Prismatik with 1000ms grab interval and the frames went nicely each 1s...

If a single byte is lost the system breaks. If it's lost during the header you lose the entire frame, if it's lost during color transfer you get channel mismatches. To that extent a higher baud rate is actually more prone to having issues if the timing lines up, because the per-byte time is significantly lower. And the problem and what this issue is about, simply put, is that currently there is nothing in the system to prevent this from happening or to know that this has happened.

Now you mix 2 things. This is nothing to do even with handshake. If the bytes corrupts during sending, yes it will be corrupted. Only solution to this would be to make some real CRC over whole data. But It will take some time because for 100LEDs you have 303bytes of data which you must calculate. If the CRC doesn't match, you must wait for new data anyway. Or do you want to even add some parity to be able to self repair the data? Yes, faster Baud rate can be more problematic (also the error depends on clock of the MCU, because Bd is divided form it. Even slow baud rate can have pretty high error percentage). The system will maybe break, but are you really so able to see one or two corrupted frame in 60FPS? Because I probably not even that I have pretty sensitive seeing mainly on blinks. And I use the PC every day for hours, I have the ambient light running even while gaming. Also if the bytes are not corrupted in header, it would be probably only few pixels, so you can see it even worse. Another solution for this would be to move from serial line to e.g. TCP where you have automatic CRC, receive checks etc. Again it means new HW and even new SW.

The current way this is handled is a shotgun approach: throw out as many frames as possible and see what sticks. This works okay during "normal" operation because any lost frame is quickly replaced with a newer one. But there are a few instances - notably during shutoff and during the color wizard (see #123) where Prismatik sends a single key update frame which the device very well may not receive.

I tested it and I wrote it earlier, no meter how hard I tried I wasn't able to reproduce your problem with loosing the frame while turning ON and OFF the ambilight. And I tested it under full 60FPS grab (I had 60FPS video running). Now I tested even the white colour balance. More the 30 opens of the white balance screen and every time the LEDs showed up correctly.

You also seem to be under this strange impression that I created this issue to try and "fix" my own setup? This has nothing to do with my own setup, and everything to do with a problem in how the software deals with these types of devices. Changing to faster hardware doesn't solve the issue, it just tries to outrun it. You don't solve a race condition by winning the race.

And you can't win race condition with low grade equipment. It's hand in hand :) As you say, problem is how the software must deal with HW, which is not ideal for it. I provided cheap solution which makes the problem very rare and even if it occur it's probably invisible. Still you can buy some "real" solution for thousand dollars...

dmadison commented 5 years ago

Really? As I mentioned the library works without disabling interrupts (so the displaying works asynchronous with receiving new data). I even tested it very successfully and I provided informations above...

Which, again, is predicated on switching to different hardware. And is not a solution to the specific software bug this issue is attempting to address.

It's grab interval, and it's minimum time.

No, it's not. Prismatik has no regard for the serial bus state or the baud rate and just stuffs bytes into the output buffer. If the baud rate is low or the driver is unready frames will be sent immediately after each other. The grab interval is the setting for the timer, not the output.

The grab interval also has no bearing on other output modes, such as the calibration wizard or the user-initiated "off" signal, both of which can and do send frames immediately after each-other.

Now you mix 2 things. This is nothing to do even with handshake.

I'm not mixing up anything, you're not understanding what I'm writing. I'm not talking about corrupted data on the bus from noise, I'm talking about losing data during the write cycle of the LED latch due to a lack of flow control on the bus. Which, for the nth time, is what this issue is about.

The system will maybe break, but are you really so able to see one or two corrupted frame in 60FPS?

Yes, unequivocally. Try it yourself. Set up a program that writes 59 white frames and 1 red frame.

I provided cheap solution which makes the problem very rare and even if it occur it's probably invisible. Still you can buy some "real" solution for thousand dollars...

No, you're derailing this issue by talking about unrelated hardware changes. I'm not trying to figure out how to purchase better hardware, I'm trying to figure out how to fix a specific software bug that exists regardless of what hardware you're using. If you can't understand that there's no point in me responding to you further.

Benik3 commented 5 years ago

By the way do you know, that original Adalight is built with APA102 like LEDs driven by SPI? Not one wire WS2812? Also changing the Prismatik will mean to make new device in it, because otherwise we will break the original one.

Previously you didn't write because of LED latch, you just wrote if you get bytes corrupted and that it can be even worse on higher Bitrates.

I will have to try the one corrupted frame. BTW the corruption will show up only when the data corrupts while reading the data to LED buffer (the for cycle). Otherwise it will be detected by header check which simply skip the rest of the data, so there will be no flash, just a missed frame.

...specific software bug that exists regardless of what hardware you're using...

And that's no true. With good HW you don't have this.

BTW do you also know, that UART buffer or Arduino is only 64Bytes? On ESP it's 128 bytes but it's not still enough to received full frame with many LEDs (like in my case with 100LEDs I have 306 bytes in one frame). So you must be sure to receive the data enough fast even with the handshake method. And if you can do it even while rendering LEDs, the problem with fast frames doesn't exist anyway. This problem can be also solved by making second bigger buffer and receive data from HW UART buffer with interrupt.

But I was curious have bad this problem can be. I made a file with 3frames and sent this file repeatedly with RealTerm without any delay. I also added CRC calculation over the received data for LEDs. Now I'm over 50000 frames without any error on 512000Bd. Only problem can be with too fast BaudRate. When I tested it with 921600Bd the problem was able to show up, because .show method for updating LEDs takes 3ms and in this time the UART buffer overflowed. Yes this can be solved by handshake, but so high bitrate is pretty overkill (on beginning there was even max 115200 Bd). And with 3ms it still max baudrate around 816000 before it starts to overflow the 128byte HW UART FIFO.

dmadison commented 5 years ago

There's so much wrong in your posts I don't know where to start. And quite frankly I'm done wasting my time trying to re-explain things that have already been explained to someone who is unwilling to read no matter how many times I can try to re-word them for you to understand.

And that's no true. With good HW you don't have this.

Yes, you do. The fact that you keep repeating that in every post shows that you don't actually understand the problem. The problem is with the protocol. Bad hardware exacerbates it, good hardware can hide it, but the problem still exists.

At this point I think you're being purposefully obtuse. I'm going to block you on GitHub until you can hopefully learn to contribute in good faith. Good luck.

dmadison commented 4 years ago

I'm cleaning up some old GitHub issues I have open and declaring defeat on this.

Because the serial data is read asynchronously by Qt, there's no way to use software flow control to meaningfully increase framerates or prevent sending data when interrupts are disabled on the device.

This is ultimately a breakdown in the USB to serial interface. Arduinos using a CH340G or 16U2 with the default firmware do not support either hardware or software-based flow control without additional modifications - either rewiring the microcontroller pins or replacing the USB adapter firmware. Therefore there's no way to 'tell' the USB to serial interface that the main microcontroller cannot receive data, and it's then lost while interrupts are disabled during the LED latch period.

The only software solution I can see would be a custom driver that implements software based flow control using one of the serial-based ambilight protocols. Of course, developing and loading a custom unsigned serial driver is a completely overkill solution for this problem.

In the meantime, this is just going to be a bug of the serial-based ambilight protocol due to the way the microcontroller hardware is configured.

Closing this for posterity.

WillsB3 commented 4 years ago

Thanks for your efforts @dmadison 👍