Ideas for a faster/lower latency system

dzervas commented 2 years ago

I thought I was smart when I started my own project 3 days ago that grabbed the JPEG image and dumped it through 80211_tx function. Little did I know that I was re-inventing the wheel...

Anyway, I read some of your code and it's actually smart, how you've implemented error correction, why you do so, etc. My last trick (as in idea, haven't done that yet) was to:

Tell the camera to spit out raw pixels (much faster than JPEG I think)
Maaaaaaybe apply some kind of compression on each packet?
Encode each packet with FEC (is this feasible or you need the whole video frame?)
Reduce the DMA buffer to 2-3x the max 80211 frame size (1500-32 but taking into account compression/FEC/other metadata)
Access the DMA directly with "just enough" data to send a packet frame (essentially override the dma ISR esp-camera sets up)

There's also the ability to disable the sanity checks that 80211_tx does by overwriting ieee80211_raw_frame_sanity_check to always return ESP_OK. Check out some of the esp32 deauther projects

What do you think? Have you thought of these or even tried them?

dzervas commented 2 years ago

Oh I also wanted to ask what's up with I2S and DMA. The camera is advertised as an I2S device but I see it is used with DMA. What's up with that? What's the connection between the two interfaces? Maybe I2S can be used instead of DMA to achieve streaming of frame data?

dzervas commented 2 years ago

Ok the wifi latency (with full 1500 byte payload) is around 2.5ms. I did a benchmark by sending the gettimeofday and analyzing the values with python (it even has a printf on every packet so it should be even smaller). Definitely not the problem. Off to do some camera benchmarks

dzervas commented 2 years ago

I was also thinking if there's an upside on using a CMOS camera to be able to apply FEC - would turn the ESP into a vTX. How much better of a picture would we get?

jeanlemotan commented 2 years ago

Unfortunately I had to pause the project due to some personal reasons. But feel free to take this and improve it in whatever way you see fit. A few comments:

Tell the camera to spit out raw pixels (much faster than JPEG I think)

The problem with this is that you'll quickly hit the I2S transfer bandwidth limit. At 320x240 resolution, 2 bytes per pixel, 30 FPS, you're looking at 4.6MBytes per second, or 36Mbits/s, way above what you can do with I2S on this camera. IIRC the limit was around 20Mbits/s. At 640x480 you need 4x the bandwidth and so on, so not practical at all. There's the RAM limit as well - uncompressed buffers are significantly bigger than JPEG one.

Maaaaaaybe apply some kind of compression on each packet?

JPEG is that compression, with the advantage of being done by the camera itself, so no CPU spent on it.

Access the DMA directly with "just enough" data to send a packet frame (essentially override the dma ISR esp-camera sets up)

This is already done in the code. The data is processed (FEC encoded) as it comes from the DMA buffers, without waiting for a full frame transfer.

Oh I also wanted to ask what's up with I2S and DMA. The camera is advertised as an I2S device but I see it is used with DMA. What's up with that? What's the connection between the two interfaces? Maybe I2S can be used instead of DMA to achieve streaming of frame data?

So the camera talks with the esp32 using a parallel, 8 bit interface. The ESP uses its I2S peripheral in a special configuration to read all 8 bits of data for each clock using DMA transfers. So far it's the most optimal way to get data in the ESP - both from a bandwidth point of view (due to the parallel I2S trick) and also from a CPU usage point of view (due to DMA).

dzervas commented 2 years ago

The ESP camera component was modified to send the data as it's received from the DMA instead of frame-by-frame basis. This decreases latency quite significantly (10-20 ms) and reduces the need to allocate full frames in PSRAM.

2nd line in the readme. Lol I'm blind...

So there's nothing left to overcome the latency? From what I gather, the bottleneck is the JPEG encoder inside the camera, is it not?

If there's nothing more we can do with this exact hardware, maybe we can use a chip that compresses (maybe even FEC) the raw (or whatever) camera data and DMAs it to the ESP? (although it sounds like an EXTREMELY specific chip... I think that the only way to do this is an FPGA and in that case we can go with much more standard MIPI-CSI cameras (DJI cameras))

jeanlemotan commented 2 years ago

Hmm, I'm not sure it's in the camera. My guess is that the jpeg compressor doesn't wait for a full frame, since that would require a full frame worth of RAM in the camera and that's expensive. It's probable that the compressor starts as soon as data becomes available - so as soon as 8 lines worth of data are read from the sensor. In my tests, the PI added a lot fo the latency. I don't remember the specifics but the latency was fluctuating quite a lot on the pi, even without the camera in the equation - just sending data from the esp and measuring how long it takes to display it.

dzervas commented 2 years ago

Sidenote: According to your rates, you're way past the specs described in OV2640 datasheet. You're not only capturing, but also encoding, sending, receiving, decoding and showing images at 45+ FPS, while the manufacturer states that it just captures SVGA images at 30 FPS. Impressive.

I wanna first "refactor" the code, split it in files and use ESP-IDF. I'd really like to move it to C, as the whole ESP-IDF system is around C (and I really dislike Arduino). I think that a C codebase will be faster (not by much though, damn your C++ is impressive).

About the Pi (that's actually something I'm comfortable C): I'll definitely move to Rust for transceiver part and have a separate thread running MPV for rendering using VPAAPI (hardware accelerated and doesn't need X11 and friends). Also the resulting program will run as PID 1 - we don't need ANY other process. Last, I'm gonna try (and maybe optimize for) an RT kernel.

Damn this project is a HUGE undertaking, it started as a "weekend project".

lida2003 commented 1 year ago

Any idea to improve the low latency when using higher resolution #33

I have the same question here.

From the above discusions, I found that esp32 camera for PFV is limited if no other HARDWARE included.

@dzervas @jeanlemotan In other words, we have to find standard MIPI-CSI cameras (DJI cameras) solutions, right?

alex5250 commented 3 months ago

Not sure what project status,but new esp32-p4 are looking promising. https://www.youtube.com/watch?v=5sKs1jMDLFM 0.4Ghz CPU with hardware video encoding support and 32Mb PSRAM and yes yes seems dji is about to release something new :)

jeanlemotan / esp32-cam-fpv

Ideas for a faster/lower latency system #8