indrekluuk / LiveOV7670

A step-by-step guide to building the circuit for this project:
https://circuitjournal.com/arduino-ov7670-10fps

Improve framerate? #5

Open larsenglund opened 6 years ago

larsenglund commented 6 years ago

I got a thought for improving framerate: since you are blindly reading pixels, could you not double the framerate for grayscale video by doubling the pixel clock, changing to YUV and only reading every other byte (the luma (Y)) from the YUYVYUYVYUYV.. byte stream?

indrekluuk commented 6 years ago

There is one problem: the pixel clock is already maxed out. It depends on XCLK and the clock pre-scaler parameter, and the pre-scaler value is already 0 for 10fps.

The only way to do that would be to increase XCLK. Currently XCLK is generated by a digital output pin. Since the Arduino runs at 16 MHz, the maximum for pin output is 8 MHz. Maybe if there is a way to get the Arduino's 16 MHz clock output directly then it might be possible.

larsenglund commented 6 years ago

But the XCLK is first multiplied with the PLL Multiplier (REG_DBLV) before getting divided by the prescaler (2x(REG_CLKRC+1)), so setting REG_DBLV to 4 and REG_CLKRC to 1 should double the pixel clock. It's described on page 13 in the OV7670 implementation guide appnote.
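The relationship described above can be sketched as a small helper. This is only an illustration of the formula as stated in this comment; the function and parameter names are made up here, and "REG_DBLV set to 4" is assumed to mean a x4 PLL multiplier rather than a raw register value. Absolute frequencies may also depend on other scaling settings not discussed here, so the ratio between the two configurations is the interesting part.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pixel clock derived from XCLK as described in the
// comment above. XCLK is first multiplied by the PLL factor and then
// divided by 2 * (CLKRC + 1).
// pll_multiplier is the multiplication factor itself (1 = PLL bypassed, 4 = x4),
// which is an assumption about what "REG_DBLV to 4" means.
std::uint32_t pixel_clock_hz(std::uint32_t xclk_hz,
                             std::uint32_t pll_multiplier,
                             std::uint32_t clkrc_prescaler) {
    assert(pll_multiplier >= 1);
    return xclk_hz * pll_multiplier / (2u * (clkrc_prescaler + 1u));
}
```

With an 8 MHz XCLK, going from (multiplier 1, prescaler 0) to (multiplier 4, prescaler 1) doubles the result of this formula, which matches the doubling suggested above.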

indrekluuk commented 6 years ago

I added PLL multiplier setup and was able to double pixel clock speed.

There are currently two problems.

  1. There isn't enough time to send data to the screen. It might be possible if the screen had parallel pixel loading instead of SPI, but that would not help with the Arduino Uno/Nano since it doesn't have enough pins. Maybe with an Arduino Mega. Yellow is the pixel clock from the camera and cyan is the SPI clock to the screen at 10fps: (scope image: 10hz)

     At 20fps every other line is missed since it is still sending data to the screen: (scope image: 20hz)

  2. I tried sending only 1/3 of a line to the screen but the image was very flickery. I think the issue is that I fail to detect the first pixel reliably. The Arduino runs at 16 MHz and the pixel clock is at 4 MHz. This means that only two instructions are executed during the time that the pixel clock is low. I think one cycle of this loop (waiting for the pixel clock to go low) probably takes longer than that: while(OV7670_PIXEL_CLOCK); This means that detecting the first pixel is very hit-and-miss.
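The timing budget behind that "two instructions" estimate can be written out as a quick calculation. This is a rough sketch that assumes a 50% duty cycle on the pixel clock and one CPU cycle per instruction; the function name is hypothetical.

```cpp
#include <cstdint>

// CPU cycles that fit into one half-period (for example the low phase)
// of the pixel clock, assuming a 50% duty cycle.
constexpr std::uint32_t cycles_per_half_period(std::uint32_t cpu_hz,
                                               std::uint32_t pclk_hz) {
    return cpu_hz / pclk_hz / 2u;
}
```

For a 16 MHz CPU and a 4 MHz pixel clock this gives 2 cycles, so a polling loop that needs more than two cycles per iteration can miss the low phase entirely.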

It probably is possible to create some clever external logic that stops XCLK when the pixel clock goes low, so that there is time for the Arduino to react. But then it would actually be more sensible to use something faster than an Arduino.

larsenglund commented 6 years ago

Perhaps PCLK could be moved to GPIO1 and HREF connected to GPIO2 and start reading pixels when GPIO2 interrupts? Or if all the pixels in a row are read "blindly" then PCLK is not needed at all, only HREF..

As for TFT-speed, your SPI code (sendPixelByte) seems to max out performance (did you find the number of NOP:s experimentally?). Perhaps interlacing can be used, sending every other line switching between odd and even lines.

indrekluuk commented 6 years ago

Using HREF instead of PCLK is a good idea. I will try this tomorrow.

My impression has been that interrupts are too slow for this. If I remember correctly then when using Arduino's "attachInterrupt" it takes sixty-something cpu cycles to get to the interrupt code. Maybe if this interrupt is configured directly (without Arduino framework) it will be fast.

Yes, I removed as many NOPs as I could. The beginning of each line can be optimized (the more sparse cyan lines on the scope image). Currently it is done by the Adafruit display library methods. This could be compressed as tightly as the data section.

Interlacing is also a good idea. First priority is to get line data reading fixed and then interlacing shouldn't be that hard to do.
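One way the interlacing idea could be expressed is a per-line check that alternates between even and odd lines on successive frames. This is a minimal sketch under that assumption, not the code from the repository; the name line_in_current_field is made up for illustration.

```cpp
#include <cstdint>

// Returns true when the given line belongs to the field sent in this frame:
// even-numbered frames carry even lines, odd-numbered frames carry odd lines.
constexpr bool line_in_current_field(std::uint32_t frame_index,
                                     std::uint32_t line_index) {
    return (line_index % 2u) == (frame_index % 2u);
}
```

Each frame then sends only half of the lines to the screen, which halves the per-frame transfer time at the cost of vertical resolution per frame.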

larsenglund commented 6 years ago

Let me know how it works out!

According to the ATmega328P datasheet it should be able to react in 4 cpu clock cycles. (see 11.7.1. Interrupt Response Time). Looking at attachInterrupt in WInterrupts.c from the Arduino core it doesn't look like it adds any significant (or any at all?) overhead.

indrekluuk commented 6 years ago

I was reasonably successful!

Connected HREF to PIN 12 instead of PCLK. Sometimes it goes a little out of sync after start up but usually after resetting Arduino it is OK. I should try to do it with interrupt at some point. If the execution is really only 4 clock cycles (or not much more) then it would guarantee synchronization.

Interlace lines are very obvious. But it depends what the end goal is, it might not matter. For example last year I created a line following robot and for that double refresh rate would have helped a lot.

Example: (video)

larsenglund commented 6 years ago

Cool! Great work! I assume you are doing interlacing in the video?

Higher framerate also means reduced artifacts from the rolling shutter (since the time between reading out the first and the last line from the CMOS is shorter), so even if you don't want/need 20fps you could do it just to get less motion-distorted frames. For a line follower it would be great, and you could just ignore everything except the center line and get lots of time for image processing and control.

indrekluuk commented 6 years ago

I did some measuring.

Interrupt code is only one line that takes only one cpu cycle: PORTD = 0b00000000; // sets pins 0..7 to LOW

With the Arduino framework's "attachInterrupt" it takes 3.5us from the input pin change to the output change: 3.5/0.0625 = 56 CPU cycles.

Using this directly: ISR (PCINT0_vect) { PORTD = 0b00000000; } it takes ~1.2us: 1.2/0.0625 = 19.2, so around 18 to 20 cycles (my measurement is not very accurate).
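The conversion used in these measurements (one cycle is 0.0625us at 16 MHz) can be sketched as a small helper. The function name is hypothetical, and the result is rounded to the nearest whole cycle.

```cpp
#include <cmath>

// Converts a measured latency in microseconds to CPU cycles,
// rounded to the nearest whole cycle. At 16 MHz one cycle is 0.0625 us,
// so multiplying by the clock in MHz is the same as dividing by 0.0625.
long latency_cycles(double latency_us, double cpu_mhz) {
    return std::lround(latency_us * cpu_mhz);
}
```

For the two measurements above this gives 56 cycles for 3.5us and 19 cycles for 1.2us at 16 MHz.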

OV7670 has register 2A - "dummy pixel insert". A couple of dummy pixels in front of a line should compensate for the interrupt delay.

indrekluuk commented 6 years ago

http://www.displayfuture.com/Display/datasheet/controller/ST7735.pdf
It seems that the screen (ST7735) supports a 4-4-4 color mode where two pixels can be transferred with three bytes instead of four (at the cost of reduced color space): (datasheet image: st7735)
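To illustrate the "two pixels in three bytes" idea, here is a minimal packing sketch. The byte layout (R1G1, B1R2, G2B2) is an assumption about the 12-bit format and should be checked against the ST7735 datasheet; the Rgb444 type and pack_two_pixels function are hypothetical names, not part of this project.

```cpp
#include <array>
#include <cstdint>

// One pixel with 4 bits per channel; each field is expected to be 0..15.
struct Rgb444 {
    std::uint8_t r, g, b;
};

// Packs two 12-bit pixels into three bytes.
// Assumed layout: [R1 G1] [B1 R2] [G2 B2], high nibble first.
std::array<std::uint8_t, 3> pack_two_pixels(Rgb444 first, Rgb444 second) {
    return {{
        static_cast<std::uint8_t>(((first.r & 0x0F) << 4) | (first.g & 0x0F)),
        static_cast<std::uint8_t>(((first.b & 0x0F) << 4) | (second.r & 0x0F)),
        static_cast<std::uint8_t>(((second.g & 0x0F) << 4) | (second.b & 0x0F)),
    }};
}
```

Compared with two bytes per pixel, this sends 3 bytes instead of 4 for every pair of pixels, which is where the transfer-time saving would come from.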

This means that 15fps without interlacing should be possible. Maybe even in color. Currently pixel reading is slowed down by masking out raw input bits and then OR'ing them together into one byte. There is more free time during data transfer to screen. One possibility is to save raw data during read and then later clean it up.

indrekluuk commented 6 years ago

Now the 20Hz version is using interrupt to detect new line start. It is more stable and looks better than before.

The dummy pixel setting did something (the image changed), but it didn't work as I expected. It doesn't actually matter, though, since only a couple of pixels at the edge are lost.

larsenglund commented 6 years ago

Nice, good finds! Have you tried out the RGB444 mode for 15fps color too?

indrekluuk commented 6 years ago

Not yet. I have some other stuff to do. But at some point I will try it.

larsenglund commented 6 years ago

Off-topic: I printed a case for my cam and added a lipo+charger and some infrared LED:s to turn it into a night vision camera for my 4.5yr old son - he loves it :) https://www.youtube.com/watch?v=J-ErxRT7Nbc

indrekluuk commented 6 years ago

Nice! Did you design a PCB for that or wire it up manually?

larsenglund commented 6 years ago

I didn't have the patience to wait for a PCB so I wired it up manually :)

kozuch commented 6 years ago

Looking at your conversation with YL Yang at https://youtu.be/Dp3RMb0e1eA - is the HREF usage implemented already or not in this repo? And the other "tweaks" in this thread?

indrekluuk commented 6 years ago

I have an example file of the interlaced version in the repo (ExampleGrayscale20HzInterlaced.cpp). It runs at 20Hz but with only half the vertical resolution and it is in gray scale.

I haven't tried the interrupt with the HREF signal since the bottleneck for the Arduino is reading pixels, and that is already at the maximum that an Arduino Uno/Nano can handle.

I haven't had time to try the last idea of reducing the color space. I am not sure it is worth the time to do it. Arduino is already very close to its limits and the benefit from doing that is probably very small.

indrekluuk commented 5 years ago

Hey larsenglund!

There is now an easy way to get it working at 20 frames per second. If you search for "WAVGAT Nano" on AliExpress you can find an Arduino clone that uses a WAVGAT chip instead of Atmel's. It is almost fully compatible with a normal Atmel-based Arduino Nano. There were some minor adjustments that I had to make to the code so it would run on both. The WAVGAT chip can run at 32 MHz.

Here is a demo of it working:
https://www.youtube.com/watch?v=dbMKfxACAfg