qix67 / uVGA

uVGA is a 100% hardware VGA driver for teensy 3.6, 3.2 (& probably 3.5)
GNU General Public License v3.0

uVGA library

by Eric PREVOTEAU

1 Quick start

Import the zip file using the Arduino IDE library manager, or place this folder in your libraries directory, then start or restart the Arduino IDE.

1.1 Wiring

For more accurate colors, replace the 2k2 resistors with 2k and the 470R resistors with 510R.

1.2 Basic usage

#include <uVGA.h>

uVGA uvga;

#define UVGA_DEFAULT_REZ
#include <uVGA_valid_settings.h>

void setup()
{
    uvga.begin(&modeline);
}

void loop()
{
    // drawing and text calls go here
}

This will use the default resolution set for your CPU frequency.

You can choose a different modeline by changing the define. Accepted values are in uVGA_valid_settings.h

Then call any drawing or text functions.

For text, use uvga.print(...) and uvga.println(...) just like Serial.print(...) and Serial.println(...)

1.3 Basic usage with preallocated frame buffer

#include <uVGA.h>

uVGA uvga;

#define UVGA_DEFAULT_REZ
#include <uVGA_valid_settings.h>

UVGA_STATIC_FRAME_BUFFER(uvga_fb);

void setup()
{
    // must be called before uvga.begin
    uvga.set_static_framebuffer(uvga_fb);
    uvga.begin(&modeline);
}

void loop()
{
    // drawing and text calls go here
}

The UVGA_STATIC_FRAME_BUFFER macro creates a frame buffer named uvga_fb, stored in the DMAMEM area. It uses the UVGA_HREZ, UVGA_VREZ and UVGA_RPTL #defines created in uVGA_valid_settings.h.

2 Colours

The color format is RGB332 (RRRGGGBB): 3 bits of red, 3 bits of green and 2 bits of blue in one byte.
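As an illustration of the RRRGGGBB layout, here is a minimal sketch that packs 8-bit red, green and blue values into one RGB332 byte by keeping the top bits of each channel. The helper name rgb332 is hypothetical and is not part of the uVGA API.

```cpp
#include <cassert>
#include <cstdint>

// Pack 8-bit-per-channel RGB into one RGB332 byte (bit layout RRRGGGBB).
// The name rgb332 is hypothetical; it is not provided by uVGA.
constexpr uint8_t rgb332(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint8_t>((r & 0xE0)           // top 3 bits of red   -> bits 7..5
                                | ((g & 0xE0) >> 3)  // top 3 bits of green -> bits 4..2
                                | (b >> 6));         // top 2 bits of blue  -> bits 1..0
}

// Spot checks of the bit layout.
inline void rgb332_examples() {
    assert(rgb332(255, 0, 0) == 0xE0);      // pure red
    assert(rgb332(0, 255, 0) == 0x1C);      // pure green
    assert(rgb332(0, 0, 255) == 0x03);      // pure blue
    assert(rgb332(255, 255, 255) == 0xFF);  // white
}
```

Because blue only gets 2 bits, neutral greys cannot be represented exactly; the result always has a slight tint.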

3 API

class instantiation

Initialize class internal parameters.

Class requirements:

  • 4 DMA channels; any of channels 0-15 can be used.
  • 1 FTM with 4 channels (on teensy 3.6, only FTM0 and FTM3 are possible; on teensy 3.2, FTM0 only). All other channels remain free.

If the first 3 parameters (DMA channels) are set to 0, DMA channels will be allocated using the DMAChannel library.

On the chosen FTM, the library will use channels:

  • hsync_ftm_channel_num
  • hsync_ftm_channel_num+1
  • x1_ftm_channel_num
  • x1_ftm_channel_num+1

hsync_ftm_channel_num and x1_ftm_channel_num MUST be even (0, 2, 4, 6)
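The channel rules above can be sketched as a small check. The function names are hypothetical and not part of uVGA. The requirement that the two pairs differ is an assumption drawn from the fact that the library uses all four channels at once.

```cpp
#include <cassert>

// True when ch is a legal first channel of a pair: even, and ch + 1 still
// exists on an 8-channel FTM (so 0, 2, 4 or 6).
constexpr bool is_valid_pair_start(int ch) {
    return ch >= 0 && ch <= 6 && (ch % 2) == 0;
}

// True when both pairs are legal and do not overlap.
// The non-overlap rule is an assumption, not something the library documents.
constexpr bool ftm_channels_ok(int hsync_ftm_channel_num, int x1_ftm_channel_num) {
    return is_valid_pair_start(hsync_ftm_channel_num)
        && is_valid_pair_start(x1_ftm_channel_num)
        && hsync_ftm_channel_num != x1_ftm_channel_num;
}

inline void ftm_channel_examples() {
    assert(ftm_channels_ok(0, 2));   // uses channels 0, 1, 2, 3
    assert(!ftm_channels_ok(1, 2));  // odd hsync channel
    assert(!ftm_channels_ok(4, 4));  // both pairs would share channels 4 and 5
}
```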

The pair (hsync_ftm_num, hsync_ftm_channel_num) defines the pin generating the Hsync signal. See the CORE_FTM*_CH*_PIN definitions in teensyduino's teensy3/core_pins.h file.

On teensy 3.6, valid pairs (not on port D) are:

  • (0,0) => pin 22
  • (0,2) => pin 9. This FTM can use pin 13 but you will have to configure it yourself and problems may occur because of the LED on this pin.
  • (3,4) => pin 35
  • (3,6) => pin 37. This FTM can use pin 57 but you will have to configure it yourself.

On teensy 3.2, valid pairs (not on port D) are:

  • (0,0) => pin 22
  • (0,2) => pin 9. This FTM can use pin 22 but you will have to configure it yourself.

1 or 3 DMA channels are used to generate the video signal; the 4th one is (will be) used to accelerate some drawing functions. Currently, only 2 functions support DMA acceleration (HLine and VLine). However, unlike the first 3 DMA channels, which never run at the same time, this one can run at any time, and drawing a long line greatly disturbs the other DMA channels. DMA acceleration is therefore currently disabled.

If any parameter is invalid, the library falls back to its default value.

Force the library to use a dedicated frame buffer instead of letting it allocate the frame buffer itself.

If used, this function MUST be called before uvga.begin call.

The frame buffer can be created with a one-line macro:

UVGA_STATIC_FRAME_BUFFER(your_frame_buffer_name_here);

Start a DMA channel automatically when a specific location is reached on screen.

If used, this function MUST be called before uvga.begin call.

It is possible to use multiple trigger events, but each event can trigger only one DMA channel.

Possible locations are:

  • UVGA_TRIGGER_LOCATION_END_OF_DISPLAY_LINE

when HSync occurs, whether the line has pixels or not.

  • UVGA_TRIGGER_LOCATION_END_OF_VGA_IMAGE

immediately after the last pixel of last image line.

  • UVGA_TRIGGER_LOCATION_START_OF_VGA_IMAGE

before the first pixel of the first image line. Warning: the coordinates of the trigger are NOT x = -1, y = 0 (the "pixel" to the left of the first pixel of the image) but x = 0, y = -1 (the "pixel" above the first pixel of the image), which is technically the last time the VSync TCD is called.

  • UVGA_TRIGGER_LOCATION_START_OF_DISPLAY_LINE

immediately after the beam has moved back to the left. Occurs whether the line has pixels or not.

Usage restriction:

  • UVGA_TRIGGER_LOCATION_END_OF_DISPLAY_LINE

It depends on Hsync polarity. If polarity is positive, it occurs at position modeline.hsync_start. If polarity is negative, it occurs at modeline.hsync_end. This trigger generates a DMAMux event linked to FTM = hsync_ftm_num, channel = hsync_ftm_channel_num

  • UVGA_TRIGGER_LOCATION_END_OF_VGA_IMAGE

supported in most modeline configurations. Exceptions:

  • UVGA_DMA_AUTO + frame buffer does not fit entirely in SRAM_L + repeat_line > 2. In this mode, the DMA channel links of the 1st and 2nd DMA channels are already used, so the channel link of the 3rd DMA channel is used instead. The trigger occurs a bit later than in all other configurations but it should work properly.
  • Not available if the frame buffer fits entirely in SRAM_L + repeat_line = 1 + vertical resolution > 511.
  • Not available if UVGA_DMA_AUTO + frame buffer does not fit entirely in SRAM_L + repeat_line = 1 + first line in SRAM_U is > 511.

The 2 modes where this trigger is not available should not be a problem, as they only occur if the horizontal resolution is ridiculously small (<128 and ~ <200 pixels/line).

  • UVGA_TRIGGER_LOCATION_START_OF_VGA_IMAGE

supported in all modeline configurations

  • UVGA_TRIGGER_LOCATION_START_OF_DISPLAY_LINE

supported in all modeline configurations. This trigger generates a DMAMux event linked to FTM = hsync_ftm_num, channel = x1_ftm_channel_num + 1

Manually modifying the DMAMux configuration to route the DMAMux event to multiple DMA channels will result in unpredictable behavior (see Kinetis Reference Manual, 23.4.1 Channel Configuration register DMAMUX_CHCFGn). The same result can be obtained by using several DMA channels linked together.

In the standard case, uvga.begin immediately starts image generation. However, in some cases, it may be necessary to delay this start to perform additional tasks.

If used, this function MUST be called BEFORE uvga.begin call. Later, uvga.clocks_start() must be called to start pixel DMA.

Start image generation. If uvga.disable_clocks_autostart() was not called, there is no need to call this function. Otherwise, this function MUST be called AFTER uvga.begin().

Initialize the display

Returns: 0 on success, uvga_error_t code on failure

Not all resolutions work on all monitors.

See below for the uVGAmodeline description.

Stop the display. NOT TESTED

Fills screen with color col, or black if col is not specified.

Retrieve the width and the height of the frame buffer. Width and height of the frame buffer are computed from modeline settings.

Return the color of the pixel at (x,y)

Draw pixel at (x,y) in colour col

Draw line from (x0,y0) to (x1,y1) in colour col

Draw or fill triangle (x0,y0),(x1,y1),(x2,y2) in colour col

Draw or fill rectangle with corners (x0,y0),(x1,y1) in colour col

Draw or fill circle center (x,y) radius r in colour col

Draw or fill ellipse bounded by rectangle (x0,y0),(x1,y1) in colour col

Draw text at any pixel position.

(x,y) is the top-left corner of the text before rotation.

fgcol is the colour of the text. bgcol is the colour of the text background or -1 for a transparent background.

dir is the direction of the text.

  • UVGA_DIR_RIGHT is left to right,
  • UVGA_DIR_TOP is bottom to top,
  • UVGA_DIR_LEFT is right to left,
  • UVGA_DIR_BOTTOM is top to bottom,

Scroll an area of the screen, with top-left corner (x,y), width w and height h, by (dx,dy) pixels. dx>0 scrolls right and dx<0 scrolls left; dy>0 scrolls down and dy<0 scrolls up. The emptied area is filled with color col, but only for a purely horizontal (dy=0) or purely vertical (dx=0) scroll.

Copy image area from position (s_x, s_y) to position (d_x, d_y).

The area is w pixels wide and h pixels high.

The function supports overlapping areas and areas that extend off-screen. If the source area is fully off-screen, nothing happens. If the source area is partially off-screen, w and h are automatically adjusted so that the area fits entirely on screen.
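The off-screen handling described above amounts to clipping a rectangle against the screen. The sketch below shows one common way to do it. Rect and clip_to_screen are hypothetical names, and the library's own adjustment may differ in detail (for instance in how the destination position is shifted).

```cpp
#include <cassert>

// Hypothetical rectangle type: top-left corner plus size, in pixels.
struct Rect {
    int x, y, w, h;
};

// Clip r to a screen of screen_w x screen_h pixels.
// Returns false (r left unchanged) when r is empty or fully off-screen,
// otherwise shrinks r so that it lies entirely on screen and returns true.
inline bool clip_to_screen(Rect& r, int screen_w, int screen_h) {
    if (r.w <= 0 || r.h <= 0) return false;
    if (r.x >= screen_w || r.y >= screen_h) return false;
    if (r.x + r.w <= 0 || r.y + r.h <= 0) return false;

    if (r.x < 0) { r.w += r.x; r.x = 0; }  // cut off the part left of the screen
    if (r.y < 0) { r.h += r.y; r.y = 0; }  // cut off the part above the screen
    if (r.x + r.w > screen_w) r.w = screen_w - r.x;
    if (r.y + r.h > screen_h) r.h = screen_h - r.y;
    return true;
}

inline void clip_examples() {
    Rect partly{-10, 5, 30, 20};
    assert(clip_to_screen(partly, 320, 240));
    assert(partly.x == 0 && partly.w == 20);  // 10 columns were off-screen

    Rect outside{400, 0, 10, 10};
    assert(!clip_to_screen(outside, 320, 240));
}
```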

Draw an off-screen bitmap (size: bitmap_width * bitmap_height pixels) on screen at position (x_pos, y_pos).

The bitmap must have the same color mode as the modeline. The function automatically clips the bitmap if part of it would fall outside the frame buffer.

Move the print position to (column, line)

Restrict the printing window to an area of width x height characters, at position (x,y) (in pixels).

Restore the print window to be the whole screen.

Clear the print window to the current text background colour.

Scroll the print window up one line and move the print position to the bottom.

Set the text colour to fg_color (RGB332).

Set the text background colour to bg_color (RGB332) or -1 for transparent background

These functions are similar to Serial.write, except output gets printed to the screen. These functions enable print and println to work correctly.

These functions wait for the beam position to be off-screen. waitBeam will return immediately if the beam is already off-screen, waitSync will always wait for the next frame. These can be used to reduce flicker.

4 modeline

The resolution in use is described by a modeline stored in a uVGAmodeline structure. Most of the data can be obtained from a standard modeline for the Xorg server.

Standard modeline format is:

ModeLine "640x480" 25.20 640 656 752 800 480 490 492 525 -HSync -VSync

which means
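For reference, a standard Xorg modeline lists the mode name, the pixel clock in MHz, four horizontal timings (display width, sync start, sync end, total), four vertical timings (display height, sync start, sync end, total) and the sync polarities. The sketch below stores those values and derives the line rate and refresh rate from them. XorgModeline and the helper functions are hypothetical names for illustration; they are not the library's uVGAmodeline structure.

```cpp
#include <cassert>
#include <cmath>

// Fields of a standard Xorg modeline, in the order they appear on the line.
// Hypothetical struct for illustration; not the library's uVGAmodeline.
struct XorgModeline {
    double pixel_clock_mhz;                        // 25.20
    int hdisplay, hsync_start, hsync_end, htotal;  // 640 656 752 800
    int vdisplay, vsync_start, vsync_end, vtotal;  // 480 490 492 525
    bool hsync_positive, vsync_positive;           // -HSync -VSync => false, false
};

// Horizontal scan rate in Hz: pixel clock divided by total pixels per line.
inline double line_rate_hz(const XorgModeline& m) {
    return m.pixel_clock_mhz * 1e6 / m.htotal;
}

// Refresh rate in Hz: line rate divided by total lines per frame.
inline double refresh_rate_hz(const XorgModeline& m) {
    return line_rate_hz(m) / m.vtotal;
}

// Width of the HSync pulse in microseconds.
inline double hsync_pulse_us(const XorgModeline& m) {
    return (m.hsync_end - m.hsync_start) / m.pixel_clock_mhz;
}

inline void modeline_example() {
    const XorgModeline m{25.20, 640, 656, 752, 800, 480, 490, 492, 525, false, false};
    assert(std::fabs(line_rate_hz(m) - 31500.0) < 1e-6);  // 31.5 kHz
    assert(std::fabs(refresh_rate_hz(m) - 60.0) < 1e-6);  // 60 Hz
}
```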

The modeline structure contains the following fields:

Some additional settings allow fine tuning of the video mode

5 Miscellaneous information

This will dump a huge amount of data regarding FTM and TCD.

A line uses (hres + 1 + 0xF) & ~0xF bytes. The +1 comes from the black pixel added at the end of each line. The UVGA_FB_ROW_STRIDE macro computes this automatically.
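The formula rounds hres + 1 up to the next multiple of 16. A minimal sketch of the same arithmetic follows; row_stride is a hypothetical stand-in for the UVGA_FB_ROW_STRIDE macro, whose exact definition is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

// Bytes used by one frame buffer line: the visible pixels plus the trailing
// black pixel, rounded up to a multiple of 16.
// Hypothetical stand-in for the UVGA_FB_ROW_STRIDE macro.
constexpr std::size_t row_stride(std::size_t hres) {
    return (hres + 1 + 0xF) & ~static_cast<std::size_t>(0xF);
}

inline void row_stride_examples() {
    assert(row_stride(640) == 656);  // 641 rounded up to 41 * 16
    assert(row_stride(800) == 816);  // 801 rounded up to 51 * 16
    assert(row_stride(16) == 32);    // the extra pixel pushes 16 into the next block
}
```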

6 How it works

A lot of magic... 1 FlexTimer (2 channels in complementary mode + 2 channels in PWM mode) and 1 (or 3) DMA channel

The FlexTimer uses 2 combined channels in complementary mode (asymmetric PWM) to create the HSync signal at the correct time. On the same FlexTimer, a 3rd channel (named X1 here) is used to start the DMA at the correct time on each line.

The DMA generates both the image and the VSync signal and, once started, never stops. The DMA uses a set of TCDs (transfer control descriptors) linked together using DMA scatter/gather mode. The last TCD is linked to the first one, so the DMA always has something to do. Each TCD describes a line to process and is started by the X1 FTM channel (the 3rd channel). The 4th channel (X1 FTM channel + 1) is not used by the library itself but generates a "new display line" event on DMAMux.
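The "last TCD links back to the first" arrangement can be pictured with a toy model. The sketch below is not the eDMA TCD layout and touches no hardware; LineDescriptor and the helper names are hypothetical, and only the link to the next descriptor is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a DMA TCD: only the link to the next descriptor is modelled.
struct LineDescriptor {
    std::size_t next;  // index of the descriptor loaded after this one
};

// Build `count` descriptors where each links to the following one and the
// last links back to the first.
inline std::vector<LineDescriptor> make_descriptor_ring(std::size_t count) {
    std::vector<LineDescriptor> ring(count);
    for (std::size_t i = 0; i < count; ++i) {
        ring[i].next = (i + 1) % count;
    }
    return ring;
}

// Follow the links `steps` times starting from descriptor `start`.
inline std::size_t follow(const std::vector<LineDescriptor>& ring,
                          std::size_t start, std::size_t steps) {
    std::size_t i = start;
    while (steps-- > 0) {
        i = ring[i].next;
    }
    return i;
}

inline void ring_example() {
    const std::vector<LineDescriptor> ring = make_descriptor_ring(5);
    assert(ring[4].next == 0);        // last descriptor links to the first
    assert(follow(ring, 0, 5) == 0);  // one full frame returns to the start
}
```

With one descriptor per line, following the links once per line start walks the whole frame and wraps around without any CPU involvement, which is the property the text relies on.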

On a normal line, the DMA copies all pixels of the line + 1 as fast as possible (1 black pixel is added at the end of each line to power off the RGB pins). After the image, 3 TCDs are used to set the VSYNC signal properly: 1 before the vsync to wait for it (may not exist depending on the modeline), 1 at the beginning of the vsync to set it and 1 after the vsync to clear it. The Vsync TCDs use nearly no resources because they copy only 4 bytes on each line.

It is not possible to fine tune DMA copy speed. Because it has the highest priority, it starts at the correct time (nearly every time :) )

I tried to adjust the speed of the DMA using FTM and PIT but it slowed down far too much and is not accurate due to rounding error (PIT@60MHz to obtain a 45MHz signal gives an FTM modulo of 1 with an error of 15MHz). Moreover, additional wait cycles are added to start and stop the DMA.
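The rounding problem is plain integer arithmetic: a timer can only divide its source clock by a whole number, and 60 / 45 rounds to 1. The sketch below reproduces the 15 MHz figure under that simplification. The function names are hypothetical, and the mapping from this divider to the actual FTM or PIT register values is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Nearest integer divider (never below 1) for deriving target_hz from source_hz.
inline uint32_t nearest_divider(uint32_t source_hz, uint32_t target_hz) {
    const uint32_t d = (source_hz + target_hz / 2) / target_hz;
    return d == 0 ? 1u : d;
}

// Absolute difference between the frequency actually produced and the target.
inline uint32_t divider_error_hz(uint32_t source_hz, uint32_t target_hz) {
    const uint32_t actual = source_hz / nearest_divider(source_hz, target_hz);
    return actual > target_hz ? actual - target_hz : target_hz - actual;
}

inline void divider_examples() {
    assert(nearest_divider(60000000u, 45000000u) == 1u);
    assert(divider_error_hz(60000000u, 45000000u) == 15000000u);  // 60 MHz instead of 45 MHz
    assert(divider_error_hz(60000000u, 30000000u) == 0u);         // exact: 60 / 2
}
```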

The 2 ways I found to slow down DMA are:

Because VGA is an analog signal, the width of each pixel is not really "defined". Using an 800x600 resolution, I successfully packed more than 1000 pixels on each line.

To improve video stability, the pixel DMA channel has the highest priority among DMA channels. Moreover, the crossbar switch is configured to give DMA maximal priority on the SRAM backdoor and GPIO. To gain one more cycle, the SRAM backdoor port and GPIO port are parked on DMA when they are not in use. This gives a huge boost in performance. Finally, the SRAM backdoor is configured to give absolute priority to DMA on SRAM_L and to favor DMA on SRAM_U.

Everything is not perfect. Due to the fact pixel duration is approximated, the monitor may or may not totally understand what it receives and with some colors, pixels may be blurry (VGA like :) )

A last problem comes from SRAM. SRAM is split in 2, SRAM_L and SRAM_U. SRAM_L is accessed using the CODE bus and has 0 wait states. SRAM_U is accessed using the system bus and has at least 1 wait state. Unfortunately, the biggest part of the SRAM is SRAM_U.

To fix this problem, a 2nd DMA channel is used. All lines located in SRAM_U are copied back into SRAM_L before being displayed. This channel is started using the TCD DMA channel link of the previously displayed line. If the repeat line factor is bigger than 1, the copy happens only on a new line, not on its duplicates, to reduce bandwidth usage.
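To make the SRAM_L / SRAM_U distinction concrete, the sketch below counts how many frame buffer lines would need the extra copy for a given buffer address. On Kinetis parts SRAM_U starts at 0x20000000 and SRAM_L sits directly below it. The function names are hypothetical, and treating a line that straddles the boundary as an SRAM_U line is an assumption; the library may handle that case differently.

```cpp
#include <cassert>
#include <cstdint>

// On Kinetis, SRAM_L ends just below this address and SRAM_U starts at it.
constexpr uint32_t SRAM_U_BASE = 0x20000000u;

// True unless the line lies entirely below SRAM_U_BASE.
// Counting a straddling line as SRAM_U is an assumption of this sketch.
inline bool line_needs_copy(uint32_t fb_base, uint32_t stride, uint32_t line) {
    const uint32_t line_end = fb_base + (line + 1) * stride;  // one past the last byte
    return line_end > SRAM_U_BASE;
}

// Number of lines in the frame buffer that are not fully inside SRAM_L.
inline uint32_t lines_needing_copy(uint32_t fb_base, uint32_t stride, uint32_t lines) {
    uint32_t count = 0;
    for (uint32_t y = 0; y < lines; ++y) {
        if (line_needs_copy(fb_base, stride, y)) {
            ++count;
        }
    }
    return count;
}

inline void sram_examples() {
    // Buffer starting 64 KiB below the boundary, 336-byte lines:
    // 195 lines fit in SRAM_L (195 * 336 = 65520 <= 65536).
    assert(lines_needing_copy(0x1FFF0000u, 336u, 240u) == 45u);
    assert(lines_needing_copy(0x1FFF0000u, 336u, 100u) == 0u);  // single DMA channel case
}
```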

However, this 2nd channel triggers a new problem. It is not possible for a TCD to modify the destination address between TCD minor loops. This problem does not exist in the 1st channel because the destination address is always the same. To bypass this problem, after a TCD minor loop of the 2nd channel is processed, it triggers a start on the 3rd channel. The TCD of this 3rd channel simply resets the value of the destination address (yes, the DMA reprograms one of its own registers :) ).

All these copies waste a bit of RAM bandwidth but the 2nd DMA channel copies are performed using burst mode and all these DMA TCD and DMA channel are automatically processed by the DMA engine without any CPU help.

Note: if all frame buffer lines are in SRAM_L, only the first DMA channel will be used.

Finally, this system works perfectly... as long as nothing disturbs it.

Hsync position is always correct because FTM channels cannot be bothered by anything.

Vsync position is roughly correct. Even if DMA starts a bit late, due to signal duration, it does not seem to disturb monitor (at least my old LCD 17").

The main problem is pixel generation. If the DMA is delayed, minor line oscillation can be visible. Due to its high priority, the DMA seems to always obtain access to SRAM before the CPU. The only thing which seems to delay the DMA channel start is... the DMA itself. If another DMA channel performs a "long" transfer, it delays the pixel DMA channel despite having a lower priority. Performing various data copies using the CPU is OK. A simple Serial.print is not OK.