Closed XuNeo closed 11 months ago
I'm trying to understand what the problem is with using the global decoder list:
Note that #4241 will unify the memory alignment and stride of all buffers related to drawing.
In your suggestion the decoder list is stored in the lv_draw_unit_t?
It can be a valid use case as the output of the 2 PNG decoders might not be compatible.
I think that's the use case. #4241 is a solution where we take all reasonable GPU limitations into consideration: stride width, memory alignment, and memory flags.
@FASTSHIFT @terry0012 Could you comment more? Will #4241 be enough?
There is one limitation still not mentioned, but it looks like we don't need a separate decoder list.
Premultiply RGB with alpha after decoding. vglite, for example, needs it. One possible solution is:
The current software renderer doesn't support premultiplied images. We can simply disable it, just like tiled images, and treat it as a GPU-only format.
Do you think it's doable? Do premultiplied images benefit the software renderer as well?
It can be useful in SW decoder as well, as we can save a multiplication during blending. I need to check if we can add it without too much code duplication.
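The saved multiplication can be shown with a minimal plain-C sketch (the function names are made up for illustration; this is not LVGL code). With straight alpha, every channel needs a src * a multiply at blend time; with premultiplied data that multiply was already paid once at decode time:

```c
#include <stdint.h>

/* Straight alpha: every channel needs a src * a multiply at blend time. */
static uint8_t blend_straight(uint8_t src, uint8_t dst, uint8_t a)
{
    return (uint8_t)((src * a + dst * (255 - a)) / 255);
}

/* Premultiplied alpha: src already holds src * a / 255, so one
 * multiplication per channel per pixel is saved during blending. */
static uint8_t blend_premultiplied(uint8_t src_premult, uint8_t dst, uint8_t a)
{
    return (uint8_t)(src_premult + dst * (255 - a) / 255);
}
```

Up to integer rounding, both paths produce the same result; the premultiplied path just does one multiplication less per channel.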
To indicate if an image is premultiplied, I'd add:
LV_COLOR_FORMAT_RGB565A8_PREMULTIPLIED,
LV_COLOR_FORMAT_ARGB8888_PREMULTIPLIED
What do you think?
If you agree with adding LV_COLOR_FORMAT_..._PREMULTIPLIED
I can add these and also update SW rendering.
For testing could you send some images as C array in premultiplied format for both ARGB8888 and RGB565A8?
Sorry for the late reply. I was thinking of adding an extra uint32_t in the image header for flags, including:
- premultiplied: 1;
- tiled: 1; /*Image in tiled mode or linear mode*/
- rlecompressed: 1; /*Image data is RLE compressed*/
- reserved: 13; /*Reserved for lvgl use*/
- user_flags: 16; /*lvgl won't use it, use it for any purpose you want*/
Premultiply could also apply to LV_COLOR_FORMAT_I1/2/4/8.
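Expressed as a C bitfield, the flags word proposed above might look like this (illustrative layout only; the struct name is made up):

```c
#include <stdint.h>

/* Illustrative layout of the proposed extra 32-bit flags word. */
typedef struct {
    uint32_t premultiplied : 1;  /* RGB channels already multiplied by alpha */
    uint32_t tiled : 1;          /* Image in tiled mode instead of linear mode */
    uint32_t rlecompressed : 1;  /* Image data is RLE compressed */
    uint32_t reserved : 13;      /* Reserved for LVGL use */
    uint32_t user_flags : 16;    /* Not used by LVGL; free for any purpose */
} image_flags_t;                 /* hypothetical name */
```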
For testing could you send some images as C array in premultiplied format for both ARGB8888 and RGB565A8?
I'm developing a new image tool for lvgl. I can add premultiplied image support quickly once it's decided.
https://github.com/lvgl/lvgl/compare/master...XuNeo:lvgl:lvgl-image-tool?expand=1
Hi Gabor, I made some slight modifications to the original lvgl image converter and put the output premultiplied files here.
Thank you @PeterBee97! I'm still fighting with the mem align and stride update, but I'll try this out right after that.
@XuNeo
I'm developing a new image tool for lvgl. I can add premultiplied image support quickly once it's decided.
Amazing! :rocket: Please don't forget to add an option to set the stride.
Having the converter in Python is fine for now, but in the end we need to make it in JS to allow hosting it on https://lvgl.io/tools/imageconverter
I was thinking adding an extra uint32_t in image header for flags including:
I'm not sure it's the best way to handle this. rlecompressed and tiled are not as standard as premultiplied. So supporting some RLE compression in the SW renderer doesn't mean that the same RLE compression works for a GPU too.
What I recommend is reserving several LV_COLOR_FORMAT_...s for custom formats. This way you can do
#define LV_COLOR_FORMAT_MY_SPECIAL_RLE LV_COLOR_FORMAT_CUSTOM_1
And your GPU will immediately see that this format is only for it, and not for SW or any other generic decoder.
What do you think?
rlecompressed and tiled are not as standard as premultiplied.
We can remove tiled from lvgl since we haven't seen its benefits yet (potentially faster rotation).
I think RLE can make lvgl much more friendly for low-end MCUs by reducing flash size dramatically. See the statistics here.
The RLE compression method can vary between GPUs, but the one the lvgl software renderer uses won't need to be supported by other GPUs. For now, only Renesas uses it.
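To make the flash-size argument concrete, here is a generic byte-run RLE decoder sketch in plain C (this is not the encoding the LVGL software renderer or Renesas hardware actually uses; the format and function name are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a simple [count][value] byte-run stream into `out`.
 * Returns the number of bytes written, or 0 if `out` is too small. */
static size_t rle_decode(const uint8_t *in, size_t in_len,
                         uint8_t *out, size_t out_len)
{
    size_t w = 0;
    for(size_t r = 0; r + 1 < in_len; r += 2) {
        uint8_t count = in[r];
        uint8_t value = in[r + 1];
        if(w + count > out_len) return 0;
        for(uint8_t i = 0; i < count; i++) out[w++] = value;
    }
    return w;
}
```

Long runs of identical pixels (common in UI assets with flat backgrounds) shrink to two bytes each, which is where the flash savings come from.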
The suggestion to add one additional uint32_t comes from the fact that those bits can combine with cf. The method you mentioned will end with more cfs to be defined:
LV_IMG_CF_RGBA8888_PREMULTIPLIED,
LV_IMG_CF_RGB565A8_PREMULTIPLIED,
LV_IMG_CF_INDEXED_1BIT_PREMULTIPLIED,
LV_IMG_CF_INDEXED_2BIT_PREMULTIPLIED,
LV_IMG_CF_INDEXED_4BIT_PREMULTIPLIED,
LV_IMG_CF_INDEXED_8BIT_PREMULTIPLIED,
If you also agree to add RLE, then the cfs will double again.
Besides, for flags like tiled, we need a bit reserved for the user. Adding it to the image header looks more unified.
I also agree with supporting RLE, but it seems more like an image format (similar to PNG) and not a color format. So we can develop a simple lvgl-rle format, e.g. "lvgl-rle".
I also agree with supporting RLE but it seems more like an image format (similar to PNG)
That can work.
In v8, there's a decoder callback read_line; RLE can be used to compress every line and decompress on read. I'm not sure if it will be removed. The decoder I implemented in v8 doesn't support compression by line, but it's doable.
For tiled etc. flags, where do you think is the best place to put them? Will lv_img_decoder_dsc_t work?
In v8, there's a decoder callback read_line; RLE can be used to compress every line and decompress on read. I'm not sure if it will be removed. The decoder I implemented in v8 doesn't support compression by line, but it's doable.
I think it'd be useful to keep it, but read_line will be replaced by get_area to have a more generic API. So it can get multiple lines, tiles, or whatever the decoder wants.
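The get_area idea can be sketched generically (this is not the actual LVGL callback signature, just an illustration of the concept; the toy decoder below always answers one full row per call, but it could just as well return a tile):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int32_t x1, y1, x2, y2; } area_t;  /* stand-in for lv_area_t */

/* Generic idea of get_area: the caller asks for a rectangular region and the
 * decoder reports the sub-area it actually produced (a line, a tile, the
 * whole image...). This toy A8 decoder always decodes one full row. */
static int toy_get_area(const uint8_t * img, int32_t img_w,
                        const area_t * requested, area_t * decoded,
                        const uint8_t ** pixels)
{
    decoded->x1 = 0;
    decoded->x2 = img_w - 1;
    decoded->y1 = requested->y1;
    decoded->y2 = requested->y1;  /* one row per call */
    *pixels = img + (size_t)requested->y1 * (size_t)img_w;
    return 0;
}
```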
For tiled etc. flags, where do you think is the best place to put them? Will lv_img_decoder_dsc_t work?
Tiling is also not a color format, but rather an image format (how the pixels are stored). As tiling can also differ between GPUs and standards, I think we cannot add generic support for it. In the feat/draw_buf branch (related to mem. align and stride) I already updated lv_img_header_t. It's 6 bytes now, but obviously we can make it 8 bytes and add some bytes for custom formats. But as it will be specific to a given GPU or decoder, I'd keep it generic and leave it up to the user how to handle it. E.g.
typedef struct {
uint8_t always_zero;
uint8_t cf; /*Color format: See `lv_color_format_t`*/
uint8_t format;
uint8_t reserved;
uint16_t w; /*Width of the image*/
uint16_t h; /*Height of the image*/
} lv_img_header_t;
Actually, in format we can store some standard formats too, e.g. LV_IMG_FORMAT_PNG/JPG/BMP/GIF etc.
Tiling is also not a color format, but rather an image format (how the pixels are stored).
Exactly. The same is true for premultiplied. So I think using image format flags is better than an IMAGE_FORMAT enum, because an image can be both tiled and premultiplied.
typedef struct {
uint8_t always_zero;
uint8_t cf; /*Color format: See `lv_color_format_t`*/
//uint8_t format;
uint8_t premultiplied:1;
uint8_t reserved: 7;
//uint8_t reserved;
uint8_t userflag; /* can be tiled:1, reserved:7 for some GPU */
uint16_t w; /*Width of the image*/
uint16_t h; /*Height of the image*/
} lv_img_header_t;
reserved is not the same as userflag, so I also suggest adding a specific field for the user and a reserved field for future lvgl use.
Actually, in format we can store some standard formats too, e.g. LV_IMG_FORMAT_PNG/JPG/BMP/GIF etc.
I can't see the benefit now. Those images (file or C array) don't have an lv_img_header_t.
Edit: user --> userflag
If INDEXED
is a color format, I think premultiplied can be considered a color format too. Compared to tiled, premultiplied is standard (there is only one way to do premultiplication) and affects only a few color formats.
To put it differently: the reason I think premultiplied is a color format and not an image format is that it's local to a single pixel, while tiled or RLE describes the whole image.
In practice, when passing a buffer to a function (e.g. sw_transform) we also need to tell its color format. If we stored premultiplied as a new bit, we would need a new parameter for the "modifiers".
So I like the idea of having LV_COLOR_FORMAT_..._PREMULTIPLIED.
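Since premultiplication is local to a single pixel, the conversion is a pure per-pixel pass, which is what makes it feel like a color-format property. A minimal sketch for one ARGB8888 pixel (plain C, not LVGL API):

```c
#include <stdint.h>

/* Premultiply one ARGB8888 pixel: each color channel becomes c * a / 255.
 * The alpha channel itself is left unchanged. */
static uint32_t argb8888_premultiply(uint32_t px)
{
    uint32_t a = (px >> 24) & 0xFF;
    uint32_t r = ((px >> 16) & 0xFF) * a / 255;
    uint32_t g = ((px >> 8) & 0xFF) * a / 255;
    uint32_t b = (px & 0xFF) * a / 255;
    return (a << 24) | (r << 16) | (g << 8) | b;
}
```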
Actually in format we can store some standard formats too, e.g. LV_IMG_FORMAT_PNG/JPG/BMP/GIF etc.
I can't see the benefits now. For those images (file or c array), it doesn't have lv_img_header_t.
E.g. for a PNG as an array we need to have an lv_img_dsc_t. See here.
reserved is not the same as userflag, so I also suggest adding a specific field for the user and a reserved field for future lvgl use.
True, so my updated proposal:
typedef struct {
uint8_t always_zero;
uint8_t cf; /*Color format: See `lv_color_format_t`*/
uint8_t format; /*Raw pixels, LVGL-RLE, PNG, JPG, Tiled, etc*/
uint8_t user_flags; /* can be tiled:1, reserved:7 for some GPU */
uint16_t width; /*Width of the image*/
uint16_t height; /*Height of the image*/
} lv_img_header_t;
Hi Gabor,
My thought is to keep the always_zero field the same as in v8. The bit lengths used for format and user can be tweaked.
typedef struct {
    uint32_t cf : 5;          /*Color format: See `lv_color_format_t`*/
    uint32_t always_zero : 3; /*It's the upper bits of the first byte. Always zero to look like a
                                non-printable character*/
    uint32_t format : 8;      /*Image format? To be defined by LVGL*/
    uint32_t user : 8;
    uint32_t reserved : 8;    /*Reserved to be used later*/
    uint32_t w : 16;
    uint32_t h : 16;
} lv_img_header_t;
The image header is updated here: 4fe593363
get_area is added in https://github.com/lvgl/lvgl/tree/feat/img-decoder-get-area
See lv_bmp as an example. I'm not perfectly happy with it yet, but you can see the idea.
We need some feedback on this issue.
Now we mark this as "stale" because there was no activity here for 14 days.
Remove the "stale" label or comment else this will be closed in 7 days.
Not stale.
The basic image info is first inspected before dispatching to draw units. Now the issue becomes when to decode the image, whether we should use the same decoder list for all draw units, how to use the cache, etc.
I need more time, since the cache hasn't been added back to the software draw unit.
Can we implement serial chaining of decoders? The PNG decoder is responsible for processing the PNG file into an ARGB8888 image. The GPU decoder then processes the ARGB8888 image into a memory- and stride-aligned image (and releases the memory requested by the PNG decoder), then performs alpha premultiplication; the final cache only retains images processed by the GPU decoder.
Can we refer to the design of FFmpeg's AVFilterGraph, which takes complex serial connections into account? We could imagine the decoders and cache as objects with inputs and outputs that can be flexibly connected together.
Are the "stride adjuster" and the "alpha premultiplication" real decoders? Can't we say they are just helper functions?
E.g. in my_gpu_draw_img you could call a stride_realign() or an alpha premultiplication function. There is one catch though: now we can find the images in the cache by a pointer (usually an lv_image_dsc_t) or a string (usually a file name). Let's say it was a file, so in the cache it was stored by name. What do we do in my_gpu_draw_img?
Check if "filename" is cached? If yes, how do we know if it's the output of the PNG decoder or our modified data?
lv_cache_find has a param1 and a param2 parameter. I thought we could use them to store the frame_id and the color (for A8), but let's say in param1 we could store an ID like STRIDE_ALIGNED and try finding the cache with param1 = STRIDE_ALIGNED. So it would look like this:
1. In my_gpu_draw_img we would like to execute both STRIDE_ALIGNED and PREMULTIPLY.
2. Call lv_cache_find(src, src_type, STRIDE_ALIGNED | PREMULTIPLY). If found we are happy, and can start the drawing.
3. If not found, lv_image_decoder_open calls lv_cache_find(src, src_type, 0, 0); if that is not found either, create an entry now and store param1 in the cache entry.
IMO it might work, but instead of param1 and param2 we need a more flexible way to differentiate cache entries, as in reality we can't use param1 for this since it might be used for frame_id. But that's another question.
In principle do you agree with this approach, or do you imagine it differently?
@kisvegabor Do you mean to do the post-processing in the image cache? Would it be more appropriate to implement the post-processing logic at the lv_draw_buf layer?
We could implement an lv_draw_buf_post_processing method, and users can adjust the buf parameters and content themselves. Users can call lv_draw_buf_post_processing every time before drawing in their own draw unit, and the internal implementation of lv_draw_buf_post_processing can use flags to mark the processing status of the buf. If it has already been processed, it can return directly.
This solution will not affect the image cache logic, because lv_draw_buf only cares about memory and not about src. Both parties can concentrate on their own affairs.
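A minimal sketch of that flag-guarded idea (all names here are hypothetical, not the real LVGL API; draw_buf_t stands in for lv_draw_buf_t):

```c
#include <stdbool.h>
#include <stdint.h>

#define BUF_FLAG_POST_PROCESSED (1u << 0)  /* hypothetical flag bit */

typedef struct {
    void * data;
    uint32_t flags;
} draw_buf_t;  /* stand-in for lv_draw_buf_t */

static int g_adjust_calls = 0;

/* Example adjustment callback; a real one would realign the stride,
 * premultiply alpha, etc. */
static void count_adjust(draw_buf_t * buf)
{
    (void)buf;
    g_adjust_calls++;
}

/* Called by a draw unit before drawing. Runs the GPU-specific adjustment
 * only once per buffer; returns whether it actually ran. */
static bool draw_buf_post_process(draw_buf_t * buf, void (*adjust)(draw_buf_t *))
{
    if(buf->flags & BUF_FLAG_POST_PROCESSED) return false; /* already done */
    adjust(buf);
    buf->flags |= BUF_FLAG_POST_PROCESSED;
    return true;
}
```

Because the state lives in the buffer itself, every draw unit can call this unconditionally and only the first call pays the processing cost.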
What if we handle this post-processing in the decoders by having a global lv_image_decoder_post_process function which is called after any decoder_open?
What if we handle this post-processing in the decoders by having a global lv_image_decoder_post_process function which is called after any decoder_open?
I agree with this.
I'm still trying to wrap my head around this feature and wondering if it's enough to have a global lv_image_decoder_open_post_process callback which is called after all decoding?
Similarly, we might need an lv_image_decoder_open_pre_process callback too, to be sure we return the right entry from the cache. E.g. a simple pixel map image doesn't check the cache at all, so the processed image won't be used by default.
I'm still trying to wrap my head around this feature and wondering if it's enough to have a global lv_image_decoder_open_post_process callback which is called after all decoding?
Similarly, we might need an lv_image_decoder_open_pre_process callback too, to be sure we return the right entry from the cache. E.g. a simple pixel map image doesn't check the cache at all, so the processed image won't be used by default.
We can use flags in the lv_draw_buf_t to record and mark the processing status.
Just to be sure I understand the intention: the goal is to allow a platform specific customization on the decoded data, right?
Just to be sure I understand the intention: the goal is to allow a platform specific customization on the decoded data, right?
Yes, the goal is to have all decoded images meet the alignment and premultiplication requirements of the GPU.
In this case a global callback should really do the job. Do you agree?
In this case a global callback should really do the job. Do you agree?
Yes, I think global callback is enough for the task.
Great, could you send a PR?
Great, could you send a PR?
No problem. Over here: https://github.com/lvgl/lvgl/pull/4636
I've added a process_flag here to record the processing state and avoid duplicate processing. What do you think?
Introduce the problem
Current decoders are organized in a global list including bin, png, gif, etc. For a GPU, not only does the decoded image need to meet some requirements (stride, address alignment, etc.), but the GPU also needs to register special decoders (ETC, etc.).
Examples and cases
Suggested solution
@terry0012 suggests adding a decoder list to every GPU (SW renderer included), so when a GPU needs to draw an image, it first decodes it with its supported decoders. @FASTSHIFT implemented this idea and the code looks cleaner.