google / etc2comp

Apache License 2.0

how to improve compression speed #37

Open yanliasdf789 opened 6 years ago

yanliasdf789 commented 6 years ago

I tested a 1024*1024 PNG image and the project runs fine. The compression format is RGB8 and the file format is PKM, but it takes 8 seconds. I now have 100,000 images to process like this, so how can I improve the speed? With "Mali Texture Compression Tool v4.3.0" in fast mode, it takes only 1.7 seconds.

tommego commented 5 years ago

I have the same problem. I have hundreds of thousands of images to process, and it is very hard to speed up the compression...

alecazam commented 3 years ago

This library seems to treat memory quite liberally just to support multithreading. All inputs from an LDR source are float4, which for sRGB or premultiplied LDR isn't that bad, but could be around 11 or 12 bits per channel.

Then, on top of that, multiple block and encoding elements are allocated, each holding large amounts of data. I'm running single-threaded, and even for R/RG11 conversion of a 256x256 mipped texture in a release build, the timings are 12s at quality 50 and 4s at quality 49. For all other LDR 1/2-channel formats I've seen, this is usually done with a table lookup in a few milliseconds on a single thread. I don't quite understand the performance tradeoffs made in this library.
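
To put the float4 point in numbers, here is a small worked calculation for the 1024*1024 image mentioned earlier in the thread. It assumes "float4" means four 32-bit floats per pixel and compares that with plain 8-bit RGBA; the helper name `ImageBytes` and the constants are made up for illustration and are not part of etc2comp.

```cpp
#include <cstddef>

// Bytes needed to hold a width x height image at a given per-pixel size.
constexpr std::size_t ImageBytes(std::size_t width, std::size_t height,
                                 std::size_t bytesPerPixel)
{
    return width * height * bytesPerPixel;
}

// Assumption: float4 = four 32-bit floats per pixel (16 bytes),
// compared against 8-bit RGBA (4 bytes per pixel).
constexpr std::size_t kFloat4Bytes = 4 * sizeof(float);
constexpr std::size_t kRgba8Bytes = 4;

// A 1024x1024 source image: 4 MiB as RGBA8, 16 MiB once expanded to float4.
static_assert(ImageBytes(1024, 1024, kRgba8Bytes) == 4u * 1024 * 1024,
              "RGBA8 copy is 4 MiB");
static_assert(ImageBytes(1024, 1024, kFloat4Bytes) == 16u * 1024 * 1024,
              "float4 copy is 16 MiB");
```

This only counts the source pixels; the per-block and per-encoding allocations described above come on top of it.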

alecazam commented 3 years ago

This seems to be the hotspot now that I'm reusing the same block and encoder.

void Block4x4Encoding_R11::CalculateR11(unsigned int a_uiSelectorsUsed,
                                        float a_fBaseRadius, float a_fMultiplierRadius)
{
....
    for (float fMultiplier = fMinMultiplier; fMultiplier <= fMaxMultiplier; fMultiplier += 1.0f)
    {
        // find best selector for each pixel
        unsigned int auiBestSelectors[PIXELS];
        float afBestRedError[PIXELS];
        float afBestPixelRed[PIXELS];

        // TODO: this brute-force loop makes 16 pixels x 8 selectors = 128 calls to CalcPixelError
        // per (multiplier, base) pair, per block, which results in 2.4s/3.9s of time spent in CalcR11;
        // +G11 doubles that time. CalcPixelError returns dx^2 + dy^2 in my impl.

        for (unsigned int uiPixel = 0; uiPixel < PIXELS; uiPixel++)
        {
            float fBestPixelRedError = FLT_MAX;

            for (unsigned int uiSelector = 0; uiSelector < SELECTORS; uiSelector++)
            {
                float fPixelRed = DecodePixelRed(fBase * 255.0f, fMultiplier, uiTableEntry, uiSelector);

                ColorFloatRGBA frgba(fPixelRed, m_pafrgbaSource[uiPixel].fG,0.0f,1.0f);

                float fPixelRedError = CalcPixelError(frgba, 1.0f, m_pafrgbaSource[uiPixel]);

                if (fPixelRedError < fBestPixelRedError)
                {
                    fBestPixelRedError = fPixelRedError;
                    auiBestSelectors[uiPixel] = uiSelector;
                    afBestRedError[uiPixel] = fBestPixelRedError;
                    afBestPixelRed[uiPixel] = fPixelRed;
                }
            }
        }
        ....
    }
....
}
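
One way the inner loop above might be made cheaper is sketched below; this is not etc2comp code. The decoded red value depends only on base, multiplier, table entry and selector, not on the pixel, so the 8 candidates could be decoded once per combination instead of once per pixel (8 decodes rather than 128). The sketch also assumes the error reduces to a squared red difference, which matches the "dx^2 + dy^2" implementation mentioned in the comment because the green term is zero there. `SelectorChoice`, `BestSelector` and `SelectForBlock` are hypothetical names.

```cpp
#include <array>
#include <cfloat>
#include <cstddef>

constexpr std::size_t PIXELS = 16;    // 4x4 block
constexpr std::size_t SELECTORS = 8;  // assumed: 3-bit selectors

struct SelectorChoice
{
    unsigned int selector;
    float error;
};

// candidates[s] holds the decoded red value for selector s, filled in once
// per (base, multiplier, table entry) combination by the caller.
// The squared difference stands in for CalcPixelError (an assumption).
inline SelectorChoice BestSelector(const std::array<float, SELECTORS>& candidates,
                                   float sourceRed)
{
    SelectorChoice best{0, FLT_MAX};
    for (unsigned int s = 0; s < SELECTORS; s++)
    {
        const float d = candidates[s] - sourceRed;
        const float err = d * d;
        if (err < best.error)
            best = {s, err};
    }
    return best;
}

// Picks a selector for every pixel of one block and returns the summed error.
inline float SelectForBlock(const std::array<float, SELECTORS>& candidates,
                            const std::array<float, PIXELS>& sourceRed,
                            std::array<unsigned int, PIXELS>& selectorsOut)
{
    float total = 0.0f;
    for (std::size_t p = 0; p < PIXELS; p++)
    {
        const SelectorChoice c = BestSelector(candidates, sourceRed[p]);
        selectorsOut[p] = c.selector;
        total += c.error;
    }
    return total;
}
```

This still visits every pixel and selector pair, so it removes the repeated decode and the temporary color construction rather than the search itself; whether that is enough to matter would need profiling.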

richgel999 commented 2 years ago

This library appears dead.

Calinou commented 2 years ago

For future readers, using a library like https://github.com/wolfpld/etcpak should provide faster compression. See this benchmark: https://aras-p.info/blog/2020/12/08/Texture-Compression-in-2020/