lifthrasiir / j40

J40: Independent, self-contained JPEG XL decoder

Performance on .fjxl #11

Open · alantudyk opened this issue 1 year ago

alantudyk commented 1 year ago

Test image: https://stsci-opo.org/STScI-01GA76Q01D09HFEV174SVMQDMV.png

$ time ./dj40 w.fjxl 
14560x8418 frame read and discarded.

real    0m16,854s
user    0m16,325s
sys     0m0,549s

That's only about 7 to 8 MPx/s, while PNG decoding on the same CPU runs at 50 MPx/s. That is too slow for a stream that uses nothing more than prefix_codes + simple_avg_predictor + color_conversion.
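For reference, the throughput figure can be reproduced from the dimensions and timings in the dj40 run. This is a back-of-the-envelope sketch: it uses the 14560x8418 size that dj40 printed, and it divides by the whole process time, so file reading is included. Wall-clock ("real") time gives about 7.3 MPx/s and user time about 7.5 MPx/s. At 50 MPx/s, the same pixel count would take roughly 2.5 s.

```python
def mpx_per_second(width, height, seconds):
    """Decoding throughput in megapixels per second."""
    return width * height / 1e6 / seconds

# Dimensions and timings copied from the dj40 run above.
WIDTH, HEIGHT = 14560, 8418
wall_rate = mpx_per_second(WIDTH, HEIGHT, 16.854)  # "real" time
user_rate = mpx_per_second(WIDTH, HEIGHT, 16.325)  # "user" time

# Time a 50 MPx/s decoder (the PNG figure) would need for the same pixels.
png_equivalent_s = WIDTH * HEIGHT / 1e6 / 50

print(f"{wall_rate:.2f} MPx/s wall, {user_rate:.2f} MPx/s user, "
      f"{png_equivalent_s:.2f} s at 50 MPx/s")
```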

alantudyk commented 1 year ago

Also, the reported width is incorrect; it should be 14557:

$ time ./djxl w.fjxl --num_threads=1
JPEG XL decoder v0.7.0 3a4676f [AVX2,SSE4,SSSE3,Emu128]
Read 145997782 compressed bytes.
No output file specified.
Decoding will be performed, but the result will be discarded.
Decoded to pixels.
14557 x 8418, 16.68 MP/s [16.68, 16.68], 1 reps, 1 threads.
Allocations: 489484 (max bytes in use: 4.114270E+09)

real    0m7,588s
user    0m6,990s
sys     0m0,600s

lifthrasiir commented 1 year ago

The performance issue is well known, and there is a lot of room for improvement. That said, yeah, focusing on fjxl performance specifically might be a good way to start that effort. (I think libjxl specializes MA tree decoding for fjxl output, which may account for much of the difference.)
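To show the general idea behind such a specialization, here is a hypothetical sketch. It is not libjxl's or J40's actual code. It models an MA tree as nested nodes. A generic decoder walks the tree for every pixel to pick a predictor. A specialized one checks whether the whole tree is a single leaf and, if so, looks the predictor up once, outside the pixel loop. Several details are assumptions made for the example: the `Node` layout, the predictor names, the edge rules, and the floor rounding of the west/north average. Entropy decoding is omitted, so residuals are passed in directly.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Hypothetical MA tree node: inner nodes test props[prop] > threshold,
    # leaves carry a predictor name (empty string means "inner node").
    prop: int = -1
    threshold: int = 0
    left: Optional["Node"] = None   # taken when props[prop] > threshold
    right: Optional["Node"] = None  # taken otherwise
    predictor: str = ""

def west(img, x, y):
    # Left neighbour, falling back to the pixel above, then to 0 (assumed edge rule).
    if x > 0:
        return img[y][x - 1]
    if y > 0:
        return img[y - 1][x]
    return 0

def north(img, x, y):
    # Pixel above, falling back to the west value on the first row.
    return img[y - 1][x] if y > 0 else west(img, x, y)

PREDICTORS = {
    "zero": lambda img, x, y: 0,
    # Average of west and north; floor division is an assumption here.
    "avg_wn": lambda img, x, y: (west(img, x, y) + north(img, x, y)) // 2,
}

def walk(node, props):
    """Descend from the root to a leaf and return its predictor name."""
    while node.predictor == "":
        node = node.left if props[node.prop] > node.threshold else node.right
    return node.predictor

def decode_generic(tree, residuals):
    h, w = len(residuals), len(residuals[0])
    img = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            name = walk(tree, (y, x))  # tree walk on every pixel; props = (y, x)
            img[y][x] = PREDICTORS[name](img, x, y) + residuals[y][x]
    return img

def decode_specialized(tree, residuals):
    if tree.predictor == "":
        # Not a single-leaf tree: fall back to the generic path.
        return decode_generic(tree, residuals)
    predict = PREDICTORS[tree.predictor]  # resolved once, outside the loop
    h, w = len(residuals), len(residuals[0])
    img = [[0] * w for _ in range(h)]
    for y in range(h):
        res_row, out_row = residuals[y], img[y]
        for x in range(w):
            out_row[x] = predict(img, x, y) + res_row[x]
    return img
```

The fallback keeps the specialized path correct for trees it doesn't recognize. Both decoders must produce identical pixels, so the generic one doubles as a reference when testing the fast path.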

The incorrect size is due to a peculiarity of fjxl encoding: it always rounds the width up to the next multiple of 8 or 16 (I can't recall which) and relies on the crop rectangle to hide the excess. J40 currently doesn't implement crop rectangles, which is also documented.
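The arithmetic is consistent with this explanation. Rounding 14557 up to a multiple of 8 and rounding it up to a multiple of 16 both give 14560, so this image can't tell the two cases apart. Below is a minimal sketch of the padding and of a crop that would recover the true size. `round_up` and `crop` are hypothetical helpers, not J40 functions. The crop takes an origin (`x0`, `y0`) plus a size, as a general crop rectangle would, although only the width needs trimming here.

```python
def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return (n + multiple - 1) // multiple * multiple

def crop(rows, x0, y0, width, height):
    """Keep only the visible rectangle of a padded image (a list of rows)."""
    return [row[x0:x0 + width] for row in rows[y0:y0 + height]]

TRUE_WIDTH = 14557
print(round_up(TRUE_WIDTH, 8), round_up(TRUE_WIDTH, 16))  # 14560 14560
```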