joetrotta / openjpeg

Automatically exported from code.google.com/p/openjpeg

Coding speed for 9/7 on 32bits platforms (x86/ARM) can be improved with a quick fix #220

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The proposed patch has been tested on trunk at revision 2343.

Tested on:
Win XP SP3 x86 VC10 SP1
Linux CentOS 5.5 x86_64 compilation with -m32 (GCC 4.1.2 / Red Hat 4.1.2-48)
Linux Ubuntu 11.10 ARMEL compilation with -march=armv7-a -mfloat-abi=softfp -mfpu=neon -mtune=cortex-a9 (GCC 4.6.3 / Sourcery CodeBench Lite 2012.03-57)

The proposed patch requires neither ARMv7 nor NEON capabilities.

The overall time to compress Bretagne2.ppm, Cevennes1.bmp and 
X_4_2K_24_185_CBR_WB_000.tif, measured with "time ./opj_compress -ImgDir ./tmp/ 
-OutFor jp2 -I", showed a 10-15% speed-up.

Regards,
Matthieu DARBOIS

Original issue reported on code.google.com by m.darb...@gmail.com on 15 Apr 2013 at 1:55

GoogleCodeExporter commented 9 years ago
This is of course an issue of type enhancement, but I didn't see how to create 
one...

Original comment by m.darb...@gmail.com on 15 Apr 2013 at 1:59

GoogleCodeExporter commented 9 years ago
Changed the register constraints for the ARM version. This potentially saves 
2 registers.

Original comment by m.darb...@gmail.com on 16 Apr 2013 at 11:13

GoogleCodeExporter commented 9 years ago

Original comment by mathieu.malaterre on 25 Feb 2014 at 12:43

GoogleCodeExporter commented 9 years ago
Hi,

I updated the patch for tag 2.1.0.

Please find some time ratios below. The whole encoding time is taken into 
account. Input images are 8-bit grayscale images encoded using the 9/7 wavelet. 
Timings include the 8-bit to 32-bit conversion.

0.964 (linux x86 gcc4.4)
0.983 (linux armv7 gcc4.6)
0.989 (linux armv5 gcc4.6)
0.918 (windows x86 vc8)
0.872 (windows x86 vc10)

x64 shows almost no improvement (as expected, less than 1%)

Regards,
Matthieu

Original comment by m.darb...@gmail.com on 28 May 2014 at 7:12

GoogleCodeExporter commented 9 years ago

Original comment by m.darb...@gmail.com on 18 Sep 2014 at 8:31

GoogleCodeExporter commented 9 years ago
Given the results, I took a look at the generated assembly, and it looks like 
gcc and clang are doing their job, so hand-written assembly is not needed for 
linux/macos on x86 and arm.

The optimization also applies to the MCT, even on x64 (it gets rid of a useless 
operation), where the MCT is sped up by 40%.

Original comment by m.darb...@gmail.com on 13 Dec 2014 at 10:10

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2956.

Original comment by m.darb...@gmail.com on 13 Dec 2014 at 10:27

GoogleCodeExporter commented 9 years ago
Still need to get VC 8+ optimization.

Original comment by m.darb...@gmail.com on 13 Dec 2014 at 10:28