Coding speed for 9/7 on 32bits platforms (x86/ARM) can be improved with a quick fix

xinxinlx / openjpeg

Automatically exported from code.google.com/p/openjpeg

Coding speed for 9/7 on 32bits platforms (x86/ARM) can be improved with a quick fix #220

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago

The proposed patch has been tested on trunk at revision 2343.

Tested on:
Win XP SP3 x86 VC10 SP1
Linux CentOS 5.5 x86_64 compilation with -m32 (GCC 4.1.2 / Red Hat 4.1.2-48)
Linux Ubuntu 11.10 ARMEL compilation with -march=armv7-a -mfloat-abi=softfp -mfpu=neon -mtune=cortex-a9 (GCC 4.6.3 / Sourcery CodeBench Lite 2012.03-57)

The proposed patch requires neither ARMv7 nor NEON capabilities.

The overall time to compress Bretagne2.ppm, Cevennes1.bmp and 
X_4_2K_24_185_CBR_WB_000.tif using "time ./opj_compress -ImgDir ./tmp/ 
-OutFor jp2 -I" showed a 10-15% speed-up.

Regards,
Matthieu DARBOIS

Original issue reported on code.google.com by m.darb...@gmail.com on 15 Apr 2013 at 1:55

Attachments:

fix_mul.patch
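
For readers who do not have the attachment at hand, here is a minimal sketch of the kind of operation the patch name points at: a fixed-point multiply that needs a 64-bit intermediate. It is not the patch itself. The function name `fix_mul13`, the 13 fractional bits and the rounding constant are assumptions chosen for illustration. On 32-bit targets a compiler may expand the widened multiply into a generic 64x64-bit routine instead of a single 32x32->64 instruction, which is one plausible source of the gain reported here. The MSVC branch uses the `__emul` intrinsic, which a later attachment name in this thread seems to hint at; that link is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER) && defined(_M_IX86)
#include <intrin.h> /* __emul: signed 32x32 -> 64-bit multiply intrinsic */
#endif

/* Hypothetical helper: multiply two Q13 fixed-point values
   (13 fractional bits is an assumption, not taken from the patch). */
static int32_t fix_mul13(int32_t a, int32_t b)
{
#if defined(_MSC_VER) && defined(_M_IX86)
    /* Ask 32-bit MSVC explicitly for a single widening multiply. */
    int64_t temp = __emul(a, b);
#else
    /* Portable form: widen both operands, then multiply. */
    int64_t temp = (int64_t)a * (int64_t)b;
#endif
    temp += 4096; /* add 0.5 in Q13 so the shift below rounds */
    temp >>= 13;  /* assumes an arithmetic shift for negative values */
    assert(temp >= INT32_MIN && temp <= INT32_MAX);
    return (int32_t)temp;
}
```

In this sketch 8192 represents 1.0, so `fix_mul13(24576, 4096)` multiplies 3.0 by 0.5 and yields 12288, that is 1.5.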

GoogleCodeExporter commented 9 years ago

This is of course an issue of type enhancement, but I didn't see how to create 
one...

Original comment by m.darb...@gmail.com on 15 Apr 2013 at 1:59

GoogleCodeExporter commented 9 years ago

Changed the register constraints for the ARM version. This can potentially 
save 2 registers.

Original comment by m.darb...@gmail.com on 16 Apr 2013 at 11:13

Attachments:

fix_mul.patch

GoogleCodeExporter commented 9 years ago

Original comment by mathieu.malaterre on 25 Feb 2014 at 12:43

Added labels: Priority-Low
Removed labels: Priority-Medium

GoogleCodeExporter commented 9 years ago

Hi,

I updated the patch for tag 2.1.0.

Please find some time ratios below. The whole encoding time is taken into 
account. Input images are 8-bit grayscale images encoded using the 9/7 wavelet. 
Timings include the 8-bit to 32-bit conversion.

0.964 (linux x86 gcc4.4)
0.983 (linux armv7 gcc4.6)
0.989 (linux armv5 gcc4.6)
0.918 (windows x86 vc8)
0.872 (windows x86 vc10)

x64 shows almost no improvement (as expected, less than 1%)

Regards,
Matthieu

Original comment by m.darb...@gmail.com on 28 May 2014 at 7:12

Attachments:

openjpeg-2.1.0-emul.patch
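
To read the ratios above as percentages, a small sketch may help. It assumes each ratio is patched encoding time divided by original encoding time; the comment does not state the direction explicitly. The function names are hypothetical.

```c
#include <assert.h>

/* Percentage of encoding time saved, assuming
   ratio = patched time / original time. */
static double time_saved_percent(double ratio)
{
    return (1.0 - ratio) * 100.0;
}

/* Throughput gain in percent for the same ratio: the same work finishes
   in `ratio` of the time, so throughput scales by 1 / ratio. */
static double throughput_gain_percent(double ratio)
{
    assert(ratio > 0.0); /* guard the division */
    return (1.0 / ratio - 1.0) * 100.0;
}
```

Under that assumption, 0.872 (windows x86 vc10) corresponds to about 12.8% less encoding time, or roughly 14.7% more throughput, while 0.989 (linux armv5 gcc4.6) corresponds to about 1.1% less time.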

GoogleCodeExporter commented 9 years ago

Original comment by m.darb...@gmail.com on 18 Sep 2014 at 8:31

Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

Given the results, I took a look at the generated assembly, and it looks like 
gcc and clang are doing their job, so hand-written assembly is not needed for 
linux/macos on x86 and arm.

The optimization also applies to MCT, even on x64 (it got rid of a useless 
operation), where it is sped up by 40%.
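
To illustrate why a faster fixed-point multiply would also matter for MCT, here is a sketch of an irreversible RGB to YCbCr transform built on such a multiply. It does not reproduce the committed change, and it does not show which operation was removed. The names `fix_mul13`, `ycc_t` and `rgb_to_ycc` are hypothetical, and the coefficients are the usual ICT weights scaled by 2^13 and rounded, which is an assumption about the representation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q13 fixed-point multiply with rounding. */
static int32_t fix_mul13(int32_t a, int32_t b)
{
    int64_t temp = (int64_t)a * (int64_t)b + 4096; /* +0.5 in Q13 */
    temp >>= 13; /* assumes an arithmetic shift for negative values */
    assert(temp >= INT32_MIN && temp <= INT32_MAX);
    return (int32_t)temp;
}

typedef struct { int32_t y, cb, cr; } ycc_t;

/* Nine fixed-point multiplies per pixel, so the cost of each multiply
   dominates the transform. Coefficients are weights * 8192, rounded. */
static ycc_t rgb_to_ycc(int32_t r, int32_t g, int32_t b)
{
    ycc_t out;
    out.y  = fix_mul13(r, 2449)  + fix_mul13(g, 4809)  + fix_mul13(b, 934);
    out.cb = fix_mul13(r, -1382) + fix_mul13(g, -2714) + fix_mul13(b, 4096);
    out.cr = fix_mul13(r, 4096)  + fix_mul13(g, -3430) + fix_mul13(b, -666);
    return out;
}
```

Because each term is rounded separately, chroma values for neutral grays can be off by one from zero in this sketch; a production implementation may handle rounding differently.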

Original comment by m.darb...@gmail.com on 13 Dec 2014 at 10:10

Changed state: Started

GoogleCodeExporter commented 9 years ago

This issue was updated by revision r2956.

Original comment by m.darb...@gmail.com on 13 Dec 2014 at 10:27

GoogleCodeExporter commented 9 years ago

Still need to get the optimization working for VC 8+.

Original comment by m.darb...@gmail.com on 13 Dec 2014 at 10:28