watery01 / libyuv

Automatically exported from code.google.com/p/libyuv

5ms target for Neon #150

Closed. GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
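All _Any conversions should take 5 ms or less per call. The timings below are totals for LIBYUV_REPEAT=1000 iterations, so the target is 5000 ms or less per test.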

./runyuv10 Any
sudo LIBYUV_REPEAT=1000 nice --5 ./libyuv_unittest --gtest_filter=*Any | sed 's/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g' | sort -rn | grep ms

I420ToI444_Any (22755 ms)
BayerRGGBToI420_Any (9261 ms)
BayerGBRGToI420_Any (9229 ms)
BayerBGGRToI420_Any (9171 ms)
BayerGRBGToI420_Any (9143 ms)
I411ToI420_Any (6455 ms)
I420ToI411_Any (6341 ms)
ARGBToNV12_Any (5914 ms)
ARGBToNV21_Any (5872 ms)
ABGRToI420_Any (5727 ms)
BGRAToI420_Any (5694 ms)
RGBAToI420_Any (5692 ms)
NV12ToRGB565_Any (4463 ms)
NV21ToRGB565_Any (4457 ms)
BayerGBRGToARGB_Any (4435 ms)
RAWToI420_Any (4404 ms)
RGB24ToI420_Any (4381 ms)
BayerRGGBToARGB_Any (4379 ms)
BayerGRBGToARGB_Any (4348 ms)
BayerBGGRToARGB_Any (4334 ms)
I420ToARGB1555_Any (4300 ms)
I420ToRGB565_Any (4126 ms)
I444ToARGB_Any (4079 ms)
I420ToBayerGRBG_Any (4069 ms)
UYVYToARGB_Any (4023 ms)
I420ToBayerGBRG_Any (4014 ms)
I420ToBayerRGGB_Any (4011 ms)
I420ToBayerBGGR_Any (4010 ms)
YUY2ToARGB_Any (4007 ms)
ARGBToUYVY_Any (3991 ms)
ARGBToYUY2_Any (3984 ms)
I411ToARGB_Any (3965 ms)
NV12ToARGB_Any (3696 ms)
NV21ToARGB_Any (3690 ms)
ARGBToI444_Any (3689 ms)
I422ToBGRA_Any (3686 ms)
I422ToABGR_Any (3682 ms)
I422ToARGB_Any (3670 ms)
ARGB1555ToI420_Any (3666 ms)
I422ToRGBA_Any (3660 ms)
RGB565ToI420_Any (3567 ms)
I420ToARGB4444_Any (3541 ms)
V210ToI420_Any (3540 ms)
I420ToRAW_Any (3521 ms)
I420ToBGRA_Any (3500 ms)
I420ToABGR_Any (3487 ms)
I420ToARGB_Any (3480 ms)
I420ToRGBA_Any (3475 ms)
ARGBToI422_Any (3466 ms)
ARGB4444ToI420_Any (3386 ms)
I420ToRGB24_Any (3385 ms)
ARGBToI411_Any (3189 ms)
I444ToI420_Any (3080 ms)
ARGBToI420_Any (3051 ms)
YToARGB_Any (2997 ms)
ARGBToARGB1555_Any (2948 ms)
ARGBToARGBMirror_Any (2859 ms)
ARGBToRGB565_Any (2584 ms)
I420ToI420Mirror_Any (2542 ms)
I422ToI420_Any (2426 ms)
ARGBToARGB4444_Any (2130 ms)
ABGRToARGB_Any (2076 ms)
ARGBToRAW_Any (2001 ms)
ARGBToABGR_Any (1984 ms)
BGRAToARGB_Any (1978 ms)
RGBAToARGB_Any (1977 ms)
ARGBToBGRA_Any (1977 ms)
ARGBToRGBA_Any (1970 ms)
ARGBToARGB_Any (1969 ms)
ARGB1555ToARGB_Any (1897 ms)
ARGBToRGB24_Any (1859 ms)
ARGBToI400_Any (1835 ms)
ARGBToBayerGBRG_Any (1827 ms)
RGB565ToARGB_Any (1809 ms)
YUY2ToI422_Any (1762 ms)
ARGB4444ToARGB_Any (1731 ms)
ARGBToBayerGRBG_Any (1708 ms)
ARGBToBayerBGGR_Any (1703 ms)
ARGBToBayerRGGB_Any (1702 ms)
UYVYToI422_Any (1682 ms)
RAWToARGB_Any (1578 ms)
RGB24ToARGB_Any (1567 ms)
YUY2ToI420_Any (1478 ms)
UYVYToI420_Any (1467 ms)
I422ToYUY2_Any (1339 ms)
I420ToI422_Any (1307 ms)
I422ToUYVY_Any (1289 ms)
I420ToUYVY_Any (1170 ms)
I420ToI420_Any (1157 ms)
I420ToYUY2_Any (1127 ms)
I420ToNV12_Any (1113 ms)
NV21ToI420_Any (1087 ms)
NV12ToI420_Any (1075 ms)
I400ToARGB_Any (1075 ms)
I420ToNV21_Any (1056 ms)
I400ToI400Mirror_Any (880 ms)
I400ToI420_Any (655 ms)
I420ToI400_Any (630 ms)
I400ToI400_Any (571 ms)
99 tests from libyuvTest (337717 ms total)
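
The sed/sort pipeline above moves the millisecond count to the front of each result line so that `sort -rn` can list the slowest tests first. A minimal Python sketch of the same idea, assuming the raw input uses the gtest form `[       OK ] libyuvTest.Name (N ms)`; the helper name `slowest_first` is hypothetical and not part of libyuv:

```python
import re

# Matches gtest result lines such as
# "[       OK ] libyuvTest.I420ToI444_Any (22755 ms)".
RESULT_RE = re.compile(r"\[\s+OK\s+\]\s+(\S+)\s+\((\d+) ms\)")


def slowest_first(lines):
    """Return (milliseconds, test_name) pairs sorted slowest first."""
    results = []
    for line in lines:
        match = RESULT_RE.search(line)
        if match:  # banner, RUN, and summary lines do not match
            results.append((int(match.group(2)), match.group(1)))
    return sorted(results, reverse=True)


sample = [
    "[ RUN      ] libyuvTest.I400ToI400_Any",
    "[       OK ] libyuvTest.I400ToI400_Any (571 ms)",
    "[       OK ] libyuvTest.I420ToI444_Any (22755 ms)",
    "[       OK ] libyuvTest.I411ToI420_Any (6455 ms)",
]
for ms, name in slowest_first(sample):
    print(f"{ms} - {name}")
```

Sorting numeric tuples avoids relying on the text layout of the line, which is the one thing the sed expression has to get exactly right.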

Original issue reported on code.google.com by fbarch...@chromium.org on 9 Nov 2012 at 10:58

GoogleCodeExporter commented 9 years ago
Improved in r480
22555 - [       OK ] libyuvTest.I420ToI444_Any (22555 ms)
6379 - [       OK ] libyuvTest.I411ToI420_Any (6379 ms)
6286 - [       OK ] libyuvTest.I420ToI411_Any (6286 ms)
6108 - [       OK ] libyuvTest.BayerRGGBToI420_Any (6108 ms)
6082 - [       OK ] libyuvTest.BayerBGGRToI420_Any (6082 ms)
6070 - [       OK ] libyuvTest.BayerGBRGToI420_Any (6070 ms)
6039 - [       OK ] libyuvTest.BayerGRBGToI420_Any (6039 ms)
4429 - [       OK ] libyuvTest.NV21ToRGB565_Any (4429 ms)
4413 - [       OK ] libyuvTest.NV12ToRGB565_Any (4413 ms)
4403 - [       OK ] libyuvTest.BayerGBRGToARGB_Any (4403 ms)
4374 - [       OK ] libyuvTest.BayerGRBGToARGB_Any (4374 ms)
4368 - [       OK ] libyuvTest.BayerRGGBToARGB_Any (4368 ms)
4320 - [       OK ] libyuvTest.BayerBGGRToARGB_Any (4320 ms)
4250 - [       OK ] libyuvTest.I420ToARGB1555_Any (4250 ms)
4173 - [       OK ] libyuvTest.RAWToI420_Any (4173 ms)
4097 - [       OK ] libyuvTest.RGB24ToI420_Any (4097 ms)
4083 - [       OK ] libyuvTest.I420ToRGB565_Any (4083 ms)
4057 - [       OK ] libyuvTest.I444ToARGB_Any (4057 ms)
4001 - [       OK ] libyuvTest.UYVYToARGB_Any (4001 ms)

Original comment by fbarch...@chromium.org on 10 Nov 2012 at 10:40

GoogleCodeExporter commented 9 years ago
r484 improves the top 3.
I420ToI444_Any (8832 ms)
BayerRGGBToI420_Any (6081 ms)
BayerGBRGToI420_Any (6051 ms)
BayerBGGRToI420_Any (5954 ms)
BayerGRBGToI420_Any (5942 ms)
NV21ToRGB565_Any (4421 ms)
BayerGBRGToARGB_Any (4416 ms)
NV12ToRGB565_Any (4410 ms)
BayerRGGBToARGB_Any (4371 ms)
BayerGRBGToARGB_Any (4360 ms)
BayerBGGRToARGB_Any (4325 ms)
I420ToARGB1555_Any (4260 ms)
RAWToI420_Any (4234 ms)
RGB24ToI420_Any (4092 ms)
I420ToRGB565_Any (4090 ms)
I444ToARGB_Any (4059 ms)
YUY2ToARGB_Any (4005 ms)
UYVYToARGB_Any (4002 ms)

Original comment by fbarch...@chromium.org on 13 Nov 2012 at 5:31

GoogleCodeExporter commented 9 years ago
I420ToI444_Any is a 2x upsampler with a subpixel-accurate center, so the pixels end up blended 25%/75%, and that blend has no specialized code path.
On Intel, specializing this case improves performance from
I420ToI444_Any (16083 ms)
to
I420ToI444_Any (4722 ms)
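
To see where the 25/75 blend comes from, here is a small one-dimensional sketch. It assumes pixel centers sit at integer + 0.5 and that edges are clamped; both are assumptions made for illustration, and the function names are hypothetical rather than libyuv's.

```python
import math


def upsample_2x_weights(j):
    """Source index and blend fraction for output pixel j of a 2x upsample.

    Assumes pixel centers sit at integer + 0.5, so output pixel j maps to
    source coordinate (j + 0.5) / 2 - 0.5.
    """
    x = (j + 0.5) / 2 - 0.5
    left = math.floor(x)
    frac = x - left  # weight given to the right-hand neighbor
    return left, frac


def upsample_2x(src):
    """Linear 2x upsample of one row, clamping indices at the edges."""
    last = len(src) - 1
    out = []
    for j in range(2 * len(src)):
        left, frac = upsample_2x_weights(j)
        a = src[min(max(left, 0), last)]
        b = src[min(max(left + 1, 0), last)]
        out.append((1 - frac) * a + frac * b)
    return out


# The fraction alternates between 0.75 and 0.25 and never takes another value.
print([upsample_2x_weights(j)[1] for j in range(4)])
print(upsample_2x([0, 100]))
```

Because only two fractions ever occur, a general-purpose interpolator does more work per pixel than this case needs.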

Original comment by fbarch...@chromium.org on 13 Nov 2012 at 7:36

GoogleCodeExporter commented 9 years ago
r488 improves scaling, mirroring, and copy.
I420ToI444_Any (7281 ms)
BayerGBRGToI420_Any (4759 ms)
BayerRGGBToI420_Any (4730 ms)
BayerBGGRToI420_Any (4680 ms)
BayerGRBGToI420_Any (4665 ms)
RAWToI420_Any (3845 ms)
RGB24ToI420_Any (3810 ms)
BayerGBRGToARGB_Any (3743 ms)
BayerRGGBToARGB_Any (3722 ms)
BayerGRBGToARGB_Any (3710 ms)
NV21ToRGB565_Any (3690 ms)
BayerBGGRToARGB_Any (3679 ms)
NV12ToRGB565_Any (3673 ms)
ARGBToI444_Any (3574 ms)
I420ToARGB1555_Any (3544 ms)
I444ToARGB_Any (3491 ms)
I420ToRGB565_Any (3400 ms)
UYVYToARGB_Any (3364 ms)
ARGBToUYVY_Any (3325 ms)
ARGBToYUY2_Any (3312 ms)
YUY2ToARGB_Any (3305 ms)

Original comment by fbarch...@google.com on 14 Nov 2012 at 2:35

GoogleCodeExporter commented 9 years ago
Aside from conversions, some tests are slower than 5 ms.
60754 - [       OK ] libyuvTest.ARGBRotate270 (60754 ms)
59909 - [       OK ] libyuvTest.ARGBRotate90 (59909 ms)
56615 - [       OK ] libyuvTest.ARGBScaleTo1366x768 (56615 ms)
50779 - [       OK ] libyuvTest.ARGBRotate90_Odd (50779 ms)
48677 - [       OK ] libyuvTest.ARGBRotate270_Odd (48677 ms)
39790 - [       OK ] libyuvTest.ScaleTo1366x768 (39790 ms)
37087 - [       OK ] libyuvTest.BenchmarkSsim_Opt (37087 ms)
31122 - [       OK ] libyuvTest.ARGBScaleDownBy34 (31122 ms)
27934 - [       OK ] libyuvTest.ARGBScaleTo853x480 (27934 ms)
22180 - [       OK ] libyuvTest.TestARGBColorMatrix (22180 ms)
18670 - [       OK ] libyuvTest.ScaleTo853x480 (18670 ms)
13188 - [       OK ] libyuvTest.ScaleDownBy34 (13188 ms)
11445 - [       OK ] libyuvTest.ARGBScaleDownBy38 (11445 ms)
10472 - [       OK ] libyuvTest.TestARGBSepia (10472 ms)
7824 - [       OK ] libyuvTest.I420ToI444_Any (7824 ms)
7771 - [       OK ] libyuvTest.I420ToI444_Invert (7771 ms)
7765 - [       OK ] libyuvTest.I420ToI444_Unaligned (7765 ms)
7743 - [       OK ] libyuvTest.I420ToI444_Opt (7743 ms)
7542 - [       OK ] libyuvTest.ARGBScaleDownBy2 (7542 ms)
6781 - [       OK ] libyuvTest.TestARGBQuantize (6781 ms)
6468 - [       OK ] libyuvTest.ScaleDownBy2 (6468 ms)
6381 - [       OK ] libyuvTest.YToARGB_Any (6381 ms)
5929 - [       OK ] libyuvTest.TestShade (5929 ms)
5928 - [       OK ] libyuvTest.ARGBRotate180_Odd (5928 ms)
5530 - [       OK ] libyuvTest.ScaleDownBy38 (5530 ms)
5259 - [       OK ] libyuvTest.TestAffine (5259 ms)
5210 - [       OK ] libyuvTest.ARGBRotate180 (5210 ms)
5069 - [       OK ] libyuvTest.BayerBGGRToI420_Invert (5069 ms)
5056 - [       OK ] libyuvTest.TestARGBColorTable (5056 ms)
5052 - [       OK ] libyuvTest.TestAttenuate (5052 ms)
5018 - [       OK ] libyuvTest.BayerBGGRToI420_Opt (5018 ms)
4971 - [       OK ] libyuvTest.BayerRGGBToI420_Any (4971 ms)
4969 - [       OK ] libyuvTest.BayerGBRGToI420_Any (4969 ms)
4924 - [       OK ] libyuvTest.BayerBGGRToI420_Any (4924 ms)
4906 - [       OK ] libyuvTest.BayerGRBGToI420_Any (4906 ms)

Original comment by fbarch...@google.com on 20 Nov 2012 at 11:25

GoogleCodeExporter commented 9 years ago
r504 adds a YToARGB_Any test, which is unoptimized.
I420ToI444: optimized interpolation for the 1/4 and 3/4 fractions, which are used on 2x upsample rows.
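
As a rough illustration of what specializing the 1/4 and 3/4 fractions can mean, the sketch below compares a generic 8-bit blend with fixed-weight versions. The rounding convention and the function names are assumptions made for this example; the actual libyuv row functions may round differently.

```python
def interpolate_generic(a, b, f):
    """Blend two 8-bit values; f is the weight of b in 1/256 units, rounded."""
    return (a * (256 - f) + b * f + 128) >> 8


def interpolate_quarter(a, b):
    """Specialized 75/25 blend (f == 64): no multiply by a variable fraction."""
    return (3 * a + b + 2) >> 2


def interpolate_three_quarters(a, b):
    """Specialized 25/75 blend (f == 192)."""
    return (a + 3 * b + 2) >> 2


def blend_rows(row_a, row_b, blend):
    """Apply a two-value blend function element-wise to two rows."""
    return [blend(a, b) for a, b in zip(row_a, row_b)]


print(blend_rows([0, 100, 255], [100, 200, 255], interpolate_quarter))
```

Under this rounding convention the specialized functions return the same values as the generic one for f = 64 and f = 192, so the specialization changes speed and not output.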
7314 - [       OK ] libyuvTest.I420ToI444_Any (7314 ms)
6006 - [       OK ] libyuvTest.YToARGB_Any (6006 ms)
4804 - [       OK ] libyuvTest.BayerRGGBToI420_Any (4804 ms)
4767 - [       OK ] libyuvTest.BayerGBRGToI420_Any (4767 ms)
4734 - [       OK ] libyuvTest.BayerBGGRToI420_Any (4734 ms)
4716 - [       OK ] libyuvTest.BayerGRBGToI420_Any (4716 ms)
4581 - [       OK ] libyuvTest.ABGRToARGB_Any (4581 ms)
4539 - [       OK ] libyuvTest.ARGBToRGBA_Any (4539 ms)
4519 - [       OK ] libyuvTest.ARGBToBGRA_Any (4519 ms)
4518 - [       OK ] libyuvTest.ARGBToABGR_Any (4518 ms)
4496 - [       OK ] libyuvTest.BGRAToARGB_Any (4496 ms)
4219 - [       OK ] libyuvTest.ARGBInterpolate255_Any (4219 ms)
4068 - [       OK ] libyuvTest.RGBAToARGB_Any (4068 ms)

Original comment by fbarch...@chromium.org on 26 Nov 2012 at 11:52

GoogleCodeExporter commented 9 years ago
r506 fixed YToARGB and I420ToI444. 5 ms achieved.

chronos@localhost $ ./runyuv10 Any
sudo LIBYUV_REPEAT=1000 nice --5 ./libyuv_unittest --gtest_filter=*Any | sed 's/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g' | sort -rn | grep ms
4724 - [       OK ] libyuvTest.BayerRGGBToI420_Any (4724 ms)
4682 - [       OK ] libyuvTest.BayerGBRGToI420_Any (4682 ms)
4680 - [       OK ] libyuvTest.BayerGRBGToI420_Any (4680 ms)
4675 - [       OK ] libyuvTest.BayerBGGRToI420_Any (4675 ms)
4516 - [       OK ] libyuvTest.ARGBToABGR_Any (4516 ms)
4510 - [       OK ] libyuvTest.BGRAToARGB_Any (4510 ms)
4499 - [       OK ] libyuvTest.ARGBToRGBA_Any (4499 ms)
4495 - [       OK ] libyuvTest.ARGBToBGRA_Any (4495 ms)
4495 - [       OK ] libyuvTest.ABGRToARGB_Any (4495 ms)
4379 - [       OK ] libyuvTest.ARGBInterpolate255_Any (4379 ms)
4166 - [       OK ] libyuvTest.I420ToI444_Any (4166 ms)
4095 - [       OK ] libyuvTest.RGBAToARGB_Any (4095 ms)
4038 - [       OK ] libyuvTest.ARGBInterpolate64_Any (4038 ms)
4019 - [       OK ] libyuvTest.ARGBInterpolate192_Any (4019 ms)
3982 - [       OK ] libyuvTest.ARGBInterpolate128_Any (3982 ms)
3933 - [       OK ] libyuvTest.ARGBBlend_Any (3933 ms)
3847 - [       OK ] libyuvTest.RAWToI420_Any (3847 ms)
3830 - [       OK ] libyuvTest.ARGB1555ToI420_Any (3830 ms)
3820 - [       OK ] libyuvTest.RGB24ToI420_Any (3820 ms)
3793 - [       OK ] libyuvTest.BayerGBRGToARGB_Any (3793 ms)
3788 - [       OK ] libyuvTest.BayerGRBGToARGB_Any (3788 ms)
3781 - [       OK ] libyuvTest.BayerRGGBToARGB_Any (3781 ms)
3699 - [       OK ] libyuvTest.BayerBGGRToARGB_Any (3699 ms)
3677 - [       OK ] libyuvTest.NV21ToRGB565_Any (3677 ms)
3674 - [       OK ] libyuvTest.NV12ToRGB565_Any (3674 ms)
3651 - [       OK ] libyuvTest.ARGBToI444_Any (3651 ms)
3545 - [       OK ] libyuvTest.I420ToARGB1555_Any (3545 ms)
3517 - [       OK ] libyuvTest.UYVYToARGB_Any (3517 ms)
3491 - [       OK ] libyuvTest.I444ToARGB_Any (3491 ms)
3490 - [       OK ] libyuvTest.ARGBToUYVY_Any (3490 ms)
3457 - [       OK ] libyuvTest.ARGBToYUY2_Any (3457 ms)
3412 - [       OK ] libyuvTest.YUY2ToARGB_Any (3412 ms)
3410 - [       OK ] libyuvTest.I420ToRGB565_Any (3410 ms)
3309 - [       OK ] libyuvTest.I420ToBayerBGGR_Any (3309 ms)
3307 - [       OK ] libyuvTest.I420ToBayerRGGB_Any (3307 ms)
3306 - [       OK ] libyuvTest.I420ToBayerGBRG_Any (3306 ms)
3304 - [       OK ] libyuvTest.I420ToBayerGRBG_Any (3304 ms)
3274 - [       OK ] libyuvTest.I411ToARGB_Any (3274 ms)
3238 - [       OK ] libyuvTest.ARGBToI422_Any (3238 ms)
3072 - [       OK ] libyuvTest.ARGBInterpolate0_Any (3072 ms)
3069 - [       OK ] libyuvTest.NV21ToARGB_Any (3069 ms)
3062 - [       OK ] libyuvTest.NV12ToARGB_Any (3062 ms)
3027 - [       OK ] libyuvTest.I422ToRGBA_Any (3027 ms)
3016 - [       OK ] libyuvTest.I422ToBGRA_Any (3016 ms)
3008 - [       OK ] libyuvTest.RGB565ToI420_Any (3008 ms)
3007 - [       OK ] libyuvTest.I422ToARGB_Any (3007 ms)
3004 - [       OK ] libyuvTest.I422ToABGR_Any (3004 ms)
2965 - [       OK ] libyuvTest.V210ToI420_Any (2965 ms)
2931 - [       OK ] libyuvTest.I420ToARGB4444_Any (2931 ms)
2921 - [       OK ] libyuvTest.ARGBToI411_Any (2921 ms)
2908 - [       OK ] libyuvTest.I420ToRAW_Any (2908 ms)
2906 - [       OK ] libyuvTest.I420ToBGRA_Any (2906 ms)
2900 - [       OK ] libyuvTest.I420ToARGB_Any (2900 ms)
2895 - [       OK ] libyuvTest.I420ToRGBA_Any (2895 ms)
2895 - [       OK ] libyuvTest.I420ToABGR_Any (2895 ms)
2875 - [       OK ] libyuvTest.ABGRToI420_Any (2875 ms)
2855 - [       OK ] libyuvTest.ARGBToNV21_Any (2855 ms)
2855 - [       OK ] libyuvTest.ARGBToNV12_Any (2855 ms)
2854 - [       OK ] libyuvTest.ARGB4444ToI420_Any (2854 ms)
2849 - [       OK ] libyuvTest.ARGBToI420_Any (2849 ms)
2820 - [       OK ] libyuvTest.RGBAToI420_Any (2820 ms)
2818 - [       OK ] libyuvTest.BGRAToI420_Any (2818 ms)
2817 - [       OK ] libyuvTest.ARGBToARGB1555_Any (2817 ms)
2803 - [       OK ] libyuvTest.I420ToRGB24_Any (2803 ms)
2732 - [       OK ] libyuvTest.I444ToI420_Any (2732 ms)
2670 - [       OK ] libyuvTest.ARGBAttenuate_Any (2670 ms)
2600 - [       OK ] libyuvTest.ARGBToRGB565_Any (2600 ms)
2489 - [       OK ] libyuvTest.ARGBToARGB4444_Any (2489 ms)
2432 - [       OK ] libyuvTest.ARGBToRAW_Any (2432 ms)
2426 - [       OK ] libyuvTest.YToARGB_Any (2426 ms)
2319 - [       OK ] libyuvTest.ARGBToRGB24_Any (2319 ms)
2293 - [       OK ] libyuvTest.ARGBToI400_Any (2293 ms)
2205 - [       OK ] libyuvTest.ARGBToARGBMirror_Any (2205 ms)
2156 - [       OK ] libyuvTest.I420ToI420Mirror_Any (2156 ms)
2139 - [       OK ] libyuvTest.ARGBToARGB_Any (2139 ms)
2101 - [       OK ] libyuvTest.I422ToI420_Any (2101 ms)
1876 - [       OK ] libyuvTest.RAWToARGB_Any (1876 ms)
1851 - [       OK ] libyuvTest.RGB24ToARGB_Any (1851 ms)
1764 - [       OK ] libyuvTest.ARGBToBayerBGGR_Any (1764 ms)
1756 - [       OK ] libyuvTest.ARGBToBayerRGGB_Any (1756 ms)
1756 - [       OK ] libyuvTest.ARGBToBayerGRBG_Any (1756 ms)
1751 - [       OK ] libyuvTest.ARGBToBayerGBRG_Any (1751 ms)
1710 - [       OK ] libyuvTest.UYVYToI422_Any (1710 ms)
1700 - [       OK ] libyuvTest.YUY2ToI422_Any (1700 ms)
1697 - [       OK ] libyuvTest.ARGB1555ToARGB_Any (1697 ms)
1691 - [       OK ] libyuvTest.I411ToI420_Any (1691 ms)
1680 - [       OK ] libyuvTest.I420ToI411_Any (1680 ms)
1625 - [       OK ] libyuvTest.RGB565ToARGB_Any (1625 ms)
1609 - [       OK ] libyuvTest.ARGB4444ToARGB_Any (1609 ms)
1457 - [       OK ] libyuvTest.YUY2ToI420_Any (1457 ms)
1457 - [       OK ] libyuvTest.UYVYToI420_Any (1457 ms)
1411 - [       OK ] libyuvTest.I422ToUYVY_Any (1411 ms)
1370 - [       OK ] libyuvTest.I400ToI400Mirror_Any (1370 ms)
1319 - [       OK ] libyuvTest.I422ToYUY2_Any (1319 ms)
1232 - [       OK ] libyuvTest.I400ToARGB_Any (1232 ms)
1199 - [       OK ] libyuvTest.I420ToI422_Any (1199 ms)
1143 - [       OK ] libyuvTest.I420ToYUY2_Any (1143 ms)
1134 - [       OK ] libyuvTest.I420ToUYVY_Any (1134 ms)
1071 - [       OK ] libyuvTest.I420ToI420_Any (1071 ms)
1058 - [       OK ] libyuvTest.NV21ToI420_Any (1058 ms)
1058 - [       OK ] libyuvTest.NV12ToI420_Any (1058 ms)
997 - [       OK ] libyuvTest.I420ToNV21_Any (997 ms)
987 - [       OK ] libyuvTest.I420ToNV12_Any (987 ms)
710 - [       OK ] libyuvTest.I400ToI420_Any (710 ms)
674 - [       OK ] libyuvTest.I420ToI400_Any (674 ms)
572 - [       OK ] libyuvTest.I400ToI400_Any (572 ms)
[==========] 106 tests from 1 test case ran. (294536 ms total)
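
The reported times are totals over LIBYUV_REPEAT=1000 iterations, so dividing by the repeat count gives the per-call time to compare against the 5 ms target. A short sketch of that check, using a few totals from this thread; the helper names are hypothetical:

```python
def per_call_ms(total_ms, repeat=1000):
    """Average time of one call, given the total for `repeat` iterations."""
    return total_ms / repeat


def over_target(results, target_ms=5.0, repeat=1000):
    """Names of tests whose average per-call time exceeds the target."""
    return [name for name, total in results
            if per_call_ms(total, repeat) > target_ms]


# Totals in ms taken from the first and last runs in this thread.
first_run = [("I420ToI444_Any", 22755), ("I411ToI420_Any", 6455),
             ("I400ToI400_Any", 571)]
last_run = [("BayerRGGBToI420_Any", 4724), ("I420ToI444_Any", 4166),
            ("I400ToI400_Any", 572)]

print(over_target(first_run))
print(over_target(last_run))
```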

Original comment by fbarch...@chromium.org on 28 Nov 2012 at 8:12