lu-zero / libvpx

Local libvpx changes (POWER8 Altivec/VSX support)
BSD 3-Clause "New" or "Revised" License
5 stars 3 forks source link

Update rtcd.pl #1

Closed lu-zero closed 7 years ago

lu-zero commented 8 years ago

Make rtcd.pl aware of POWER8 Altivec/VSX.

rafaeldelucena commented 8 years ago

Hello @lu-zero,

I have created a proof-of-concept for this issue, for now the POC already detects when is a powerpc, if have altivec and vsx then apply the gcc flags, also detects at run-time if the Linux has altivec support.

So far so good

I was looking at the table AltiVec (GNU/GCC 4.8.2) data types from this wiki, and the GNU/Gcc flags will differ for -maltivec, -mvsx and -mcpu=powerpc8

So the PR must support all the extended datatypes from powerpc8?

Also from the wiki:

If the ISA 2.07 additions to the vector/scalar (power8-vector) instruction set are available, the following additional functions are available for both 32-bit and 64-bit targets.

32-bit support is needed?

Best regards

lu-zero commented 8 years ago

Personally I just care about VSX and I'll write the support only for VSX since it has current hardware for it and it makes the code much simpler regarding loads and store.

This bug list is more or less my todolist on what to do.

I doubt the extended datatypes would be be useful in vpx.

lu-zero commented 8 years ago

With that said the poc looks promising, will you complete it soon?

rafaeldelucena commented 8 years ago

For sure!

I just need to test with some vsx specific code and with all things working well I will create a PR for this issue.

lu-zero commented 8 years ago

I had something halfway since I was planning to start implementing next week, but your version seems much more complete already.

rafaeldelucena commented 8 years ago

My plan is to finish it before the end of the weekend

rafaeldelucena commented 8 years ago

Hello again,

I removed the detection for AltiVec and did not add the specific flags for Power8, but I can do it if you think it is necessary.

Below is an example of how to add VSX functions in the build system, in case I have tested for vpx_dsp.

diff --git a/vpx_dsp/vpx_dsp.mk b/vpx_dsp/vpx_dsp.mk
index 9b62520..f343aca 100644
--- a/vpx_dsp/vpx_dsp.mk
+++ b/vpx_dsp/vpx_dsp.mk
@@ -297,6 +297,8 @@ DSP_SRCS-$(HAVE_SSE2)   += x86/sad4d_sse2.asm
 DSP_SRCS-$(HAVE_SSE2)   += x86/sad_sse2.asm
 DSP_SRCS-$(HAVE_SSE2)   += x86/subtract_sse2.asm

+DSP_SRCS-$(HAVE_VSX)   += powerpc/sum_squares_vsx.c
+
 ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes)
 DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad4d_sse2.asm
 DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad_sse2.asm
diff --git a/vpx_dsp/vpx_dsp_rtcd_defs.pl b/vpx_dsp/vpx_dsp_rtcd_defs.pl
index fa62941..4115b9a 100644
--- a/vpx_dsp/vpx_dsp_rtcd_defs.pl
+++ b/vpx_dsp/vpx_dsp_rtcd_defs.pl
@@ -1014,7 +1014,7 @@ add_proto qw/void vpx_sad4x4x4d/, "const uint8_t *src_ptr, int src_stride, const
 specialize qw/vpx_sad4x4x4d msa sse2/;

 add_proto qw/uint64_t vpx_sum_squares_2d_i16/, "const int16_t *src, int stride, int size";
-specialize qw/vpx_sum_squares_2d_i16 sse2/;
+specialize qw/vpx_sum_squares_2d_i16 sse2 vsx/;

 #
 # Structured Similarity (SSIM)

It is the same pattern to other parts of the project, I await your reply.

Best regards

lu-zero commented 7 years ago

Just to make sure, did you do all the CLA dance with upstream?

rafaeldelucena commented 7 years ago

Doin it :)

rafaeldelucena commented 7 years ago

Pushing to upstream https://chromium-review.googlesource.com/c/450877/

rafaeldelucena commented 7 years ago

Hello @edelsohn, I've some questions from members of WebM community:

are there any cases of processors without vsx where you would want this to run? if not, it would be useful to set this up like x86_64 with sse2, where the sse2 function replaces the c version, allowing the linker to strip out the c version.

Also,

Do you anticipate options besides vsx? If not, you can leave out most of this. It is for cases where you had, for example, extension1 and extension2, so that --disable-extension1 also disabled extension2. Since you have only vsx, that is not necessary.

This patch adds support for Power8 VSX instructions, Is there an intention to support older versions?

Kind regards

edelsohn commented 7 years ago

Hi, @rafaeldelucena

What are the options? I don't know the details of the x86-64 setup with SSE2. Does it completely fail if SSE2 is not present?

The focus is systems that assume a minimum ISA with VSX. I don't expect the code to dynamically test for the feature at runtime. I expected that the library would detect or be configured for VSX when it is built. I sort of assumed that it would fall back to the current, simple C code if not configured for VSX, as opposed to completely failing to build. But a library built with the assumption of VSX is not expected to run on a hardware at an earlier ISA level.

For options other than VSX, I don't expect support for the original Altivec/VMX. Power9 now is public and adds a few more VSX instructions. I don't know if those could be useful for VPX. I would like the library to be able to leverage any future ISA improvements, when useful.

Does that answer your question? I don't understand exactly what you are asking.

rafaeldelucena commented 7 years ago

@edelsohn

Does that answer your question? I don't understand exactly what you are asking.

Yes, this answered

What are the options? I don't know the details of the x86-64 setup with SSE2. Does it completely fail if SSE2 is not present?

If for any reason doesn't have the VSX support will build with the C version.

The focus is systems that assume a minimum ISA with VSX. I don't expect the code to dynamically test for the feature at runtime.

Ok, thank you for the reply

rafaeldelucena commented 7 years ago

@edelsohn

It's done :)

https://chromium.googlesource.com/webm/libvpx/+/51289302ab02d81c17d3f15bbfb9a22eef4a36c1

edelsohn commented 7 years ago

Hi, Rafael

Excellent! Glad to see the initial support committed. Good work.

Thanks, David

On Wed, Mar 8, 2017 at 6:13 PM, Rafael de Lucena Valle < notifications@github.com> wrote:

@edelsohn https://github.com/edelsohn

It's done :)

https://chromium.googlesource.com/webm/libvpx/+/ 51289302ab02d81c17d3f15bbfb9a22eef4a36c1

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lu-zero/libvpx/issues/1#issuecomment-285201160, or mute the thread https://github.com/notifications/unsubscribe-auth/AAowNP8aNkQciWbIJGjCTUKhm0J5SX_Zks5rjzY0gaJpZM4KZ5hI .

rafaeldelucena commented 7 years ago

@edelsohn

They (WebM community members) have some questions about what versions of powerpc (ppc64, ppc64le and ppc (non-64)) they must support, do you mind if I add you on the discussion?

edelsohn commented 7 years ago

I already had subscribed to the patch review and responded on the thread.

rafaeldelucena commented 7 years ago

Thanks! :)

edelsohn commented 7 years ago

Google and the WebM community have released some data and high-level benchmarks for VP9 (and H.264).

http://downloads.webmproject.org/yt_testclips

The USAGE file explains the high-level measurements, but doesn't provide any new unit benchmarks. Profiling should highlight the critical paths.

A previous report provided a profile stack:

18.25% ffmpeg.llvm ffmpeg.llvm [.] convolve_horiz 17.70% ffmpeg.llvm ffmpeg.llvm [.] convolve_vert 7.78% ffmpeg.llvm ffmpeg.llvm [.] vpx_sad64x64x4d_c
4.99% ffmpeg.llvm ffmpeg.llvm [.] vpx_sad32x32x4d_c 4.59% ffmpeg.llvm ffmpeg.llvm [.] vpx_sad16x16x4d_c
2.98% ffmpeg.llvm ffmpeg.llvm [.] vpx_sub_pixel_variance64x64_c 2.97% ffmpeg.llvm ffmpeg.llvm [.] vpx_sub_pixel_variance32x32_c
2.88% ffmpeg.llvm ffmpeg.llvm [.] vpx_sub_pixel_variance16x16_c
2.76% ffmpeg.llvm ffmpeg.llvm [.] vpx_sad8x8_c
2.73% ffmpeg.llvm ffmpeg.llvm [.] vpx_variance32x32_c
2.70% ffmpeg.llvm ffmpeg.llvm [.] vpx_fdct32
2.12% ffmpeg.llvm ffmpeg.llvm [.] vpx_quantize_b_c
1.76% ffmpeg.llvm ffmpeg.llvm [.] vpx_variance16x16_c
1.53% ffmpeg.llvm ffmpeg.llvm [.] vpx_variance8x8_c
1.31% ffmpeg.llvm ffmpeg.llvm [.] vpx_fdct32x32_rd_c 1.16% ffmpeg.llvm ffmpeg.llvm [.] vpx_sub_pixel_variance8x8_c
1.14% ffmpeg.llvm ffmpeg.llvm [.] vpx_fdct16x16_c
1.08% ffmpeg.llvm ffmpeg.llvm [.] vpx_subtract_block_c

lu-zero commented 7 years ago

merged upstream eventually.