keeleysam / tenfourfox

Automatically exported from code.google.com/p/tenfourfox

Install AltiVec libvpx and modify build system (G4, G5 only) #28

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
In issue 26 a user brought up poor WebM speed. This is undoubtedly due to issue 
7, but we should look into writing a PPC-specific decoder path along the lines 
of the ARM and x86 code that already exists to at least speed up the decoding 
step even if rendering still sucks.

Original issue reported on code.google.com by classi...@floodgap.com on 5 Feb 2011 at 5:22

GoogleCodeExporter commented 9 years ago
Moving priority up as this is a desired feature for 5.

Original comment by classi...@floodgap.com on 22 Mar 2011 at 1:09

GoogleCodeExporter commented 9 years ago
So I grabbed libvpx 0.9.5 to look at it in more detail, and guess what I found: 
there is already AltiVec code available; Mozilla just isn't using it!

So here are our steps:

- G3 has to use non-AltiVec. Sorry, guys. We need to hack the build system so 
G3 only builds the unoptimized C version.

- Copy in the VMX decoder and set the build system so that G4, G5 will build 
AltiVec.

This is such a no-brainer I'm considering doing this for 4.

Original comment by classi...@floodgap.com on 23 Mar 2011 at 5:33

GoogleCodeExporter commented 9 years ago
Initial steps (I'll try this later):

- Benchmark a WebM video, say Big Buck Bunny, and get an initial fps.

- Copy libvpx-v0.9.5/vp8/common/ppc/* to media/libvpx/vp8/common/ppc/*

- Add a new option --enable-tenfourfox-altivec to configure (along the lines of 
how VPX_X86_ASM gets set) which sets VPX_VMX_ASM.

- Modify media/libvpx/Makefile.in to add in the CSRCS and asms for PPC VMX in 
the same way.

- See if we made any difference.

- Consider translating the rest of the decoder files if we didn't (there appear 
to only be VMX versions of common/).

Original comment by classi...@floodgap.com on 23 Mar 2011 at 5:44

GoogleCodeExporter commented 9 years ago
We're going to put this in the next 4 beta. This is too good not to.

Original comment by classi...@floodgap.com on 23 Mar 2011 at 2:02

GoogleCodeExporter commented 9 years ago
Gonna be a bigger job, as Google provides the AltiVec code but no headers 
(http://code.google.com/p/webm/issues/detail?id=223). They don't even guarantee 
it works.

Created stub configs vpx_config_tenfourfox_altivec.c 
vpx_config_tenfourfox_altivec.h tonight.

The .asm files will be compiled with gcc, so they need to be renamed to .s. The 
macros also need to be unrolled beforehand, because the as 1.38 assembler in 
Tiger is freaking stupid. We'll then add pieces to compile them.

We will need to create .h files in vp8/common/ppc for each of the .asm files 
(they are not part of the source already). We should model them on the ARM 
files. We support copy, filter (and filter_bilinear), idctllm, 
loopfilter/loopfilter_filters and recon. We have no postproc.

These then need to be hooked up into idct.h, recon.h, ... ? (follow the ARM 
model again).
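
As a rough sketch of what "follow the ARM model" could look like: a generic header maps a name to the C routine, and a per-architecture header re-points it when the build enables the optimized path. Every sketch_ name and the SKETCH_HAVE_ALTIVEC switch below are made up for illustration and are not libvpx identifiers; the real prototypes and macro names have to be taken from the libvpx source.

```c
#include <assert.h>

/* Hypothetical build switch (an assumption, not a libvpx macro): a configure
 * option would define it to 1 for G4/G5 builds and leave it at 0 for G3. */
#ifndef SKETCH_HAVE_ALTIVEC
#define SKETCH_HAVE_ALTIVEC 0
#endif

/* Portable C fallback: add the residual to the prediction and clamp each
 * result to the 0..255 pixel range. */
void sketch_recon_c(const unsigned char *pred, const short *diff,
                    unsigned char *dst, int count)
{
    int i;
    assert(count >= 0);
    for (i = 0; i < count; i++) {
        int v = pred[i] + diff[i];
        dst[i] = (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
}

/* "Generic header" part: by default the name maps to the C version. */
#define sketch_recon sketch_recon_c

/* "Per-architecture header" part: when the switch is on, declare the routine
 * that the assembled .s file would provide and re-point the name at it. */
#if SKETCH_HAVE_ALTIVEC
extern void sketch_recon_ppc(const unsigned char *pred, const short *diff,
                             unsigned char *dst, int count);
#undef sketch_recon
#define sketch_recon sketch_recon_ppc
#endif

/* Callers only ever use the mapped name, so they do not change per CPU. */
void sketch_reconstruct_16(const unsigned char *pred, const short *diff,
                           unsigned char *dst)
{
    sketch_recon(pred, diff, dst, 16);
}
```

With the switch left at 0, as a G3 build would have it, only the C routine is referenced, so nothing from the AltiVec objects has to be linked.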

Next steps as we reach them.

Original comment by classi...@floodgap.com on 24 Mar 2011 at 6:10

GoogleCodeExporter commented 9 years ago
I'm not an assembly wizard (just learning ;) but I believe you don't need 
header files here if the assembly declares external symbols (just as in C you 
can use symbols from other object files even if they are not declared in 
headers).

For example, copy_altivec.asm defines the copy_mem16x16_ppc function, which is 
exported in systemdependent.c as "extern copy_mem_block_function 
*vp8_copy_mem16x16"
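
A minimal sketch of that idea in C, under stated assumptions: the function signature and setup_copy_hooks are hypothetical, and copy_mem16x16_ppc is written in C here only as a stand-in so the example links on its own. In the real build the symbol would come from the assembled object file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature, assumed for illustration; the real prototype
 * should be checked against the libvpx source. */
typedef void copy_mem_block_function(const unsigned char *src, int src_stride,
                                     unsigned char *dst, int dst_stride);

/* Stand-in for the routine copy_altivec.asm would provide. It copies a
 * 16x16 block of bytes row by row. */
void copy_mem16x16_ppc(const unsigned char *src, int src_stride,
                       unsigned char *dst, int dst_stride)
{
    int row;
    assert(src != NULL && dst != NULL);
    for (row = 0; row < 16; row++)
        memcpy(dst + row * dst_stride, src + row * src_stride, 16);
}

/* Function pointer that the rest of the decoder would call through. */
copy_mem_block_function *vp8_copy_mem16x16;

/* Hypothetical setup routine: no header is involved. A local extern
 * declaration is enough for the compiler, and the linker resolves the
 * symbol against whichever object file defines it. */
void setup_copy_hooks(void)
{
    extern copy_mem_block_function copy_mem16x16_ppc;
    vp8_copy_mem16x16 = copy_mem16x16_ppc;
}
```

The cost of skipping the header is that nothing checks the declaration against the assembly routine's real calling convention, so a mismatch only shows up at run time.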

Original comment by annu...@gmail.com on 24 Mar 2011 at 4:09

GoogleCodeExporter commented 9 years ago
Also, if the assembly file is not named "*.s", it can still be compiled with 
GCC using -x assembler

Original comment by annu...@gmail.com on 24 Mar 2011 at 4:12

GoogleCodeExporter commented 9 years ago
I'm not sure if it's some irregularity of their build system, but they seem to 
want the headers, and adding them would at least be congruent with the existing 
source.

I'm going to dump the .asms into .s anyway, because I have to preprocess them 
for the stupid Apple as that comes with Tiger. I'm trotting out a little Perl 
script for that.

Original comment by classi...@floodgap.com on 24 Mar 2011 at 5:10

GoogleCodeExporter commented 9 years ago
Are you going to fork libvpx?

Original comment by annu...@gmail.com on 24 Mar 2011 at 5:25

GoogleCodeExporter commented 9 years ago
That is not my current intention, although when we lose source parity it may be 
necessary.

Original comment by classi...@floodgap.com on 25 Mar 2011 at 3:28

GoogleCodeExporter commented 9 years ago

Original comment by classi...@floodgap.com on 3 Apr 2011 at 10:25

GoogleCodeExporter commented 9 years ago
Progress so far.

- Stole a similar pre-processor which sort of works. This was good enough to 
convert idctllm_altivec.asm, recon_altivec.asm, filter_altivec.asm and 
filter_bilinear_altivec.asm (with some hand-massaging).

- Altered vp8/common/*.h files to point to the newly exposed code.

- Altered media/libvpx/Makefile.in to use gcc as the PPC assembler, and include 
asm source for our working files. We also have to add -read_only_relocs 
suppress to the mozconfigs (but we won't for G3, which is convenient, because 
if we do it wrong and accidentally link the G4 code into the G3 version, then 
ld will balk and we'll be alerted).

This only-partially-AltiVecized player is now good enough to play over half of 
the 1-minute Big Buck Bunny WebM on the G5 in reduced mode. It still starts to 
seize at the part when zooming in on his burrow, but is still markedly faster 
than the unoptimized version, which stutters almost immediately. Our goal is to 
get it to play fully on the G5 in reduced mode, which should correspond well 
with a high-end G4. Obviously it has no problem playing on the G5 in automatic 
or highest.

Still to do:

- Convert loopfilter_filters_altivec.asm (and maybe loopfilter_altivec.c). This 
is the biggest of the assembly files and thus likely represents some heavy-duty 
oomph.

- Modify /configure and the 7400/7450/G5 build configs to emit flags to enable 
AltiVec acceleration, and wrap all the changes in libvpx so that the G3 build 
config still builds the regular C version.

Original comment by classi...@floodgap.com on 11 Apr 2011 at 4:02

GoogleCodeExporter commented 9 years ago
Wow, that was easy. Fixed a glitch in the gas-preprocessor and now even 
loopfilter_filters parses and builds. It's working! We have AltiVec WebM!

We seem to be missing AltiVec equivalents for almost all of the idct 
components, but this is good enough for now. I can probably extrapolate the 
rest of the pieces I need to build the rest of the idcts from the one we do 
have.

Tomorrow I'll finish the build system up so that G3 still builds the C version. 
The AltiVec code is a big improvement. Some videos still stutter, but a lot 
more plays.

Original comment by classi...@floodgap.com on 11 Apr 2011 at 5:14

GoogleCodeExporter commented 9 years ago
Essentially this is working. More changes in followup issues.

Original comment by classi...@floodgap.com on 16 Apr 2011 at 10:50

GoogleCodeExporter commented 9 years ago

Original comment by Tobias.N...@gmail.com on 16 Jul 2012 at 10:39