Closed GoogleCodeExporter closed 9 years ago
Moving priority up as this is a desired feature for 5.
Original comment by classi...@floodgap.com
on 22 Mar 2011 at 1:09
So I grabbed libvpx 0.9.5 to look at it in more detail, and guess what I found:
there is already AltiVec code available; Mozilla just isn't using it!
So here are our steps:
- G3 has to use the non-AltiVec code. Sorry, guys. We need to hack the build
system so that G3 only builds the unoptimized C version.
- Copy in the VMX decoder and set up the build system so that G4 and G5 build
the AltiVec version.
This is such a no-brainer that I'm considering doing this for 4.
Original comment by classi...@floodgap.com
on 23 Mar 2011 at 5:33
Initial steps (I'll try this later):
- Benchmark a WebM video, like say Big Buck Bunny, and get an initial fps.
- Copy libvpx-v0.9.5/vp8/common/ppc/* to media/libvpx/vp8/common/ppc/*
- Add a new option --enable-tenfourfox-altivec to configure (along the lines of
how VPX_X86_ASM gets set) which sets VPX_VMX_ASM.
- Modify media/libvpx/Makefile.in to add in the CSRCS and asms for PPC VMX in
the same way.
- See if we made any difference.
- Consider translating the rest of the decoder files if we didn't (there appear
to only be VMX versions of common/).
Original comment by classi...@floodgap.com
on 23 Mar 2011 at 5:44
We're going to put this in the next 4 beta. This is too good not to.
Original comment by classi...@floodgap.com
on 23 Mar 2011 at 2:02
Gonna be a bigger job, as Google provides the AltiVec code but no headers
(http://code.google.com/p/webm/issues/detail?id=223). They don't even guarantee
it works.
Created stub configs vpx_config_tenfourfox_altivec.c and
vpx_config_tenfourfox_altivec.h tonight.
The .asm files will be compiled with gcc, so they need to be renamed to .s. The
macros also need to be unrolled beforehand because the assembler in Tiger
(as 1.38) is freaking stupid. We'll then add pieces to compile them.
We will need to create .h files in vp8/common/ppc for each of the .asm files
(they are not part of the source already). We should model them on the ARM
files. We support copy, filter (and filter_bilinear), idctllm,
loopfilter/loopfilter_filters and recon. We have no postproc.
These then need to be hooked up into idct.h, recon.h, ... ? (follow the ARM
model again).
Next steps as we reach them.
Original comment by classi...@floodgap.com
on 24 Mar 2011 at 6:10
I'm not an assembly wizard (just learning ;) but I believe you don't need
header files here if the assembly declares external symbols (just as in C you
can use symbols from other object files even if they are not declared in
headers).
For example, copy_altivec.asm defines the copy_mem16x16_ppc function, which is
exported in systemdependent.c as "extern copy_mem_block_function
*vp8_copy_mem16x16"
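The point about external symbols can be sketched in C. This is only an illustration under stated assumptions: the names copy_mem_block_fn, copy_mem16x16_sketch and copy_mem16x16_ptr are hypothetical, the parameter list is a guess rather than the libvpx signature, and the routine is written in portable C here so the example links on its own (in the real tree the body would come from the assembled object file).

```c
#include <string.h>

/* Function type for a block copy. The parameter list is an assumption
   made for this sketch, not copied from libvpx. */
typedef void copy_mem_block_fn(unsigned char *src, int src_stride,
                               unsigned char *dst, int dst_stride);

/* Declared right here, with no header involved. In a real build the
   linker would resolve this name against the assembled object file. */
extern copy_mem_block_fn copy_mem16x16_sketch;

/* Dispatch pointer that the rest of the decoder would call through. */
copy_mem_block_fn *copy_mem16x16_ptr = copy_mem16x16_sketch;

/* Portable stand-in for the assembly routine, present only so that this
   sketch links by itself: copies a 16x16 block row by row. */
void copy_mem16x16_sketch(unsigned char *src, int src_stride,
                          unsigned char *dst, int dst_stride)
{
    int row;
    for (row = 0; row < 16; row++) {
        memcpy(dst, src, 16);
        src += src_stride;
        dst += dst_stride;
    }
}
```

A header would still help if several C files need the same declaration, but the linker only cares that the symbol name matches.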
Original comment by annu...@gmail.com
on 24 Mar 2011 at 4:09
Also, if the assembly file is not named "*.s", it can still be compiled with
GCC using -x assembler
Original comment by annu...@gmail.com
on 24 Mar 2011 at 4:12
I'm not sure if it's some irregularity of their build system, but they seem to
want these headers, and adding them would at least be congruent with the
existing source.
I'm going to dump the .asms into .s anyway, because I have to preprocess them
for the stupid Apple assembler (as) that comes with Tiger. I'm trotting out a
little Perl script for that.
Original comment by classi...@floodgap.com
on 24 Mar 2011 at 5:10
Are you going to fork libvpx?
Original comment by annu...@gmail.com
on 24 Mar 2011 at 5:25
That is not my current intention, although when we lose source parity it may be
necessary.
Original comment by classi...@floodgap.com
on 25 Mar 2011 at 3:28
Original comment by classi...@floodgap.com
on 3 Apr 2011 at 10:25
Progress so far.
- Stole a similar pre-processor which sort of works. This was good enough to
convert idctllm_altivec.asm, recon_altivec.asm, filter_altivec.asm and
filter_bilinear_altivec.asm (with some hand-massaging).
- Altered vp8/common/*.h files to point to the newly exposed code.
- Altered media/libvpx/Makefile.in to use gcc as the PPC assembler, and include
asm source for our working files. We also have to add -read_only_relocs
suppress to the mozconfigs (but we won't for G3, which is convenient, because
if we do it wrong and accidentally link the G4 code into the G3 version, then
ld will balk and we'll be alerted).
This only-partially-AltiVecized player is now good enough to play over half of
the 1-minute Big Buck Bunny WebM on the G5 in reduced mode. It still starts to
seize at the part when zooming in on his burrow, but is still markedly faster
than the unoptimized version, which stutters almost immediately. Our goal is to
get it to play fully on the G5 in reduced mode, which should correspond well
with a high-end G4. Obviously it has no problem playing on the G5 in automatic
or highest.
Still to do:
- Convert loopfilter_filters_altivec.asm (and maybe loopfilter_altivec.c). This
is the biggest of the assembly files and thus likely represents some heavy-duty
oomph.
- Modify /configure and the 7400/7450/G5 build configs to emit flags to enable
AltiVec acceleration, and wrap all the changes in libvpx so that the G3 build
config still builds the regular C version.
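One way the "wrap all the changes" item could look is a compile-time switch, sketched below. Everything here is an assumption for illustration: SKETCH_USE_ALTIVEC stands in for whatever define the build configs end up emitting, and recon_c_sketch, recon_altivec_sketch and recon_sketch are hypothetical names, not libvpx symbols. With the macro left undefined, as it would be for G3, only the plain C path is referenced.

```c
#include <stdint.h>

/* Clamp an intermediate sum to the 0..255 range of an 8-bit pixel. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Plain C reconstruction: prediction plus residual, clamped per pixel. */
void recon_c_sketch(const uint8_t *pred, const int16_t *diff,
                    uint8_t *dst, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = clamp_u8(pred[i] + diff[i]);
}

#if defined(SKETCH_USE_ALTIVEC) && SKETCH_USE_ALTIVEC
/* G4/G5 configs: the body would come from the assembled .s file. */
void recon_altivec_sketch(const uint8_t *pred, const int16_t *diff,
                          uint8_t *dst, int n);
#define recon_sketch recon_altivec_sketch
#else
/* G3 config: no AltiVec symbol is referenced, so none gets linked. */
#define recon_sketch recon_c_sketch
#endif
```

Callers would use recon_sketch(pred, diff, dst, n) and never name either implementation directly, so the G3 config cannot pull in the G4 code by accident.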
Original comment by classi...@floodgap.com
on 11 Apr 2011 at 4:02
Wow, that was easy. Fixed a glitch in the gas-preprocessor and now even
loopfilter_filters parses and builds. It's working! We have AltiVec WebM!
We seem to be missing AltiVec equivalents for almost all of the idct
components, but this is good enough for now. I can probably extrapolate the
rest of the pieces I need to build the rest of the idcts from the one we do
have.
Tomorrow I'll finish the build system up so that G3 still builds the C version.
It's a big improvement. Some videos still stutter, but a lot more plays.
Original comment by classi...@floodgap.com
on 11 Apr 2011 at 5:14
Essentially this is working. More changes in followup issues.
Original comment by classi...@floodgap.com
on 16 Apr 2011 at 10:50
Original comment by Tobias.N...@gmail.com
on 16 Jul 2012 at 10:39
Original issue reported on code.google.com by
classi...@floodgap.com
on 5 Feb 2011 at 5:22