Open GoogleCodeExporter opened 9 years ago
Original comment by classi...@floodgap.com
on 10 Jul 2011 at 11:30
#3 and #4 are now issue 75 and issue 76.
Original comment by classi...@floodgap.com
on 11 Jul 2011 at 1:01
Original comment by classi...@floodgap.com
on 30 Aug 2011 at 3:32
[deleted comment]
[deleted comment]
[deleted comment]
Can you spin that off to a separate issue? I'll look at it when I get a chance,
but let's keep this as a meta issue only.
Original comment by classi...@floodgap.com
on 12 Sep 2011 at 6:54
#1 is now issue 87.
Original comment by Tobias.N...@gmail.com
on 12 Sep 2011 at 9:35
Original comment by classi...@floodgap.com
on 14 Sep 2011 at 6:51
I think I'll give LossyConvertEncoding a try. It's converted easily and we
simply might do everythin unvectorized when the destination buffer happens to
be unaligned.
Original comment by Tobias.N...@gmail.com
on 13 Oct 2011 at 10:20
Feel free -- my main project right now is methodjit, so it's all yours if you
have an algorithm in mind. Spin off another issue and block this one on it.
Original comment by classi...@floodgap.com
on 14 Oct 2011 at 4:17
Original comment by Tobias.N...@gmail.com
on 16 Oct 2011 at 12:37
Original comment by Tobias.N...@gmail.com
on 16 Oct 2011 at 12:43
Original comment by Tobias.N...@gmail.com
on 16 Oct 2011 at 1:14
LossyConvertEncoding is now in issue 98
Original comment by Tobias.N...@gmail.com
on 16 Oct 2011 at 1:15
I've started to port all that Altivec acceleration routines to linux. Obviously
accelereated JPEG decoding cannot be ported.
To do runtime Altivec detection based on CPU capabilities I use an Altivec
detection function borrowed from freevec.org .
If we'd do the same for Mac OS X there would no longer be a need to
differentiate between G3 and G4-7400/7410 builds since apart from Altivec those
two are treated identically by gcc. Would that be worth it?
Original comment by Tobias.N...@gmail.com
on 5 Nov 2011 at 10:00
Doesn't 7400 have different scheduling?
Original comment by classi...@floodgap.com
on 5 Nov 2011 at 11:27
There's one sole file in gcc describing scheduling for 750 and 7400:
"config/rs6000/7xx.md"
Apart from Altivec only the dmul opcode (should be multiplication of two
doubles) has different scheduling.
That difference shouldn't be noticable.
Original comment by Tobias.N...@gmail.com
on 6 Nov 2011 at 12:11
Okay. I'm going to do a little research on it.
I'm not opposed to it in principle, and certainly having one less build to test
and generate is a plus, but I also want to make sure it won't trip up our less
techy users. The 7400 vs 7450 issue is already confusing to beginners, though I
think most people have figured it out by now.
For the moment let's continue with compile-time determination and then revisit
this.
Original comment by classi...@floodgap.com
on 6 Nov 2011 at 12:15
The linux port is now working well - linux runtime Altivec detection also
working fine.
I didn't yet port our changes to libvpx but I'm quite sure it will need to be
PIC compatible for real so I'll delay that. And debian doesn't use the in-tree
version of libvpx anyway.
I also added the Apple recommended sysctl-based Altivec runtime detection for
Mac OS X to the code but didn't test it yet. The way it's implemented at the
moment Altivec support is always compiled for PowerPC builds but activated only
based on the result of the runtime Altivec support detection. The detection
itself is only run once. That's the same behaviour as for x86 CPUs.
Now should I file the changeset to mozilla and/or the debian package maintainer?
Original comment by Tobias.N...@gmail.com
on 6 Nov 2011 at 2:59
I'd file it with Mozilla; Debian will pick up the changes downstream. I'm not
sure who the reviewer would be, though, since our stuff touches a large number
of modules. I'd go one at a time for each one, possibly see who reviewed the
SSE2 version and ask if they would be willing.
Original comment by classi...@floodgap.com
on 6 Nov 2011 at 3:03
(If you want, I can "pre-review" them to make sure all the nitpicky stuff is
there, like license boilerplate and correct author attributions.)
Original comment by classi...@floodgap.com
on 6 Nov 2011 at 3:08
Yes, pre-reviewing would be nice!
And once the changes are accepted in mozilla-central it needs three releases
until they reach the end users?
As you see in the attached patch file I modified the debian iceweasel-7.0.1
packages and don't even know if the patch would apply to mozilla-central
without problems.
Original comment by Tobias.N...@gmail.com
on 6 Nov 2011 at 6:29
Attachments:
First, on the auto-select for us: it dawned on me there's a good reason to keep
it as it is, and that is because the G3 build is guaranteed to have no AltiVec
code in it even on an AltiVec-enabled system. This gives us a reference build
that will always run AltiVec free. So temporarily I'm going to short circuit
the code. However, I think you *should* submit them to Mozilla with that in
them because they like single builds and their SSE2 and NEON code has runtime
detection. The only thing I'd make sure of is that the build doesn't crap out
on a G3 if at all possible (i.e., a G3 should still be able to build it).
On the new files in your patch: I haven't observed house style in the ones I
originated because I hadn't intended to send them to Mozilla at that time. On
those, make sure of the following:
- I use CTRL-I tabs, but they generally like 4-space tabs
- Make sure the MPL boilerplate and vim sequence is at the beginning
- Proper attribution. The original code is not Mozilla Foundation code, it's
either you, me or both, depending on which one of us worked on it.
- I put in funny comments but Mozilla doesn't share my sense of humour
typically, so you might consider making these more "bland" :P
Original comment by classi...@floodgap.com
on 7 Nov 2011 at 3:58
Ooops, hit save too soon. When we pick up the runtime code, I'm going to
replace it with #ifdefs in our local changesets, but we don't need to have
TENFOURFOX_* specific stuff in the Moz tree, of course.
I don't know Debian's schedule. If they only follow mozilla-release, yes, it
will take three cycles to get from mozilla-inbound to mozilla-release. We track
mozilla-beta, so it typically takes two.
Original comment by classi...@floodgap.com
on 7 Nov 2011 at 3:59
By modifying xpcom/glue/VMX.h it should be very easy to short cut the
detection. And as qcms has it's own pure C detection (for x86 it has as well)
it'd have to be done there as well (gfx/qmcs/transform.c)
I did never test the Mac OS X detection code. Did you do so?
Unfortunately my external hard drive enclosure decided to die minutes after I
uploaded the changeset. So now I have to get a new one.
And as the PB Wallstreet G4 upgrade board started to show some soldering
problems it now has to run either with L2 cache switched off (very slow) or
with the original G3 upgrade card (very slow as well), and therefore I cannot
really use it as a build platform any more...
Original comment by Tobias.N...@gmail.com
on 7 Nov 2011 at 8:30
Wow, that sucks! :(
I have not, but it looks correct. Should be straightforward enough to wire that
into a simple little .c test file. I'll do that a bit later, since it seems
like your system is out of commission for right now.
Original comment by classi...@floodgap.com
on 8 Nov 2011 at 2:48
Another target which appeared in Fx9. Not as big a win but seems to be
straightforward to implement.
Original comment by classi...@floodgap.com
on 20 Nov 2011 at 3:04
All five extant issues are landed in 9.
Original comment by classi...@floodgap.com
on 23 Nov 2011 at 9:56
IMO this is a more correct place for the problems with AltiVec text conversions
in TFF 9.
What I implemented in a different way than the SSE versions of UTF8Utils and
nsUTF8ToUnicode is that the pointers are only updated once upon exiting the
functions. Changing that back to update them every loop iteration might be
worth a try and might already be threadsafe - but I don't know much about
threadsafety.
Original comment by Tobias.N...@gmail.com
on 30 Nov 2011 at 5:38
http://developer.apple.com/hardwaredrivers/ve/alignment.html implies that
unaligned stores, being non-atomic, are not thread-safe. I guess we could mutex
it, but that would be technically disgusting and may not be faster. I think we
just have to hope that the OS gives us aligned storage in most cases unless you
can think of another way.
Original comment by classi...@floodgap.com
on 30 Nov 2011 at 5:47
Because of that warning about unaligned stores there aren't any in our code (or
am I wrong?) but instead we load from unaligned locations which is thread safe
AFAIK,
Original comment by Tobias.N...@gmail.com
on 30 Nov 2011 at 6:10
Hmmm... that Apple doc also says the following in the section about unaligned
stores:
"There is no way to do an atomic misaligned AltiVec load or store without
falling back on other atomic synchronization primitives like a mutex or the
operations in OSAtomic.h."
But in the section about unaligned loads this isn't mentioned and reading it
you get the impression that unaligned reads are completely safe.
Original comment by Tobias.N...@gmail.com
on 30 Nov 2011 at 6:24
Found this fun way of detecting AltiVec without sysctl. A little scary, but
clever. (Ignore the Gestalt code, it's just for comparison.)
+#if idppc_altivec && !MACOS_X
+/* This is the brute force way of detecting instruction sets...
+ the code is borrowed from SDL, which got the idea from the libmpeg2
+ library - thanks!
+ */
+#include <signal.h>
+#include <setjmp.h>
+static jmp_buf jmpbuf;
+static void illegal_instruction(int sig)
+{
+ longjmp(jmpbuf, 1);
+}
+#endif
+
static void Sys_DetectAltivec(void)
{
// Only detect if user hasn't forcibly disabled it.
+ qboolean altivec = qfalse;
if (com_altivec->integer) {
#if idppc_altivec
+ #ifdef MACOS_X
long feat = 0;
OSErr err = Gestalt(gestaltPowerPCProcessorFeatures, &feat);
if ((err=noErr) && ((1 << gestaltPowerPCHasVectorInstructions) & feat)) {
+ altivec = qtrue;
}
+ #else
+ void (*handler)(int sig);
+ handler = signal(SIGILL, illegal_instruction);
+ if ( setjmp(jmpbuf) = 0 ) {
+ asm volatile ("mtspr 256, %0\n\t"
+ "vand %%v0, %%v0, %%v0"
+ :
+ : "r" (-1));
+ altivec = qtrue;
+ }
+ signal(SIGILL, handler);
+ #endif
+
+ if (!altivec) {
+ Cvar_Set( "com_altivec", "0" ); // we don't have it! Disable support!
+ }
#endif
Original comment by classi...@floodgap.com
on 18 Jan 2012 at 12:32
*from http://web.archiveorange.com/archive/v/QCRpizEPgO8aYzq2LPcG
Original comment by classi...@floodgap.com
on 18 Jan 2012 at 12:33
I had found this code as well but I also found out that that way of detecting
Altivec can cause strange problems.
Original comment by Tobias.N...@gmail.com
on 18 Jan 2012 at 4:05
I agree it's a bit freaky, but I like the illegal instruction concept in
theory. Probably something about the longjmp or improperly handling SIGILL.
Original comment by classi...@floodgap.com
on 18 Jan 2012 at 4:16
That's what warns about using SIGILL based Altivec detection:
https://bugzilla.redhat.com/show_bug.cgi?id=435771
Nevertheless the SIGILL version might work perfectly in TFF.
Original comment by Tobias.N...@gmail.com
on 18 Jan 2012 at 4:46
It's time to make libpixman even more awesome.
Original comment by classi...@floodgap.com
on 5 Jun 2012 at 9:51
libvpx IDCT has to be extended (or the plain C version would have to be used).
Original comment by Tobias.N...@gmail.com
on 12 Jun 2012 at 10:22
Original comment by Tobias.N...@gmail.com
on 12 Jun 2012 at 10:23
We might want to think about gfxAlphaRecovery due to bug 741682.
Original comment by classi...@floodgap.com
on 23 Jul 2012 at 1:47
I don't know why that happened (I guess Google Code changing blocking flags),
sorry about the spam.
Original comment by classi...@floodgap.com
on 23 Jul 2012 at 3:20
Original comment by classi...@floodgap.com
on 6 Nov 2012 at 9:33
I began contributing our patches back to mozilla.
M817033 (https://bugzilla.mozilla.org/show_bug.cgi?id=817033) is the meta bug
tracking all the others.
Original comment by Tobias.N...@gmail.com
on 30 Nov 2012 at 9:13
files that contain SSE SIMD fast paths we didn't port yet:
=== code that IS (probably) in use ===
* gfx/2d/convolver.cpp (SSE2 code currently disabled; see comment)
* gfx/2d/BlurSSE2.cpp (that's our issue 189, seems to be important)
* gfx/2d/ImageScalingSSE2.cpp (seems to be important as well)
* gfx/thebes/gfxAlphaRecoverySSE2.cpp (probably also important)
=== code that may get used in the future ===
* media/libsoundtouch/src/sse_optimized.cpp (an upcoming sound processing library; might become important)
=== code that is not used for general browsing ===
* media/libvorbis/lib/os.h (used to convert a double to an int)
* media/libopus/celt/float_cast.h (float to int conversion once more)
* media/webrtc/trunk/src/modules/audio_processing/aec/aec_core_sse2.c (webrtc is not important for general browser speed)
* media/webrtc/trunk/src/modules/audio_processing/aec/aec_rdft*
* media/webrtc/trunk/src/modules/video_processing/main/source/content_analysis_sse2.cc
=== code that isn't in use actually ===
* gfx/angle/src/libGLESv2/TextureSSE2.cpp (even for WebGL I don't think ANGLE is used on OS X)
* gfx/skia/src/opts/SkBitmapProcState_opts_SSE* (skia is not normally used on OS X)
* gfx/skia/src/opts/SkBlitRect_opts_SSE*
* gfx/skia/src/opts/SkBlitRow_opts_SSE*
* gfx/skia/src/opts/SkUtils_opts_SSE*
Original comment by Tobias.N...@gmail.com
on 30 Nov 2012 at 11:38
Just noticed that gfx/thebes/gfxAlphaRecoverySSE2.cpp was already listed and
considered less important; might that also be used for video rendering using
GStreamer?
Original comment by Tobias.N...@gmail.com
on 30 Nov 2012 at 11:54
It's possible it's been plugged into other things; I haven't really checked
recently. ImageScalingSSE2.cpp could be helpful, but I thought we used a
different algorithm on OS X (I could be wrong about this).
Original comment by classi...@floodgap.com
on 1 Dec 2012 at 12:01
And now that I look at bug 741682, we *should* support AlphaRecovery since it's
going to be part of DLBI going forward. So maybe that should be our next goal.
Original comment by classi...@floodgap.com
on 1 Dec 2012 at 12:04
AlphaRecovery is not currently used. The source file is compiled but the XUL
shared library doesn't contain its symbols. It is indeed used in
gfxWindowsNativeDrawing (Windows) and gfxXlibNativeRenderer (X11) only.
ImageScaling similarly used by DrawTargetD2D (hence Windows) only.
convolver is compiled and linked into XUL, at least on 10.5, but I couldn't
find any call to it (it ends up being used indirectly by gfx/2d/Scale but that
itself isn't called anywhere).
Blur is indeed used a lot.
Original comment by Tobias.N...@gmail.com
on 1 Dec 2012 at 10:24
Original issue reported on code.google.com by
classi...@floodgap.com
on 21 Jun 2011 at 10:24