keeleysam / tenfourfox

Automatically exported from code.google.com/p/tenfourfox

[meta] identify, look up and triage targets for SIMD AltiVec conversion #73

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Spinoff of issue 51 and issue 64. This will be a master bug, from which others 
will descend, for converting existing SIMD code into AltiVec (preferably 
starting from SSE2 code, since it is also 128-bit).

1. qcms SSE2: 512865
Don't know why Mozilla has so many colour space converters; here's another one. 
Guts in gfx/qcms/transform-sse2.c (sse1 is for straight SSE). This is used a 
lot, so it would be a nice win, but is moderately complex. Fortunately it is 
already written as intrinsics.

2. gfx/thebes/gfxAlphaRecoverySSE2.cpp: 587936
Mostly for plugins. Probably not worth it to us.

3. content/base/src/nsTextFragmentSSE2.cpp; is-ascii in nsTextFragment could be 
AltiVec'd: 585978
        Essentially looks to see if each short is > 255. Need to
        verify there isn't an endian problem here. vec_any_gt against
        255 (or vec_any_ge against 256) is the intrinsic we want, as it
        returns 1 if any element is out of range. The original is SSE2,
        so the same algorithm will work (same bit width). PRUnichar is
        currently defined as an unsigned short (not wchar_t).

4. intl/uconv/src/nsUTF8ToUnicodeSSE2.cpp: 506430
This pretty much speeds up anything that does this conversion, which is pretty 
much everything. It already prealigns to 16 bytes, which should favour AltiVec 
heavily.

5. LossyConvertEncoding could be AltiVec'd: 586698
        this could be problematic with unaligned stores; the SSE2
        version explicitly stores unaligned. A first pass from Apple:
// Mostly safe to use with aligned and unaligned addresses
void StoreUnaligned( vector unsigned char src, unsigned char *target )
{
    vector unsigned char MSQ, LSQ, edges;
    vector unsigned char edgeAlign, align;

    MSQ = vec_ld(0, target); // most significant quadword
    LSQ = vec_ld(15, target); // least significant quadword
    edgeAlign = vec_lvsl(0, target); // permute map to extract edges
    edges=vec_perm(LSQ,MSQ,edgeAlign); // extract the edges
    align = vec_lvsr( 0, target ); // permute map to misalign data
    MSQ = vec_perm(edges,src,align); // misalign the data (MSQ)
    LSQ = vec_perm(src,edges,align); // misalign the data (LSQ)
    vec_st( LSQ, 15, target ); // Store the LSQ part first
    vec_st( MSQ, 0, target ); // Store the MSQ part
}
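However, since the stores in this conversion are contiguously misaligned,
doing this on every store is not helpful. We need to handle the first
misaligned store specially, then store aligned thereafter. Might be too
big for 6, but fortunately if it breaks anything it would pretty much
break everything. The routine above is also not thread-safe, since it
reads and rewrites the bytes on either side of the target, but that is
unlikely to be a big issue for this code.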
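
A minimal sketch of the "fix the first misaligned store, then align thereafter" idea for item 5, in portable C rather than AltiVec so it can be checked anywhere. The name lossy_narrow is hypothetical and this is not the Mozilla routine; it only shows the loop shape: scalar stores until the destination reaches a 16-byte boundary, then whole 16-byte chunks, then a scalar tail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: narrow 16-bit units to bytes by truncation,
 * arranging for every wide store to land on a 16-byte boundary. */
void lossy_narrow(const uint16_t *src, uint8_t *dst, size_t len)
{
    size_t i = 0;

    /* Head: scalar stores until dst + i is 16-byte aligned. */
    while (i < len && ((uintptr_t)(dst + i) & 15) != 0) {
        dst[i] = (uint8_t)src[i];
        i++;
    }

    /* Body: each iteration writes exactly one aligned 16-byte block.
     * In an AltiVec version (an assumption, not shown here) the inner
     * loop would become two vector loads plus vec_pack, and the memcpy
     * a plain vec_st. The source may still be misaligned at this
     * point and would need the usual vec_perm treatment on load. */
    while (len - i >= 16) {
        uint8_t chunk[16];
        for (size_t k = 0; k < 16; k++)
            chunk[k] = (uint8_t)src[i + k];
        memcpy(dst + i, chunk, 16);
        i += 16;
    }

    /* Tail: fewer than 16 units left. */
    while (i < len) {
        dst[i] = (uint8_t)src[i];
        i++;
    }
}
```

Since only the head and tail touch partial blocks, and they do so with byte stores, nothing outside the destination range is read back and rewritten.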

3 and 4 seem good to start with.
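
For item 3, a scalar model of the check might look like the following. It is a sketch under assumptions: first_non_8bit is a hypothetical name, and the body works in groups of eight 16-bit units (128 bits) so that the inner loop corresponds to one vector compare. Elements are read as native unsigned shorts, so byte order does not affect the comparison in this form.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index of the first 16-bit unit > 255,
 * or len if every unit fits in 8 bits. */
size_t first_non_8bit(const uint16_t *s, size_t len)
{
    size_t i = 0;

    /* Head: scalar steps until s + i sits on a 16-byte boundary. */
    while (i < len && ((uintptr_t)(s + i) & 15) != 0) {
        if (s[i] > 255)
            return i;
        i++;
    }

    /* Body: 8 x 16 bits per step. OR-ing the lanes together leaves a
     * value > 255 exactly when some lane has a bit set in its high
     * byte, which is the "any element out of range" question a single
     * vector any-compare would answer. */
    while (len - i >= 8) {
        uint16_t acc = 0;
        for (size_t k = 0; k < 8; k++)
            acc |= s[i + k];
        if (acc > 255)
            break;              /* offender is in this group */
        i += 8;
    }

    /* Tail, which also pinpoints the offender after a break above. */
    while (i < len) {
        if (s[i] > 255)
            return i;
        i++;
    }
    return len;
}
```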

Original issue reported on code.google.com by classi...@floodgap.com on 21 Jun 2011 at 10:24

GoogleCodeExporter commented 9 years ago
I'm going to put issue 195 here, even though I'm not sure going vector buys 
anything: the scanner routine I wrote is almost optimal for vectorization, but 
I suspect we're getting unaligned data and it's not really scanning that much. 
Plus I might want to optimize it more in the future. (Issue 195 also applies to 
AuroraFox since it uses ATSFontRefs -- I'll have it in the 19 beta changesets 
for you to examine. While it is worsened by TenFourFox's need for a solution to 
issue 171, it will improve font table enumeration speed considerably overall.)

Original comment by classi...@floodgap.com on 17 Dec 2012 at 4:49

GoogleCodeExporter commented 9 years ago

Original comment by classi...@floodgap.com on 21 Feb 2013 at 6:27

GoogleCodeExporter commented 9 years ago
Removing issue 195 because it no longer makes sense to implement it since we're 
getting offsets directly out of the font table directory.

Original comment by classi...@floodgap.com on 29 Oct 2013 at 4:48

GoogleCodeExporter commented 9 years ago
Since WebRTC is now part of our builds and sort of works, I spun it off into 
issue 250, and libsoundtouch into issue 251. Interestingly, libopus doesn't 
seem to have a SIMD version.

Original comment by classi...@floodgap.com on 29 Oct 2013 at 5:01

GoogleCodeExporter commented 9 years ago

Original comment by classi...@floodgap.com on 20 Apr 2014 at 4:49