keeleysam / tenfourfox

Automatically exported from code.google.com/p/tenfourfox

Scale WebM multithreading to number of CPUs #133

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I just looked and, well, they're right, we're not spawning extra decoding 
threads.

https://bugzilla.mozilla.org/show_bug.cgi?id=683825

We should add this support because it would help on MDDs and dual/quad G5s.

Original issue reported on code.google.com by classi...@floodgap.com on 18 Feb 2012 at 7:10

GoogleCodeExporter commented 9 years ago
The problem is we don't want to do this on G4 because only the MDD is dual 
*and* fast enough. So let's limit this to G5 to start with. With 

diff --git a/media/libvpx/vp8/decoder/threading.c b/media/libvpx/vp8/decoder/threading.c
--- a/media/libvpx/vp8/decoder/threading.c
+++ b/media/libvpx/vp8/decoder/threading.c
@@ -525,22 +525,27 @@ static THREAD_FUNCTION thread_decoding_p
 void vp8_decoder_create_threads(VP8D_COMP *pbi)
 {
     int core_count = 0;
     int ithread;

     pbi->b_multithreaded_rd = 0;
     pbi->allocated_decoding_thread_count = 0;

+#ifdef TENFOURFOX_G5
+#warning enabling multithreading
+    core_count = 2;
+#else
     /* limit decoding threads to the max number of token partitions */
     core_count = (pbi->max_threads > 8) ? 8 : pbi->max_threads;

     /* limit decoding threads to the available cores */
     if (core_count > pbi->common.processor_core_count)
         core_count = pbi->common.processor_core_count;
+#endif

     if (core_count > 1)
     {
         pbi->b_multithreaded_rd = 1;
         pbi->decoding_thread_count = core_count - 1;

         CHECK_MEM_ERROR(pbi->h_decoding_thread, vpx_malloc(sizeof(pthread_t) * pbi->decoding_thread_count));
         CHECK_MEM_ERROR(pbi->h_event_start_decoding, vpx_malloc(sizeof(sem_t) * pbi->decoding_thread_count));

decoding performance improves by almost 80% on the quad.

Original comment by classi...@floodgap.com on 20 Feb 2012 at 3:43

GoogleCodeExporter commented 9 years ago
Just for yuks, I tried this on a 7450 build on the 1GHz G4 and, as predicted, 
it choked. Even single threaded, WebM decoding and playback pegs the CPU, so 
making it multithreaded just makes it worse. So G5 only.

Original comment by classi...@floodgap.com on 20 Feb 2012 at 3:46

GoogleCodeExporter commented 9 years ago
But aren't there dual G4s which could benefit?

Original comment by oldnewid...@gmail.com on 20 Feb 2012 at 6:17

GoogleCodeExporter commented 9 years ago
Well, the upper-end MDDs are about the only ones; the dual CPUs under 1.25GHz 
are likely to do just as badly. Nevertheless, this is a more scalable solution, 
enables four threads on the quad, and picks the right number of threads for 
other systems:

 void vp8_decoder_create_threads(VP8D_COMP *pbi)
 {
     int core_count = 0;
     int ithread;
+    // for below
+    size_t length;
+    int error;
+    int mib[2];

     pbi->b_multithreaded_rd = 0;
     pbi->allocated_decoding_thread_count = 0;

+#ifndef DEBUG
+#warning enabling multithreading
+    //error = sysctlbyname("hw.ncpu", &core_count, &length, NULL, 0);
+    mib[0] = CTL_HW;
+    mib[1] = HW_NCPU;
+    length = sizeof(core_count); /* sysctl() reads the buffer size from here */
+    error = sysctl(mib, 2, &core_count, &length, NULL, 0);
+    if (error != 0) core_count = 1;
+#else
     /* limit decoding threads to the max number of token partitions */
     core_count = (pbi->max_threads > 8) ? 8 : pbi->max_threads;

     /* limit decoding threads to the available cores */
     if (core_count > pbi->common.processor_core_count)
         core_count = pbi->common.processor_core_count;
+#endif

So we'll ship that. Testing in Activity Monitor, four decoding threads start on 
the quad when video plays.
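
For anyone who wants to try the CPU query outside the libvpx tree, here is a minimal standalone sketch of the same idea. It is not the shipped code: the function names detect_cpu_count and clamp_thread_count are hypothetical, the sysconf() fallback for non-Apple systems is an assumption added so the sketch builds elsewhere, and the clamp simply combines the partition limit from the stock code with the detected CPU count.

```c
#include <stddef.h>
#include <unistd.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: number of logical CPUs, or 1 if the query fails. */
int detect_cpu_count(void)
{
#if defined(__APPLE__)
    int mib[2] = { CTL_HW, HW_NCPU };
    int ncpu = 0;
    /* sysctl() expects the buffer size in length before the call */
    size_t length = sizeof(ncpu);

    if (sysctl(mib, 2, &ncpu, &length, NULL, 0) != 0 || ncpu < 1)
        return 1;
    return ncpu;
#else
    /* Assumption: fallback for systems without the CTL_HW/HW_NCPU sysctl */
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n < 1) ? 1 : (int)n;
#endif
}

/* Hypothetical helper: limit a requested thread count to the number of
 * token partitions and to the number of CPUs, never going below 1. */
int clamp_thread_count(int requested, int cpu_count, int max_partitions)
{
    int count = requested;

    if (count > max_partitions)
        count = max_partitions;
    if (count > cpu_count)
        count = cpu_count;
    if (count < 1)
        count = 1;
    return count;
}
```

Calling clamp_thread_count(requested, detect_cpu_count(), 8) gives a value that could stand in for core_count. The surrounding libvpx code then sets decoding_thread_count to core_count - 1, so a result of 4 means three worker threads plus the calling thread.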

Original comment by classi...@floodgap.com on 20 Feb 2012 at 6:35

GoogleCodeExporter commented 9 years ago
Shipp'd

Original comment by classi...@floodgap.com on 8 Mar 2012 at 3:35