What is the expected behavior? What do you see instead?
When using a large number of simultaneous vp8 codecs where there are many
decoders and one encoder, high cpu usage is present. The configuration is the
one encoder with token_parts set to 3 and threads set to match the number of
CPU on the machine.
What version are you using? On what operating system?
latest master 067fc49996c4fb1f7f0a6dddaf4e74a8561350e0 on debian x86_64
Can you reproduce using the vpxdec or vpxenc tools? What command line are
you using?
no, the command line tools are not affected since they are not embedded in a
threaded app, so the result cannot be duplicated there.
Please provide any additional information below.
The attached patch was just a quick way to run a test, but it vastly improved
results.
The busy loops in the code calling thread_sleep(0), even with the asm wait
instructions, were heavily straining the system.
The difference with the patch applied was significant, allowing many more
concurrent transcoders to operate in the threaded app.
I suggest some kind of condition-variable signal, or using the existing
semaphores, to signal the places where thread_sleep() is currently used to
poll an int value. I have not studied the code enough to feel comfortable
making that patch myself, or I would have.
Using the patch in a 36-user video conference, the difference is staggering and
everything functions well. Without the patch, the 12-core (24 threads with
hyperthreading) machine is maxed out on all CPUs; with the patch there was an
even distribution of about 20% usage and resources to spare.
I noticed the code before 0e78efad0be73d293880d1b71053c0d70a50a080 used
usleep(nms*1000), so it was effectively calling usleep(0); had that revision
changed to thread_sleep(1) instead of switching to sched_yield(), it would
probably have been a better solution.
However, using sleeping for this is clearly not the optimal solution. The best
approach would be to block on a condition variable and have the threads call
cond_signal() to wake the waiter every time the proper condition is reached.
Either way, the code as it stands creates a 200% penalty in CPU usage. It is
more likely the decoder busy loop than the encoder one, since I am using many
more decoders than encoders, but the global change is what delivered working
results, and I noticed two busy loops in the encoder and one in the decoder,
so they all probably play a role.
Original issue reported on code.google.com by anthony....@gmail.com on 19 Mar 2015 at 11:44