pyblish / pyblish-qml

Pyblish QML frontend for Maya 2013+, Houdini 11+, Nuke 8+ and more
GNU Lesser General Public License v3.0

Fix occasional IPC deadlock #325

Closed nasefbasdf closed 5 years ago

nasefbasdf commented 5 years ago

Depending on when threads print to stdout, there is a chance that the IPC gets stuck forever. The easiest explanation/demo is the following:

import sys
import threading
import time
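# Two daemon threads write to stdout concurrently: one mimics
# the IPC channel, the other mimics ordinary logging.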

def ipc_communication_thread():
    while True:
        msg = '{"json": "yes"}'
        sys.stdout.write(msg + "\n")
        sys.stdout.flush()

def anything_else():
    while True:
        print "logging something"

t0 = threading.Thread(target=ipc_communication_thread)
t0.daemon = True
t1 = threading.Thread(target=anything_else)
t1.daemon = True

t0.start()
t1.start()

time.sleep(2)

Looking at the output the problem is readily apparent:

... {"json": "yes"} logging something{"json": "yes"}

logging something{"json": "yes"}

logging something{"json": "yes"}

{"json": "yes"} logging something {"json": "yes"} ...

The print statement does not write its content to the stdout stream atomically. Sometimes the newline lags just far enough behind that the other thread can insert its own message into the stream. With the current IPC message parsing, such a message is never picked up and is lost forever.

This would also happen if some module logged or printed bytes without a trailing newline.

Luckily, the fix is easy as pie: by enclosing the message body in newline characters within a single write, the IPC parser is guaranteed to pick it up.
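A minimal sketch of the idea (the send helper and payload here are illustrative, not the actual pyblish-qml code):

import sys
import json

def send(payload):
    # Hypothetical helper: the framing newlines travel in the same
    # write call as the body, so no other thread's output can land
    # between them, and the parser finds the message alone on a line.
    msg = json.dumps(payload)
    sys.stdout.write("\n" + msg + "\n")
    sys.stdout.flush()

send({"json": "yes"})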

I'm curious. Why are we using stdin/stdout streams for communication instead of sockets, for example?

mottosso commented 5 years ago

Thank you for this! This might help explain some of the issues people have reported when having really long log messages.

I made a minor change for performance reasons; when the data string gets long, concatenating it can take a while.

I'm curious. Why are we using stdin/stdout streams for communication instead of sockets, for example?

The first iteration was actually socket-based, using the excellent xmlrpc library shipped with the standard library. I had a few issues with that, primarily with regard to allocating unique port numbers. See, users of pyblish-qml often have multiple DCCs open at once, and sometimes multiple users logged into the same machine. It was challenging to come up with port numbers which wouldn't conflict, and Windows was not very co-operative here, as it would allow multiple processes to bind to the same port, mangling any messages sent through. So I couldn't rely on the "ask forgiveness, not permission" methodology like I could on Unix.

I was rather limited on resources too, as I wanted to avoid compiling anything for Maya; else I would have chosen a message passing library like ZeroMQ. I also could have compiled something to query port numbers and ensure uniqueness.

In retrospect, IPC through STDIN/STDOUT has worked very well; it's got this nice property of encapsulation between itself and a host. I also know when a host dies, at which point the pipe closes; with sockets, I found it challenging to know when e.g. Maya died. It was also really performant. To my surprise, it heavily outperformed xmlrpc over sockets, resulting in a noticeable difference to the end-user, especially when running many small plug-ins. I've clocked it at about 6 GB/sec on a moderate machine, with very low latency.

mottosso commented 5 years ago

I'll leave this hanging for ~24h, to let folks take it for a spin and confirm it works.

nasefbasdf commented 5 years ago

I'm happy to help, but I must say that your change defeats the purpose of the pull request.

The call to sys.stdout.write is atomic, so in my version the newlines cannot be separated from the message. Your version is actually worse than the original: an arbitrary other thread can insert its output not only in front of the message but also behind it. If you change my demo script to make the three calls to sys.stdout.write, as sketched below, the problem shows up on both sides.
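For illustration, here is my reconstruction of what that variant does to the demo's IPC thread (hypothetical; the exact change may differ):

def ipc_communication_thread():
    while True:
        msg = '{"json": "yes"}'
        # Three separate writes: another thread may run between
        # any two of them and inject its own output.
        sys.stdout.write("\n")
        sys.stdout.write(msg)
        sys.stdout.write("\n")
        sys.stdout.flush()

The output will be something like the following: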

... logging something
{"json": "yes"}logging something
logging something {"json": "yes"}
logging something
logging something{"json": "yes"}
logging something ...

In other words, sys.stdout is a thread-safe buffer that collects the strings, but in a multithreaded application the active thread can change between calls to sys.stdout.write. If the newly active thread also prints something, the outputs get mixed up.

My last commit fixes it again and I added a little explanation next to it.

mottosso commented 5 years ago

Oh, yes that makes sense. Good call; I've been doing it wrong elsewhere as well, so this helps a lot! With regard to concatenating large strings like this, it's likely premature optimisation and won't have any real effect on pyblish-qml, but have you got any thoughts on how to remain atomic without concatenating? Perhaps something along the lines of a byte array?

nasefbasdf commented 5 years ago

You won't be able to avoid making a "big string" at the end. But if the contents of that string are built up gradually, you can collect the pieces in a list or in a buffer such as StringIO. Either one reduces memory reallocation, and the benefit of these patterns grows with the number of string concatenations you avoid.

# Option 1: collect the pieces in a list, join once at the end.
str_collector = []
# many calls like
str_collector.append("string")
# after many strings have been collected
result = "".join(str_collector)

# Option 2: write the pieces into an in-memory buffer.
# NOTE: use cStringIO for performance over StringIO
from cStringIO import StringIO
sio = StringIO()
# many calls like
sio.write("string")
# after many strings have been collected
result = sio.getvalue()
sio.close()

In the case of prepending and appending only a single character (two concatenations), the benefits may be negligible.

With buffering: collect 1+N+1 bytes, then make one new string of length 2+N.

With string concatenation: make a new string of length 1+N, then a second new string of length 2+N. The first one is temporary and is freed immediately (CPython's reference counting releases it as soon as the second concatenation is done with it).
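For this particular case there is also a single-pass option that skips the 1+N intermediate entirely; a small sketch:

msg = '{"json": "yes"}'

# str.join sums the lengths of its pieces first, allocates the
# 2+N result once, then copies each piece in; no intermediate string.
framed = "".join(("\n", msg, "\n"))

# String formatting behaves the same way here: one allocation.
framed = "\n%s\n" % msg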

One other thing that could speed things up is to dump the JSON into the collecting buffer directly. This avoids making a temporary string for the JSON:

import sys
import json
from cStringIO import StringIO

sio = StringIO()
sio.write("\n")
json.dump({'json': 'yes'}, sio)  # serialise straight into the buffer
sio.write("\n")
sys.stdout.write(sio.getvalue())
sio.close()
sys.stdout.flush()

mottosso commented 5 years ago

Thanks for spending time on this, I'm sure we're in the clear so far as pyblish-qml is concerned. Merging this.

But if you wouldn't mind, out of personal interest, I'd like to continue picking your brain. :)

In the case of prepending and appending only a single character (two concatenations), the benefits may be negligible.

My understanding is that any concatenation creates a duplicate of the original, regardless of how large the added bits are. For print-sized strings, I'm not too worried, but some log messages from users include environments which themselves can rack up to a few megabytes in size.

To pick an extreme (unlikely) example to make a point:

multi_gb_string = "..."  # 5 GB memory use
multi_gb_string = multi_gb_string + "\n"  # 10 GB use (during)
# .. 5 GB use (after)
multi_gb_string = "\n" + multi_gb_string + "\n"  # 15 GB use (during)
# .. 5 GB use (after)

Am I getting that right?

Array or StringIO

That makes sense, as concatenation wouldn't then have to touch what's already there.

multi_gb_list = [...]
multi_gb_list.append("\n")
multi_gb_list.insert(0, "\n")

Although that last line might? Wouldn't it have to shuffle all characters, effectively duplicating memory much like string-concatenation? Not sure if StringIO works differently.

nasefbasdf commented 5 years ago

You are mostly right about concatenation. However, CPython does some optimizations on that last line. A = x + A + y is an expression in which the original content of A can be discarded after the first concatenation, so it actually needs 10 GB instead of 15 GB:

A = x + A + y  # A: 5 GB
A = xA + y     # A: 5 GB, xA: 5 GB; temporary xA holds the result of x + A
# the original A can be freed: it will be replaced once the current
# expression completes, and is not needed for the remaining operations
A = xA + y     # A: freed, xA: 5 GB
A = xAy        # xAy: 5 GB, xA: 5 GB
A              # A: 5 GB, xA: freed; A becomes an alias for xAy and xA is freed

It all boils down to what the dependencies among the variables are and how well the compiler/interpreter can optimize using them. It is not readily apparent how much memory an expression will require.

For example, A = (x + A) + (A + y) would require 20 GB and go like this:

A = (x + A) + (A + y)  # A: 5 GB
A = xA + (A + y)       # A: 5 GB, xA: 5 GB
A = xA + Ay            # A: 5 GB, xA: 5 GB, Ay: 5 GB
A = xA + Ay            # A: freed, xA: 5 GB, Ay: 5 GB
A = xAAy               # A: freed, xA: 5 GB, Ay: 5 GB, xAAy: 10 GB
A                      # A: 10 GB, the others are freed
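You can poke at this kind of optimization directly. The sketch below relies on a CPython implementation detail, namely that a string whose reference count has dropped to 1 may be resized in place during concatenation; nothing about it is guaranteed by the language:

a = "x" * 10**6
original_id = id(a)
a = a + "y"  # CPython may extend the existing buffer in place
print(id(a) == original_id)  # often True on CPython, never guaranteed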

Lists are another matter. Even if you have a list like [...N items..., '<5 GB string>'], inserting or deleting before the last item only moves the list's "items", i.e. the references to the values. So it only duplicates the memory required to reference the items, not the items themselves. Consider the following:

s = 'x' * 5000000000  # ~5 GB
l = [s]
d = s
l.insert(0, 'q')
del s
del l[0]
assert d == l[-1]

During that script you only ever need 5 GB of memory, since s never changes. It is a value of an immutable type, so as long as you do not change it, you can put it anywhere without its size affecting what you're doing; you're only creating, moving and deleting references to the value.

In other words: modifying a list anywhere other than at its very last element means moving all of the references after the modification point. Even though that duplication is shallow, it should be avoided if possible.

StringIO is a file-like object you can write to and read from in memory. You can read about it in the official Python docs. Python 2: https://docs.python.org/2/library/stringio.html. Python 3: https://docs.python.org/3/library/io.html#io.StringIO for strings (unicode) and https://docs.python.org/3/library/io.html#io.BytesIO for bytes.

mottosso commented 5 years ago

Yes, that makes sense. Thanks a bunch @nasefbasdf, didn't think I'd level up on memory management today but there it is!