shotgunsoftware / tk-framework-adobe

Foundation Tech for all Adobe Integrations
https://developer.shotgridsoftware.com/tk-framework-adobe/
Other
10 stars 8 forks source link

Photoshop: AdobeBridge websocket communication interrupted during long, blocking operations in Photoshop #36

Open akennedy-netflix opened 1 year ago

akennedy-netflix commented 1 year ago

Summary

Invoking the Photoshop API with engine.adobe can cause the AdobeBridge websocket communication to be interrupted. This only seems to happen during long, blocking operations that cause a progress bar in Photoshop to appear such as saving and loading large files from network storage, or saving files with high compression settings.

Note that the original operation (File Save, File Open, etc.) does complete successfully, but because it's immediately followed up by an error that crashes the plugin, you can't do anything else afterwards.

Impact

This bug causes the SGTK plugin to fail during vital operations such as saving and loading files. It can often become a blocking issue that prevents artists from publishing work.

Traceback

Note: line numbers will be slightly off: I've inserted a lot of logging messages to my local files for debugging.

[49446 ERROR sgtk.env.project.tk-photoshopcc] Traceback (most recent call last):
  File ".../tk_multi_pythonconsole/1.3.0._build1/5322/a/ext/python/app/input_widget.py", line 246, in execute
    exec(python_code, self._locals, self._locals)
  File "python input", line 13, in <module>
  File ".../tk_framework_adobe/1.1.7/411f/a/ext/python/tk_framework_adobe/rpc/proxy.py", line 232, in __call__
    parent=self._parent,
  File ".../tk_framework_adobe/1.1.7/411f/a/ext/python/tk_framework_adobe/rpc/communicator.py", line 300, in rpc_call
    wrapper_class=ProxyWrapper,
  File ".../tk_framework_adobe/1.1.7/411f/a/ext/python/tk_framework_adobe/rpc/communicator.py", line 760, in __run_rpc_command
    results = self._wait_for_response(payload["id"])
  File ".../tk_framework_adobe/1.1.7/411f/a/ext/python/tk_framework_adobe/adobe_bridge.py", line 44, in wrapper
    result = func(*args, **kwargs)
  File ".../tk_framework_adobe/1.1.7/411f/a/ext/python/tk_framework_adobe/adobe_bridge.py", line 440, in _wait_for_response
    return super(AdobeBridge, self)._wait_for_response(uid)
  File ".../tk_framework_adobe/1.1.7/411f/a/ext/python/tk_framework_adobe/rpc/communicator.py", line 655, in _wait_for_response
    self.wait(single_loop=True, process_events=False)
  File ".../tk_framework_adobe/1.1.7/411f/a/ext/python/tk_framework_adobe/rpc/communicator.py", line 523, in wait
    process_events=process_events,
  File ".../tk_framework_adobe/1.1.7/411f/a/ext/python/tk_framework_adobe/rpc/communicator.py", line 245, in process_new_messages
    self._io._process_packets()
  File ".../tk_framework_adobe/1.1.7/411f/a/ext/pkgs/socketIO_client_nexus/__init__.py", line 276, in _process_packets
    for engineIO_packet in self._transport.recv_packet():
  File ".../tk_framework_adobe/1.1.7/411f/a/ext/pkgs/socketIO_client_nexus/transports.py", line 158, in recv_packet
    packet_text)
  File ".../tk_framework_adobe/1.1.7/411f/a/ext/pkgs/socketIO_client_nexus/parsers.py", line 96, in parse_packet_text
    packet_type = int(get_character(packet_text, 0))
  File ".../tk_framework_adobe/1.1.7/411f/a/ext/pkgs/socketIO_client_nexus/symmetries.py", line 33, in get_character
    return chr(get_byte(x, index))
  File ".../tk_framework_adobe/1.1.7/411f/a/ext/pkgs/socketIO_client_nexus/symmetries.py", line 29, in get_byte
    return indexbytes(x, index)

IndexError: index out of range

Video

Here is a video of me saving a PNG file with the maximum compression settings.

https://user-images.githubusercontent.com/128504437/226666611-26ca66f9-fec1-4292-90ae-851a0f1fc1c1.mp4

Steps to reproduce

Reproducing the error can be tricky. I can reliably reproduce the error on MacOS, but not on Windows. Furthermore, not all large files produce the error. However, once I have a file that produces the error, it is consistently reproducible with the same steps.

  1. Open a large PSD file that will take a long time to save. I have uploaded the file from the video above to Google Drive: test.psd
  2. Open the ShotGrid Python Console
  3. Use the Python API to save the file as a PNG with maximum compression settings. Example code:
from pathlib import Path

dst = Path.home() / "Desktop" / "save_test.png"  # Save file to Desktop

png_file = engine.adobe.File(dst.as_posix())
png_options = engine.adobe.PNGSaveOptions()
png_options.interlaced = False
png_options.compression = 9  # Highest compression value, most likely to cause Photoshop to hang

document = engine.adobe.get_active_document()
document.saveAs(png_file, png_options, True)

If you get an IndexError in the Python console, followed by the ShotGrid Adobe panel shutdown, you have reproduced the error successfully.

Environment

Toolkit environment:

Python version:

Photoshop version:

System information: I have reproduced the error on both of the following systems:

What I've tried

I have read the ShotGrid community forum post for Adobe engine crashing on long operations. While this issue feels related, and indeed presents some of the same symptoms, I could not get any of the solutions to work.

My understanding is that disabling the heartbeat should be the most "forceful" way of circumventing some of these issues. I have tried saving with the heartbeat_disabled context manager, but the issue persists:

with engine.heartbeat_disabled():
    document.saveAs(png_file, png_options, True)

My investigation

(This is going to be really long, I apologize. I needed to get this all written down while it's still fresh in my head.)

The IndexError comes from the bundled pkgs.zip.

I've tried to break down the steps in the websocket communication to understand where the error is coming from, and this is my best understanding of what happens when you call, for example, document.saveAs:

  1. The Document.saveAs function is invoked on the document's FunctionWrapper
  2. An rpc call is sent over the websocket in Communicator.__run_rpc_command line 677
  3. We start waiting for a response from the rpc call in Communicator._wait_for_response line 611
  4. Packet data is received in Websocket.recv lines 292-304
    def recv(self):
        """
        Receive string data(byte array) from the server.
        return value: string(byte array) value.
        """
        opcode, data = self.recv_data()  # <-------------------------------- EMPHASIS MINE
        if six.PY3 and opcode == ABNF.OPCODE_TEXT:
            return data.decode("utf-8")
        elif opcode == ABNF.OPCODE_TEXT or opcode == ABNF.OPCODE_BINARY:
            return data
        else:
            return ''

This part is important. From the packet, we receive an opcode, and some data bundle as a byte string. In the example above that produces the IndexError, we get the following values:

opcode == 8
data == b''

Ok, let's continue...

  1. Next, that packet data is read in the socketIO-client-nexus WebsocketTransport.recv_packet lines 157-158
  2. The text from the packet data is parsed in the parse_package_text function lines 95-98
def parse_packet_text(packet_text):
    packet_type = int(get_character(packet_text, 0))  # <------------ EMPHASIS MINE
    packet_data = packet_text[1:]
    return packet_type, packet_data

This line emphasized here is why we get the IndexError. We are trying reading the first character from an empty byte string.

Alright, great. So that's one mystery solved. But why do we get an empty byte string, and what is the 8 opcode telling us? If we jump back to the websocket-client module, we can see the opcodes defined in _abnf.py lines 80-86:

    # operation code values.
    OPCODE_CONT   = 0x0
    OPCODE_TEXT   = 0x1
    OPCODE_BINARY = 0x2
    OPCODE_CLOSE  = 0x8
    OPCODE_PING   = 0x9
    OPCODE_PONG   = 0xa

The 8 opcode is a close event. Indeed if you keep tracking it down, you'll see that WebSocket.send_close (lines 362-373) is called immediately before the packet is received, which happens right before the IndexError.

If you add an except IndexError: ... block in tk-framework-adobe's Communicator.process_new_messages (lines 238-251), you can ignore the IndexError, but you immediately get another communication error, because the close signal has already been sent, and the socket communication is terminated.

And that's as far as I've been able to get. I do not know why the close signal is being sent. I don't know whether it's being deliberately called somewhere, or if it's a byproduct of some other error wrapped in a try/except clause. I dug around in the Adobe CEP plugin and couldn't find a smoking gun in there. I don't have a ton of familiarity with JavaScript, so I don't feel confident saying anything definitive one way or the other with regards to the plugin.

To circumvent the error for now, I have dialed down the compression settings that our publisher plugin uses to render PNGs. That makes the document.saveAs function complete more quickly, which doesn't produce the error. That is just a stopgap solution though. Sooner or later, I'm worried that an even more "troublesome" file will come along that we can't buy ourselves any more leeway on with the compression settings, and we'll be stuck unable to save it at all.

Any help investigating this, or suggestions on using the Python API to avoid the error, would be extremely welcome!