rialto-php / rialto

Manage Node resources with PHP
MIT License
170 stars 80 forks source link

Improve the socket layer #12

Open nesk opened 6 years ago

nesk commented 6 years ago

The socket layer was written with little knowledge about socket management, it would highly benefit from a rewrite. First, those issues must be fixed:

Also, the following features could be added for extra security:

jonnywilliamson commented 6 years ago

Just some quick random thoughts at the moment as I haven't had time to delve into this deeply.

There are a lot of socket functions that all seem very similar on first glance but are subtlety different.

socket_​read
socket_​recv
socket_​recvfrom
socket_​recvmsg

Are some that spring to mind of the top of my head. Also I know that there is a lot of variables at work between detecting the end of the data sent/read by the socket and if the socket has actually closed (http://php.net/manual/en/function.socket-read.php#121426 and others). Does the clue/php-socket-raw library deal with this easily for you?

Reading through your thoughts on the data being sent to streams and being able to decode large amounts of it, I can see what you mean.

Am I right in saying thought, at the moment there's no protection in a large amount of data being sent from node to rialto, and then it trying to json_decode it....so this problem still exists right now too.

I was wondering if there was any sensible way in which you could pass in a small payload of text only, that could easily be json_decoded, but it served as only a manifest/index to describe further binary streams that might be received.

Let me explain more.

First data payload is just describing what is going to be received:

{
  "type": screenshot,
  "size": "12345678",
  "stream": {
    "id": "screenshot1",
  },
}

This is small and easily decoded/encoded with no worry of memory issues. Then the next data load is a binary dump that's actually put into a stream, hence can be as large as it wants to be. This wouldn't need to be "decoded" and hence bypass memory issues.

You would have to decide if it was acceptable to pass a raw stream to a user etc, but the underlying ability to deal with it would have been handled. To be honest I haven't fully thought this idea through but just throwing some ideas out there. Shoot down as necessary.

Finally, I did stumble on a few other libraries that might be of use:

https://github.com/pcrov/JsonReader (https://github.com/pcrov/JsonReader/wiki explaining between push and pull parsers)

Also Guzzle PSR-7 has a large section on streams which I have dabbled in before too. Maybe of use? https://github.com/guzzle/psr7#stream-implementation

nesk commented 6 years ago

I was actually thinking about dropping clue/php-socket-raw, I don't think it does that much for Rialto and it prevents from really understanding how to use sockets in PHP.

Your idea for the 2-steps payload (meta first, data next) is interesting. However, I don't think it could be used that way. Rialto is implementation-agnostic, this means the screenshot type makes sense only to the PuPHPeteer implementation, but not to Rialto.

Instead of thinking about screenshots, we could think about strings, since PuPHPeteer screenshots are base64 images and, therefore, strings. This means we would initiate a 2-steps payload for all strings. However, this 2-steps payload would be useful only for top-level strings, but we could perfectly have heavy strings in a JSON object, and then the 2-steps payload will not be used (because it would be difficult to identify).

Even if Rialto could recognize and use a screenshot type, the issue would be the same: the screenshot({encoding: 'base64'}) method should return a string, not a stream, the memory exhaustion would occur anyway by converting the stream to a string. I don't want to return a type different from the one specified in Puppeteer's documentation.

Actually, the memory exhaustion doesn't seem to be an issue for anyone today, and it is possible to bypass the issue by saving the screenshot to a file and read it later in PHP. I think we can focus on improving the socket layer before thinking about streams.

nesk commented 4 years ago

Wild idea: maybe sockets are just something really boring and hard to use and we can just replace them with an HTTP server for Node and an HTTP client for PHP. Communication will be way easier and reliable and I don't think we will lose performance.

yolo

cc @spekulatius

spekulatius commented 4 years ago

My experience with sockets much much smaller than with HTTP. So I guess I got a natural bias. It would open new doors for different forms of interfaces too.

I wonder if people are okay with an application spinning up a HTTP server within itself (even if restricted)? I could imagine some systems being locked down to open up ports too.

spekulatius commented 4 years ago

Note: A test has been commented out until this is resolved. See https://github.com/rialto-php/puphpeteer/pull/112