reactphp / filesystem

Evented filesystem access.
MIT License

Async polling slower than sync polling #74

Closed hvanoch closed 4 years ago

hvanoch commented 4 years ago

I am experimenting with reading the contents of a bunch of files asynchronously. While running some tests I noticed that reading the files synchronously is a lot faster (more than 10 times) than doing it asynchronously with this library (using ext-eio).

Experiment:

I am monitoring (polling) 32 different sysfs files used for digital inputs (kind of like on the Raspberry Pi). If I read them with file_get_contents(), one by one, synchronously, it takes about 2-4 milliseconds in total. If I use $filesystem->file($file)->getContents() and call \React\Promise\all() on all the promises, it takes somewhere between 35 and 120 milliseconds. I would have expected async to be faster when reading multiple files.
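For reference, a minimal sketch of how the synchronous side of such a measurement could be taken. The actual sysfs paths are not given in this thread, so the example uses temporary stand-in files; `timeSyncReads` is a hypothetical helper name, and `hrtime()` assumes PHP 7.3 or later.

```php
<?php

/**
 * Hypothetical helper: reads every path with file_get_contents() and
 * returns the trimmed values plus the total elapsed time in milliseconds.
 */
function timeSyncReads(array $paths): array
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    $values = [];
    foreach ($paths as $path) {
        $values[$path] = trim((string) file_get_contents($path));
    }
    $elapsedMs = (hrtime(true) - $start) / 1e6;

    return [$values, $elapsedMs];
}

// Stand-ins for the 32 sysfs inputs (assumption: the real paths are not in the thread).
$paths = [];
for ($i = 0; $i < 32; $i++) {
    $path = tempnam(sys_get_temp_dir(), 'input');
    file_put_contents($path, (string) ($i % 2) . "\n");
    $paths[] = $path;
}

[$values, $elapsedMs] = timeSyncReads($paths);
printf(
    "read %d inputs in %.3f ms (%.1f µs per file)\n",
    count($values),
    $elapsedMs,
    $elapsedMs * 1000 / count($values)
);

// Clean up the stand-in files.
array_map('unlink', $paths);
```

Timings on regular temporary files will differ from sysfs, so the numbers are only meaningful when the paths point at the real inputs.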

Could somebody explain to me why this is so slow?

ghost commented 4 years ago

Async is slower because there's a lot more switching between PHP and C, and because the event loop also has to watch a file descriptor (from eio) and wait for activity on it.

Using libuv might improve this, since it's completely C-land outside of the callback and the command. But I don't expect better performance than sync. You could test the libuv adapter from #69 (the adapter works, that's not why the PR was closed).

clue commented 4 years ago

@hvanoch These are some interesting benchmarking results! I think we all know async code execution doesn't magically make the code execute any faster, but your results are clearly not what one would expect from this (I know I didn't). Here's my attempt at explaining why these results are reasonable:

The filesystem is inherently blocking, so no matter which event or filesystem extension you're using, there is some additional overhead involved in wrapping this blocking access in a non-blocking thread or process. In there, it will still execute the blocking filesystem primitives, which is why it's unlikely to get any faster.

In your specific case, you're reading from a virtual filesystem instead of regular files. In this case, the underlying filesystem functions happen to be surprisingly fast because they're actually executed in kernel space and usually do not have to access any underlying hardware devices. This means that any additional overhead will slow down this process disproportionately.

On the other hand, when you're using this component to access regular files, you may see some significant performance improvements. In this case, the filesystem often takes a few milliseconds to return data (depending on I/O utilization, device speeds etc.). This means that any overhead to move this into a non-blocking thread is well worth it because it allows you to do something else in the meantime (processing other filesystem streams or any other non-blocking I/O operation).

This means that in your specific case it may or may not be okay to block the loop for a very short period. You could also use timers to check only certain inputs on each tick, which distributes the load more evenly and avoids "spikes" in your program execution. Your numbers suggest one operation takes around 100µs (2-4 ms for 32 reads), so there's plenty of time available for your 32 inputs. Depending on your workload, you may also be better off using an event-driven mechanism to only read your inputs when some change has been detected. ReactPHP happens to be really good at reacting to changes :-)
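One way to approximate "reacting only to changes" on top of plain polling is to remember the last value of each input and report only the differences. The following is a minimal sketch under that assumption; `pollChanges` is a hypothetical helper, not part of ReactPHP or this component.

```php
<?php

/**
 * Hypothetical helper: reads each path once, synchronously, and returns only
 * the inputs whose value differs from the previous poll.
 *
 * @param string[]              $paths      files to poll
 * @param array<string,string>  $lastValues state carried over between polls
 * @return array<string,string> path => new value, for changed inputs only
 */
function pollChanges(array $paths, array &$lastValues): array
{
    $changes = [];
    foreach ($paths as $path) {
        $raw = @file_get_contents($path);
        if ($raw === false) {
            continue; // unreadable input: skip it on this tick
        }
        $value = trim($raw);
        if (!array_key_exists($path, $lastValues) || $lastValues[$path] !== $value) {
            $changes[$path] = $value;
            $lastValues[$path] = $value;
        }
    }

    return $changes;
}
```

A function like this could be called from a periodic timer on the event loop, with listeners notified only for the returned entries. The first poll reports every readable input as changed, since no previous value is known yet.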

I believe this has been answered, so I'm closing this for now. Please come back with more details if this problem persists and we can always reopen this :+1:

hvanoch commented 4 years ago

I want to thank you guys for the replies. Your explanation makes a lot of sense. Indeed, I guess one would benefit from async when access to the filesystem is actually slow (instead of a virtual filesystem like I am using), or with larger files instead of reading just one character (0 or 1) from a file. I looked for a way in PHP to only get notified when a file has changed, like inotify, but that doesn't work with sysfs. I did not find an alternative for sysfs, which is why I thought polling async every 20 ms would be the best option.

I guess I will stick with reading the files sync. Like you said, reading a single file only takes about 0.1 ms. If I add a timer for each file, maybe even with some jitter, I will not block the event loop that much.
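The per-file timers with jitter could be scheduled roughly as follows. This is only a sketch: `staggeredDelays` is a hypothetical helper, and the 0.2 ms jitter is an arbitrary illustrative value; only the 32 inputs and the 20 ms interval come from the thread.

```php
<?php

/**
 * Hypothetical helper: returns one start delay (in seconds) per input so that
 * $count periodic polls are spread evenly across one $interval, each shifted
 * by a random jitter in the range [0, $maxJitter].
 */
function staggeredDelays(int $count, float $interval, float $maxJitter = 0.0): array
{
    $delays = [];
    for ($i = 0; $i < $count; $i++) {
        $base = $interval * $i / $count;                     // even spacing
        $jitter = $maxJitter * mt_rand() / mt_getrandmax();  // random shift
        $delays[] = $base + $jitter;
    }

    return $delays;
}

// 32 inputs polled every 20 ms, with up to 0.2 ms of jitter each.
$delays = staggeredDelays(32, 0.020, 0.0002);
printf(
    "first poll starts at %.3f ms, last at %.3f ms\n",
    $delays[0] * 1000,
    $delays[31] * 1000
);
```

Each delay could be passed to a one-off timer that then starts the periodic timer for that input; with react/event-loop that would presumably be `addTimer()` followed by `addPeriodicTimer()` on the loop. The jitter here only shifts the starting point of each timer, so every input is still read once per interval.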