Closed lexidor closed 3 years ago
How are we going to ensure that calling multiple async methods at once doesn't add more timing dependent races than were inherent in our $this->handle?
I'm not convinced there are more, but having multiple 'concurrent' callers interact with the same FD around other operations is unsafe, even if async IO ops aren't being used. If an FD needs to be shared between callers in the same process, some kind of application-level locking or queuing needs to be built. This needs to be application-level because there are often ordering constraints between multiple IO operations, e.g. when dealing with network protocols that do not support multiplexing, or pretty much anything involving seek.
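As a rough illustration of the application-level queuing described above, here is a minimal sketch in Hack. `SerializedReader` is a hypothetical wrapper, not part of the HSL; it assumes `IO\BufferedReader::readUntilAsync()` returns `Awaitable<?string>` and simply makes each call wait for the previously queued call to finish before touching the shared reader.

```hack
use namespace HH\Lib\IO;

// Hypothetical wrapper (not part of the HSL): serializes reads on one shared
// BufferedReader so that one caller's operation finishes before the next starts.
final class SerializedReader {
  // The most recently queued operation, or null if nothing was queued yet.
  private ?Awaitable<mixed> $tail = null;

  public function __construct(private IO\BufferedReader $reader) {}

  public async function readUntilAsync(string $separator): Awaitable<?string> {
    $previous = $this->tail;
    $mine = async {
      if ($previous is nonnull) {
        try {
          // Wait for the caller ahead of us in the queue.
          await $previous;
        } catch (\Throwable $_) {
          // An earlier caller's failure is reported to that caller, not to us.
        }
      }
      return await $this->reader->readUntilAsync($separator);
    };
    // Later callers will queue up behind this operation.
    $this->tail = $mine;
    return await $mine;
  }
}
```

This only orders whole operations; it does not decide which caller should receive which bytes, so protocol-level ordering constraints remain the application's responsibility.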
Failing unit test:
diff --git a/tests/io/BufferedReaderTest.php b/tests/io/BufferedReaderTest.php
index d08acc8..d1e0607 100644
--- a/tests/io/BufferedReaderTest.php
+++ b/tests/io/BufferedReaderTest.php
@@ -8,7 +8,8 @@
  *
  */
-use namespace HH\Lib\{IO, OS, Vec};
+use namespace HH\Lib\{IO, OS, Str, Vec};
+use namespace HH\Lib\_Private\_IO;
 use function Facebook\FBExpect\expect; // @oss-enable
 use type Facebook\HackTest\HackTest; // @oss-enable
@@ -107,6 +108,28 @@ final class BufferedReaderTest extends HackTest {
     expect(await $r->readUntilAsync("FOO"))->toEqual("cd");
   }
+  public async function testReadUntilBufferBoundary(): Awaitable<void> {
+    // Intent is to test the case when the separator starts in one chunk, and
+    // ends in another, i.e.:
+    // - Str\length($padding) < chunk size
+    // - Str\length($padding.$separator) > chunk size
+    $padding = Str\repeat('a', _IO\DEFAULT_READ_BUFFER_SIZE - 1);
+    $separator = 'bc';
+
+    list($r, $w) = IO\pipe();
+    concurrent {
+      await async {
+        await $w->writeAllAsync($padding.$separator.'junk');
+        $w->close();
+      };
+      await async {
+        $br = new IO\BufferedReader($r);
+        expect(await $br->readUntilAsync($separator))->toEqual($padding);
+        $r->close();
+      };
+    }
+  }
+
   public async function testReadLineVsReadUntil(): Awaitable<void> {
     $r = new IO\BufferedReader(new IO\MemoryHandle("ab\ncd"));
     expect(await $r->readLineAsync())->toEqual('ab');
If I replace the - 1 with - 2, the test passes.
Describe the bug
IO\BufferedReader::readUntilAsync() scans for your suffix in chunks. If your suffix is longer than one byte and happens to straddle a chunk boundary (chunks are _IO\DEFAULT_READ_BUFFER_SIZE bytes), the internal Str\contains() call returns false. This causes incorrect results.
Standalone code, or other way to reproduce the problem
This example uses a temp file instead of a MemoryHandle, because MemoryHandle uses Math\INT64_MAX as its chunk size instead of _IO\DEFAULT_READ_BUFFER_SIZE. A demonstration with an 8 EiB string is infeasible :smile:.
Expected behavior
Calling ->readUntilAsync() should return "ab<nul>...<nul>" in the first using block and "ab" in the second.
Actual behavior
Environment
hsl-experimental version: v4.66.0
hhvm version: 4.85.0
Additional context
In order to solve this problem, we could accumulate into a buffer and use the third argument (offset) of Str\search() to avoid rescanning bytes that were already searched.
This did get me thinking about multiple concurrent calls to the methods of IO\BufferedReader. They behave unpredictably, since another caller might start stealing your bytes while you are awaiting the filesystem. You would get kbytes 0..8, and another caller would steal kbytes 9..16 from in between. You'd then append 17..24 to 0..8 and end up with missing bytes in your local buffer. How are we going to ensure that calling multiple async methods at once doesn't add more timing-dependent races than were inherent in our $this->handle?
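The buffer-accumulation idea from the first paragraph of Additional context could look roughly like the sketch below. It is not the library's implementation: `read_until_from_chunks` is a hypothetical helper that takes the chunks a reader would have produced as a `vec<string>` instead of doing any IO, and it assumes a non-empty separator.

```hack
use namespace HH\Lib\{Math, Str};

/**
 * Hypothetical helper, not HSL code: returns the bytes that precede
 * $separator in the concatenated chunks, or null if it never appears.
 * Assumes $separator is non-empty.
 */
function read_until_from_chunks(
  vec<string> $chunks,
  string $separator,
): ?string {
  $buffer = '';
  $search_from = 0;
  foreach ($chunks as $chunk) {
    $buffer .= $chunk;
    // Searching the accumulated buffer, not just the newest chunk, finds a
    // separator that starts in one chunk and ends in the next.
    $idx = Str\search($buffer, $separator, $search_from);
    if ($idx is nonnull) {
      return Str\slice($buffer, 0, $idx);
    }
    // A match can only begin within the last (separator length - 1) bytes
    // seen so far; resume there so earlier bytes are not rescanned.
    $search_from = Math\maxva(
      0,
      Str\length($buffer) - Str\length($separator) + 1,
    );
  }
  return null;
}
```

For example, `read_until_from_chunks(vec['aaab', 'cjunk'], 'bc')` yields `'aaa'`, whereas checking each chunk separately with `Str\contains()` would never see `'bc'` in either chunk.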