ikorennoy / jasyncfio

Java asynchronous file I/O based on the Linux io_uring interface
Apache License 2.0
72 stars, 10 forks

Offset-based writes (leaving a gap at the start of the file, which is filled later on) #68

Closed: JohannesLichtenberger closed this issue 1 year ago

JohannesLichtenberger commented 1 year ago

Hi,

I think there's an issue with offset-based random writes to a file:

I write at offset 208, leaving a gap in front, which is filled later by a header/UberPage that is written twice (bytes 0..99 and 100..199). With the FileChannel-based implementation I get the following hexdump:

    johannes@luna:/tmp/sirix/json-path1/resources/shredded/data$ hexdump sirix.data
    0000000 0000 0000 0000 0000 0000 0000 0000 0000
    *
    00000d0 0036 0000 5382 414e 5050 0059 0000 0100

With your library, the same bytes start at offset 0 instead of 0xd0 (208):

    johannes@luna:/tmp/sirix/json-path1/resources/shredded/data$ hexdump sirix.data
    0000000 0036 0000 5382 414e 5050 0059 0000 0100
    0000010 0000 0100 0000 2200 042b 0001 0111 010c
    0000020 0100 0100 2003 0f00 0000 1f00 0080 1180

Kind regards,
Johannes

ikorennoy commented 1 year ago

Hi!

Could you provide a code example that reproduces the problem?

JohannesLichtenberger commented 1 year ago

Hm, I only have the SirixDB io_uring implementation. But it should hopefully be reproducible with an empty file and an offset-based write.

I'm using Chronicle Bytes (Bytes.elasticByteBuffer(64_000)), but I guess a plain direct ByteBuffer works, too:

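    // Expose the ByteBuffer behind the Chronicle Bytes instance, starting at position 0.
    final var buffer = bufferedBytes.underlyingObject().rewind();
    // Only write the bytes that were actually buffered.
    buffer.limit((int) bufferedBytes.readLimit());
    // Positional write at `offset`, then flush the data to disk.
    dataFile.write(buffer, offset).join();
    dataFile.dataSync().join();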

For instance, writing at offset 208 leaves a gap at the start of the file if the AsyncFile dataFile is replaced with a FileChannel. Likewise, writing 1000 bytes at offset 208 makes the file grow to 1208 bytes with a FileChannel, but with AsyncFile it is only 1000 bytes.
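For comparison, the FileChannel behavior described above can be checked with plain java.nio. The following is a minimal sketch (the class and method names are hypothetical) that writes 1000 bytes at offset 208 into a fresh, empty temporary file; the resulting size should be 1208 bytes, with the first 208 bytes forming the gap.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class GapWriteDemo {
    /** Writes `length` non-zero bytes at `offset` and returns the resulting file size. */
    static long writeAtOffset(Path file, long offset, int length) throws IOException {
        // No APPEND option: positional writes honor the requested offset.
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(length);
            while (buffer.hasRemaining()) {
                buffer.put((byte) 1); // arbitrary non-zero payload
            }
            buffer.flip();
            long position = offset;
            while (buffer.hasRemaining()) {
                // Positional write: does not change the channel's own position.
                position += ch.write(buffer, position);
            }
            ch.force(true);
            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("gap-write", ".data"); // starts out empty
        try {
            long size = writeAtOffset(file, 208L, 1000);
            System.out.println("file size: " + size); // 208-byte gap + 1000 payload bytes = 1208
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```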

I hope this reproduces my case. Otherwise: I replaced the storage type in the class ResourceConfiguration with StorageType.IO_URING, ran the unit/integration tests of sirix-core, set breakpoints at https://github.com/sirixdb/sirix/blob/c1521b9bbeebcccf4dda8ae6fded54c91e18ddf3/bundles/sirix-core/src/main/java/org/sirix/io/iouring/IOUringWriter.java#L290 and https://github.com/sirixdb/sirix/blob/c1521b9bbeebcccf4dda8ae6fded54c91e18ddf3/bundles/sirix-core/src/main/java/org/sirix/io/iouring/IOUringWriter.java#L326, and then ran hexdump.

ikorennoy commented 1 year ago

I tested your case. Most likely the problem is that if the second argument of the write method is an int, it is not the offset in the file but the length. If you pass a long as the second argument, then writing 1000 bytes at offset 208 results in a file of 1208 bytes, as expected.
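To illustrate how the static type of the second argument selects an overload, here is a sketch with a hypothetical class that overloads write on int (treated as a length) and long (treated as a position). It only demonstrates Java's overload resolution; the names and semantics are assumptions, not jasyncfio's actual method set.

```java
import java.nio.ByteBuffer;

public class OverloadDemo {
    // Hypothetical overload: the second argument is a length.
    static String write(ByteBuffer buffer, int length) {
        return "length=" + length;
    }

    // Hypothetical overload: the second argument is a file position.
    static String write(ByteBuffer buffer, long position) {
        return "position=" + position;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(4);
        int intOffset = 208;
        long longOffset = 208L;
        // An int argument matches both overloads; the int one is the most specific.
        System.out.println(write(buffer, intOffset));  // length=208
        // A long argument cannot narrow to int, so only the long overload applies.
        System.out.println(write(buffer, longOffset)); // position=208
        // An int literal behaves like an int variable.
        System.out.println(write(buffer, 208));        // length=208
    }
}
```

Since flushBuffer() declares offset as a long, this overload mix-up would not explain the SirixDB case on its own.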

JohannesLichtenberger commented 1 year ago

Strange, the passed argument is a long:

  private Bytes<ByteBuffer> flushBuffer(final PageTrx pageTrx, final Bytes<ByteBuffer> bufferedBytes) throws IOException {
    final long fileSize = dataFile.size().join();
    long offset;

    if (fileSize == 0) {
      // Empty file: start after FIRST_BEACON, padded to the fragment alignment.
      offset = IOStorage.FIRST_BEACON;
      offset += (PAGE_FRAGMENT_BYTE_ALIGN - (offset % PAGE_FRAGMENT_BYTE_ALIGN));
    } else {
      // Otherwise, continue at the current end of the file.
      offset = fileSize;
    }

    final var buffer = bufferedBytes.underlyingObject().rewind();
    buffer.limit((int) bufferedBytes.readLimit());
    // `offset` is a long here, so the positional write overload is used.
    dataFile.write(buffer, offset).join();
    dataFile.dataSync().join();
    return pageTrx.newBufferedBytesInstance();
  }
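The first-offset computation in flushBuffer() can be worked through in isolation. The values of IOStorage.FIRST_BEACON and PAGE_FRAGMENT_BYTE_ALIGN are not shown in the thread; the sketch below assumes 200 (two 100-byte header copies) and 8, which reproduces the reported offset of 208. The class and method names are hypothetical.

```java
public class BeaconOffsetDemo {
    /**
     * Mirrors the empty-file branch of flushBuffer(): it always adds
     * (align - offset % align), so an offset that is already aligned
     * is pushed forward by one full alignment step.
     */
    static long firstDataOffset(long firstBeacon, long align) {
        long offset = firstBeacon;
        offset += (align - (offset % align));
        return offset;
    }

    /** Conventional round-up, for comparison: leaves aligned offsets unchanged. */
    static long roundUp(long offset, long align) {
        return ((offset + align - 1) / align) * align;
    }

    public static void main(String[] args) {
        // Assumed values, not taken from SirixDB: FIRST_BEACON = 200, alignment = 8.
        long firstBeacon = 200L;
        long align = 8L;
        System.out.println(firstDataOffset(firstBeacon, align)); // 200 % 8 == 0, so 200 + 8 = 208
        System.out.println(roundUp(firstBeacon, align));         // 200
    }
}
```

Whether the extra padding for an already aligned beacon is intended is not stated in the thread, so roundUp is included only for comparison.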

JohannesLichtenberger commented 1 year ago

The file size is zero, so the offset is 208L, yet the hexdump looks as shown above. Afterwards, a HeaderPage/UberPage is written twice into the first 200 bytes of the file. I'll change this to 512 bytes or a customizable page size, so that on an SSD the two copies end up in different physical pages.

JohannesLichtenberger commented 1 year ago

I did the following test:

    dataFileChannel = FileChannel.open(dataFilePath, StandardOpenOption.READ, StandardOpenOption.WRITE);
    final var byteBuffer = ByteBuffer.allocateDirect(4);
    byteBuffer.putInt(27);
    byteBuffer.flip();
    dataFileChannel.write(byteBuffer, 1000L);
    dataFileChannel.force(true);

vs.

    CompletableFuture<AsyncFile> asyncFileCompletableFuture = AsyncFile.open(dataFilePath,
                                                                             dataFileEventExecutor,
                                                                             OpenOption.READ_WRITE,
                                                                             OpenOption.APPEND,
                                                                             OpenOption.CREATE);
    dataFile = asyncFileCompletableFuture.join();
    final var byteBuffer = ByteBuffer.allocateDirect(4);
    byteBuffer.putInt(27);
    byteBuffer.flip();
    dataFile.write(byteBuffer, 1000L).join();
    dataFile.dataSync().join();

The resulting file is 1004 bytes with the FileChannel vs. 4 bytes with AsyncFile.

[Screenshots: 2022-11-09 23-36-44, 2022-11-09 23-38-23]

I'm using version 0.0.4 of your library.

JohannesLichtenberger commented 1 year ago

Well, really stupid of me... I was opening the file in append mode (OpenOption.APPEND).
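Why append mode explains the result: the Linux pwrite(2) man page notes that if a file is opened with O_APPEND, pwrite() appends data to the end of the file regardless of the offset. That jasyncfio's OpenOption.APPEND maps to O_APPEND, and that io_uring writes on such a descriptor behave like pwrite(), are assumptions consistent with the 4-byte result above; the fix is to drop OpenOption.APPEND from the AsyncFile.open(...) call and keep READ_WRITE and CREATE. The same "requested position is ignored" effect can be shown with plain java.nio, where StandardOpenOption.APPEND makes every relative write first move to the end of the file. The sketch below (hypothetical names) uses a relative write after position(0) rather than a positional write, since that is the documented append-mode behavior.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendModeDemo {
    /**
     * Creates a file of `initialSize` zero bytes, reopens it in append mode,
     * requests position 0 and writes `payload`. Returns the resulting file size.
     */
    static long writeAtZeroInAppendMode(Path file, int initialSize, byte[] payload) throws IOException {
        Files.write(file, new byte[initialSize]); // defaults: CREATE, TRUNCATE_EXISTING, WRITE
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ch.position(0); // the requested position has no effect on where data lands...
            ByteBuffer buffer = ByteBuffer.wrap(payload);
            while (buffer.hasRemaining()) {
                ch.write(buffer); // ...because each write first advances to the end of the file
            }
            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("append-mode", ".data");
        try {
            long size = writeAtZeroInAppendMode(file, 1000, new byte[] {1, 2, 3, 4});
            byte[] content = Files.readAllBytes(file);
            System.out.println("file size: " + size);            // 1000 + 4 = 1004
            System.out.println("byte at 0: " + content[0]);       // 0: nothing was written at position 0
            System.out.println("byte at 1000: " + content[1000]); // 1: the payload landed at the end
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```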