how to cowork with DIRECT_IO?

pahome commented 6 years ago

I want to use direct_io with libaio. How can I do that?

When I added the lines below to the test/basic.py you provided, it failed:

temp = tempfile.TemporaryFile()
flags = fcntl.fcntl(temp.fileno(), fcntl.F_GETFL)
fcntl.fcntl(temp.fileno(), fcntl.F_SETFL, flags | os.O_DIRECT)

Can't I combine direct_io with libaio?

vpelletier commented 6 years ago

From a very quick search on O_DIRECT, this may be due to memory alignment constraints, which are documented in man 2 open (see O_DIRECT in Notes).

To keep python-libaio zero-copy, and to not have to check whether the underlying fd is in direct mode, I guess I'll expose a function/class to produce a well-aligned buffer, to use instead of bytearray. I started working on one, but then ctypes gets in the way on Python2: ctypes.c_char.from_buffer(memoryview(bytearray(10))) raises TypeError: expected a writeable buffer object despite memoryview(bytearray(10)).readonly being False as expected. Python3 works as expected.

Also, the example script needs several more changes to work realistically with O_DIRECT:

- tempfile must be created unbuffered (python-level buffer), otherwise you get an error on file close as python tries to flush buffered writes (happened to me on interpreter exit).
- temp.write call must be made with a block-aligned and block-sized buffer, and temp.read cannot be used as python read will not create such an aligned buffer internally.

I am not used to O_DIRECT (in general, and even more in python). How would you normally use this?
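
The two constraints above (no Python-level buffering, block-aligned and block-sized writes) can be sketched without any AIO. This is only an illustration under stated assumptions: the helper names `make_aligned_block` and `write_block_direct` are hypothetical, the 4096-byte block size is assumed (the real value depends on the device and filesystem), and the alignment obtained is page alignment from an anonymous mmap.

```python
import mmap
import os

BLOCK_SIZE = 4096  # assumed block size; the real constraint depends on the device


def make_aligned_block(data, block_size=BLOCK_SIZE):
    """Copy data into a page-aligned, zero-padded buffer whose length is a
    multiple of block_size (hypothetical helper)."""
    # Round the length up to a whole number of blocks (at least one).
    size = max(1, -(-len(data) // block_size)) * block_size
    block = mmap.mmap(-1, size)  # anonymous mappings start on a page boundary
    block[:len(data)] = data
    return block


def write_block_direct(path, block):
    """Write an aligned block through an O_DIRECT file descriptor (Linux only)."""
    # os.open returns a raw fd, so there is no Python-level buffer to flush.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o600)
    try:
        # os.write hands the mmap's memory to write(2) without copying it.
        return os.write(fd, block)
    finally:
        os.close(fd)
```

Something like `write_block_direct("./foo", make_aligned_block(b"ssss"))` would then write one full block. Some filesystems (tmpfs, for example) may refuse O_DIRECT altogether.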

vpelletier commented 6 years ago

I confirm libaio works fine when given 4k-aligned and 4k-sized buffers.

Given how these constraints are independent from AIO, their effect on how the API can be used (write and seek entire blocks,...), the existence of several ways to implement buffer alignment, and as I did not have to change anything from the existing API, I'm not sure I actually want to handle it inside python-libaio.

pahome commented 6 years ago

Here is my code, modified from the example script. I downloaded libaio from https://pypi.python.org/pypi/libaio

I use mmap to map a new 4k buffer and write data to the buffer.


#from __future__ import absolute_import, print_function
import tempfile
import fcntl
import os
import mmap
import libaio

block_size = 4096
def enable_directio(f):
    flags = fcntl.fcntl(f.fileno(), fcntl.F_GETFL)
    fcntl.fcntl(f, fcntl.F_SETFL, flags | os.O_DIRECT)

def disable_directio(f):
    flags = fcntl.fcntl(f.fileno(), fcntl.F_GETFL)
    fcntl.fcntl(f, fcntl.F_SETFL, flags & ~os.O_DIRECT)

def main():
    data = "ssss"
    mbuf = mmap.mmap(-1, block_size, mmap.MAP_SHARED)
    #fd = os.open( "./foo.txt", os.O_RDWR|os.O_CREAT)
    temp = open("./foo", "w+")
    offset = 0
    mbuf.seek(0)
    mbuf.write(data[0:0 + block_size])
    enable_directio(temp)

    with libaio.AIOContext(1) as io_context:
        write_block = libaio.AIOBlock(
            libaio.AIOBLOCK_MODE_WRITE,
            temp,
            [
                bytearray(mbuf),
            ],
            0,
        )
        print(write_block)
        io_context.submit([write_block])
        temp.seek(0)
        for event in io_context.getEvents():
            print(event)

        disable_directio(temp)
        temp.close()

if __name__ == '__main__':
    main()

It always prints:

<libaio.AIOBlock object at 0x7f2ba58a8b90>
(<libaio.AIOBlock object at 0x7f2ba58a8b90>, -22, 0)

-22 means "invalid argument". I still don't know why it happens; maybe I don't fully understand the API.

pahome commented 6 years ago

I solved this by changing bytearray(mbuf) to mbuf. What types is the buffer list field limited to? Strings?

New problem: when I run it for a long time, I get this error:

[Errno 11] io_submit
__init__.py, line 224, in submit
for x in block_list
libaio.py, line 153, in _raise_on_negative
raise OSError(-result, func.__name__)

It comes from libaio, and I don't know how to solve it.

vpelletier commented 6 years ago

> I solved this by changing bytearray(mbuf) to mbuf.

Correct, bytearray manages its own buffer, so it was creating a new memory chunk, losing the alignment benefits from mmap.
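
A small sketch of that difference, using only the standard library. It assumes a 4096-byte block and uses ctypes merely to read back buffer addresses; the variable names are illustrative.

```python
import ctypes
import mmap

BLOCK_SIZE = 4096  # assumed block size

mbuf = mmap.mmap(-1, BLOCK_SIZE)

# from_buffer shares memory with mbuf, so addressof reports the mapping's address.
mmap_address = ctypes.addressof(ctypes.c_char.from_buffer(mbuf))
mmap_is_page_aligned = mmap_address % mmap.PAGESIZE == 0

# bytearray(mbuf) copies the contents into newly allocated memory, whose
# address is chosen by the allocator and carries no page-alignment guarantee.
copy = bytearray(mbuf)
copy_address = ctypes.addressof(ctypes.c_char.from_buffer(copy))

# Changing the copy leaves the mapping untouched: they are separate buffers.
copy[0] = ord("s")
copy_is_separate = mbuf[0:1] == b"\0"
```

Passing `mbuf` itself keeps the page-aligned memory, which is what O_DIRECT needs.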

> [Errno 11] io_submit

11 is EAGAIN, and man 2 io_submit says this is about the lack of available resources (in-kernel) to queue a new AIO block.

If this really is the cause of the error, you may want to increase the value given to libaio.AIOContext, or otherwise either handle this exception or keep track of the number of in-flight AIO blocks so that it does not exceed that value.
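
One way the "handle this exception" option could look is sketched below. `submit_with_retry` is a hypothetical helper, not part of python-libaio: `submit` stands for something like io_context.submit, and `drain` for whatever collects finished blocks (for example a loop over io_context.getEvents()). The retry count is an arbitrary assumption.

```python
import errno


def submit_with_retry(submit, block_list, drain, max_attempts=3):
    """Call submit(block_list); on EAGAIN, call drain() and try again.

    Hypothetical helper: `submit` and `drain` are supplied by the caller.
    """
    for attempt in range(max_attempts):
        try:
            return submit(block_list)
        except OSError as exc:
            # Re-raise anything but EAGAIN, and EAGAIN on the last attempt.
            if exc.errno != errno.EAGAIN or attempt == max_attempts - 1:
                raise
            drain()  # free queue slots by collecting finished blocks
```

Keeping a counter of in-flight blocks and never submitting past the AIOContext size avoids the error in the first place; the retry loop is only the reactive variant.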

vpelletier commented 6 years ago

> What types is the buffer list field limited to? Strings?

Technically, anything ctypes.c_char.from_buffer is happy with.

On python2 it is at least bytearray and mmap. On python3 it is also memoryview of these (pretty sure it's a bug that this does not work on python2). I improved the docstring in 288529b2b8085459196e59009dab69c44336b668.

Going further on the API level, I'm now annoyed that I merged buffers and their usable lengths, especially when writing: one may have prepared large buffers but only filled a small part of these and would wish to only write that part as opposed to writing the whole buffer and truncating.
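
On Python 3, a memoryview slice is one possible way to expose only the filled part of a larger buffer without copying it. The sketch below only shows that ctypes sees such a slice as a window onto the same memory; whether python-libaio accepts a sliced memoryview in its buffer list is an assumption not confirmed here, and with O_DIRECT the slice length would still have to be a multiple of the block size.

```python
import ctypes
import mmap

BLOCK_SIZE = 4096  # assumed block size

# A large prepared buffer of which only the first block has been filled.
big = mmap.mmap(-1, 4 * BLOCK_SIZE)
big[:4] = b"ssss"

# Slicing a memoryview creates a window onto the same memory, not a copy.
filled = memoryview(big)[:BLOCK_SIZE]

# ctypes accepts the window (Python 3) and sees the mapping's own address.
window_address = ctypes.addressof(ctypes.c_char.from_buffer(filled))
base_address = ctypes.addressof(ctypes.c_char.from_buffer(big))

# Writes through the window land in the original buffer.
filled[4:8] = b"tttt"
```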

pahome commented 6 years ago

> 11 is EAGAIN, and man 2 io_submit says this is about the lack of available resources (in-kernel) to queue a new AIO block.
>
> If this really is the cause of the error, you may want to increase the value given to libaio.AIOContext, or otherwise either handle this exception or keep track of the number of in-flight AIO blocks so that it does not exceed that value.

Yes, that solved it.

> Going further on the API level, I'm now annoyed that I merged buffers and their usable lengths, especially when writing: one may have prepared large buffers but only filled a small part of these and would wish to only write that part as opposed to writing the whole buffer and truncating.

Maybe don't truncate, and let the user take care of it?

Another question: the API does not support AIO + fsync, right? I tried to modify it and always failed.

vpelletier commented 6 years ago

> Maybe don't truncate, and let the user take care of it?

This is what I meant. But I feel this is an annoying part of my API. FWIW, I wrote this API initially for USB gadget subsystem, as it exposes device endpoints as file objects which do not support select/poll/epoll, only AIO. And there, there is no way to truncate: data sent is sent to host computer.

> The API does not support AIO + fsync, right?

If you mean os.fsync(target_file) and/or os.fdatasync(target_file), I have no experience using these along with AIO.

If you mean IO_CMD_FSYNC and/or IO_CMD_FDSYNC, they are indeed not exposed in current code as I did not need them and did not find clear documentation while writing this wrapper.

vpelletier commented 6 years ago

> If you mean IO_CMD_FSYNC and/or IO_CMD_FDSYNC

Actually, checking kernel code, I realise these are not even implemented in current torvalds master...

From my very short AIO experience, the interface is poorly documented, rarely used (checking the reverse dependencies of libaio on current Debian sid, I see a few databases, qemu, a fuse implementation of ZFS, and a few IO benchmark tools) and was likely developed and used initially off-tree (I remember reading about redhat-only extensions).

pahome commented 6 years ago

> This is what I meant. But I feel this is an annoying part of my API. FWIW, I wrote this API initially for USB gadget subsystem, as it exposes device endpoints as file objects which do not support select/poll/epoll, only AIO. And there, there is no way to truncate: data sent is sent to host computer.

Is there any way to solve this?

> Actually, checking kernel code, I realise these are not even implemented in current torvalds master...

Maybe the kernel doesn't support IO_CMD_FSYNC and/or IO_CMD_FDSYNC now, so I can't use them.

vpelletier commented 6 years ago

> Is there any way to solve this?

By me having picked a different way to expose the API to python, yes. Or now, by breaking the API. So, technically possible but not very practical.

> Maybe the kernel doesn't support IO_CMD_FSYNC and/or IO_CMD_FDSYNC now, so I can't use them.

This is exactly what "not in torvalds master" means: official kernel does not implement this. Maybe some custom flavours do (redhat ?) but not being available in torvalds master is not a very encouraging sign, and these should likely not be relied upon if you want a portable result (from distro to distro and kernel to kernel).

vpelletier commented 6 years ago

Latest libaio (0.3.111) added support for a separate set of flags, among which are the RWF_{,D}SYNC flag pair, which should remove the need for the Red Hat-specific IO_CMD_F{,D}SYNC. I released python-libaio 0.3 which exposes this feature.
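
For illustration only, the same per-write flags can be tried with the synchronous os.pwritev call from the Python standard library (Python 3.7+, Linux 4.7+). This is not python-libaio's API, which is not shown in this thread; `pwrite_dsync` is a hypothetical helper, and the fallback to os.fdatasync is an assumption about what a caller would want when the flag is unavailable.

```python
import os


def pwrite_dsync(fd, data, offset):
    """Write data at offset and make it durable before returning
    (hypothetical helper, not part of python-libaio)."""
    flag = getattr(os, "RWF_DSYNC", None)
    if flag is not None and hasattr(os, "pwritev"):
        try:
            # Per-write equivalent of opening the file with O_DSYNC.
            return os.pwritev(fd, [data], offset, flag)
        except OSError:
            pass  # kernel or filesystem rejected the flag; fall back below
    written = os.pwrite(fd, data, offset)
    os.fdatasync(fd)  # flush the file data explicitly instead
    return written
```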

vpelletier / python-libaio

how to cowork with DIRECT_IO? #2