ikwzm / udmabuf

User space mappable dma buffer device driver for Linux.
BSD 2-Clause "Simplified" License

Unusual issue when mmapping udmabuf for network transfer #100

Closed DavidAntliff closed 1 year ago

DavidAntliff commented 1 year ago

Hi, apologies if this is a bit vague - I'm looking to find out how to debug this further, so I'll explain my issue in case you or anyone has an idea of where to look next.

I'm using u-dma-buf to manage a large (64MB) buffer as a DMA destination on an ARM64 (Zynq MPSoC) platform.

I have data being written into the buffer from the FPGA with no problems. This is all working correctly.

My problem is with network DMA out of the buffer. I have mmapped the /dev/udmabuf device. Sometimes the wrong amount of data is transferred (more than requested), and sometimes an internal error in boost::asio is triggered: system error 14: bad address. However, this only happens with specific lengths of data from the memory-mapped udmabuf, such as 0xfff1 to 0xffff bytes. Lengths of 0x10000 and greater are OK, until 0x1fff1 to 0x1ffff, which are problematic again. There may be other bad ranges. None of this is random: it's 100% reproducible every time, and does not seem to depend on earlier state.

Specifically, I am using boost::asio::write like this (somewhat simplified). Note the use of a std::array to send the header, data and footer with a single call to write():

        auto length {0xfff8};   // a problematic length...
        std::string request_id {"cap00"};  // ... but only when this is 5 or more characters long!

        // see definition below
        const auto data = static_cast<const uint8_t *>(memory_map("/dev/udmabuf1", 0x4000000, 0));

        // request_id is declared above; its length determines the header length
        header = "DATA " + request_id + " " + std::to_string(length) + "\n";

        error_code ec;
        auto bytes_transferred = io::write(
                socket_,
                std::array { io::buffer(header),
                             io::buffer(data, length),
                             io::buffer(footer) },
                ec);

        if (ec) {
            std::cerr << ec << std::endl;
        }

It seems certain combinations of length and request_id cause problems. For example, length 0xfff8 with request_id cap0 works correctly, but setting request_id to cap00 causes the Bad Address error. This might suggest something with the header string but I've eliminated that as a possibility - it is definitely related to the udmabuf mmap.

One thing I have discovered is that if I change my mmap to access the same memory but via /dev/mem (with appropriate physical address offset) rather than /dev/udmabuf1 then all of these issues completely disappear. I only have problems when mmapping /dev/udmabuf1.

Do you know of any settings on this device that may be affecting this situation? Do you have any suggestions what to look at next?

I am thinking about trying to reproduce this in QEMU, as I may be able to better debug the driver & kernel.

For reference, my memory_map function looks like this:

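#include <cstdio>      // std::perror
#include <string>
#include <fcntl.h>     // open, O_RDWR, O_SYNC
#include <sys/mman.h>  // mmap, MAP_FAILED
#include <unistd.h>    // close

void * memory_map(const std::string &device, size_t length, off_t offset) {
    int fd;
    if ((fd = open(device.c_str(), O_RDWR | O_SYNC)) == -1) {
        std::perror("open");
        return nullptr;
    }

    auto mem = mmap(nullptr,
                    length,
                    PROT_READ | PROT_WRITE,
                    MAP_SHARED,
                    fd,
                    offset);

    if (mem == MAP_FAILED) {
        std::perror("mmap");  // report before close() can overwrite errno
        close(fd);
        return nullptr;
    }
    close(fd);  // the mapping remains valid after the descriptor is closed
    return mem;
}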

dmesg output when creating the buffer:

[88155.041021] u-dma-buf-mgr : create udmabuf1 67108864
[88155.059590] u-dma-buf udmabuf1: driver version = 4.0.0
[88155.059602] u-dma-buf udmabuf1: major number   = 242
[88155.059606] u-dma-buf udmabuf1: minor number   = 1
[88155.059612] u-dma-buf udmabuf1: phys address   = 0x0000000070200000
[88155.059617] u-dma-buf udmabuf1: buffer size    = 67108864
[88155.059623] u-dma-buf u-dma-buf.3.auto: driver installed.

DavidAntliff commented 1 year ago

If you find this bewildering, don't worry, you're not the only one!

That said, don't stress too much, I will try to create a minimal reproducible example in the next day or two, so that we can get closer to the cause.

Could any of the cache settings affect things in this weird manner?

ikwzm commented 1 year ago

Thank you for the issue.

The libc memcpy() or memset() is known to cause bus errors on the arm64 architecture when performing non-aligned accesses to non-cached areas. This may possibly be the cause.

https://developer.arm.com/documentation/ka004708/latest
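
As an illustration of the alignment point above, here is a minimal sketch of one way to read from a non-cached mapping without relying on libc memcpy(): copy into an ordinary staging buffer using single-byte loads, which are always naturally aligned. The helper name `copy_from_uncached` is hypothetical, and this approach is an assumption for illustration rather than something tested in this thread; it trades speed for safety.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of u-dma-buf): copy `n` bytes out of a
// non-cached mapping using only single-byte loads. A one-byte access is
// always naturally aligned, so no unaligned access is issued to the source.
// `volatile` prevents the compiler from merging the loads into wider
// accesses or replacing the loop with a call to memcpy().
inline void copy_from_uncached(std::uint8_t *dst,
                               const volatile std::uint8_t *src,
                               std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}
```

The staging buffer in normal cached memory could then be handed to any routine that may perform unaligned accesses.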

DavidAntliff commented 1 year ago

@ikwzm thanks for your reply and the link.

What confuses me a little about this is that I always transfer data from the udmabuf memory-mapped region from offset 0, which is aligned to the allocated physical address of the udmabuf. It's only the length that matters - perhaps the size must be aligned too?

Is the reason that this works OK via /dev/mem that the memory is mapped as Normal through that mechanism? I'll need to look at the devmem driver code, I think. I'm afraid I don't know enough about Linux memory / caching (yet). (This turned out to be a "red herring": it doesn't work via /dev/mem either, but the lengths that cause the issue change.)

Would it work to (somehow) change the mode on the udmabuf memory after the DMA transfer into the buffer, but before the network transfer out of the buffer? Or perhaps I need to change the default mode of the udmabuf right at the start? Would either have any benefit?

Or do you think it would be better to adjust my design so that all read accesses to the udmabuf memory are aligned in size?

DavidAntliff commented 1 year ago

Quick question: is a socket write() of 0xfff8 bytes from a memory-mapped udmabuf (phys_addr 0x6c200000), starting at offset 0, already aligned to 8-byte addresses?

EDIT (comment after solving): this turns out to be irrelevant, as the real concern is the alignment of the destination data address in the outgoing "packet".
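
For reference, the alignment arithmetic in question can be checked with a small sketch like the one below. The helper `is_aligned` is hypothetical. It shows that both the physical address and the length quoted above are multiples of 8, whereas a data offset of 17 bytes (the length of a header such as "DATA cap00 65528\n") is not.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when `value` (an address, length, or offset)
// is a multiple of `alignment` bytes.
constexpr bool is_aligned(std::uint64_t value, std::size_t alignment) {
    return value % alignment == 0;
}

// Source side: buffer address and transfer length are both 8-byte aligned.
static_assert(is_aligned(0x6c200000, 8), "phys_addr is 8-byte aligned");
static_assert(is_aligned(0xfff8, 8), "length is a multiple of 8");
// Destination side: data placed after a 17-byte header is not.
static_assert(!is_aligned(17, 8), "offset 17 is not 8-byte aligned");
```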

DavidAntliff commented 1 year ago

With your hint about alignment, I believe I have solved the issue.

Recall:

        header = "DATA " + request_id + " " + std::to_string(length) + "\n";

and

                std::array { io::buffer(header),
                             io::buffer(data, length),
                             io::buffer(footer) },

From my findings, the length of the header part of this array needs to be such that the start of the data in the final packet layout is 64-bit aligned (the target of the memcpy or network DMA). This means that header needs to have a length that is a multiple of 8 bytes. By padding header to 8, 16, 24 etc. bytes, the destination address of the start of the next element (the buffer of mmapped udmabuf data) is also 8-byte aligned. This causes the issue to go away completely.
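
The padding described above could be sketched as follows. The function name `make_padded_header` and the choice of padding with spaces before the newline are assumptions for illustration; the receiving side would need to tolerate the extra spaces.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: build the "DATA <id> <length>\n" header and pad it
// with spaces (before the trailing newline) until its total length is a
// multiple of `alignment` bytes, so the data that follows it starts on an
// aligned offset. Assumes the receiver ignores spaces before '\n'.
std::string make_padded_header(const std::string &request_id,
                               std::size_t length,
                               std::size_t alignment = 8) {
    std::string header = "DATA " + request_id + " " + std::to_string(length);
    // +1 accounts for the '\n' appended below.
    const std::size_t remainder = (header.size() + 1) % alignment;
    if (remainder != 0) {
        header.append(alignment - remainder, ' ');
    }
    header += '\n';
    return header;
}
```

With length 0xfff8, request_id "cap0" yields a 16-byte header that needs no padding, while "cap00" yields 17 bytes unpadded and 24 bytes after padding, which is consistent with the first case working and the second failing. The result can be passed to io::buffer(header) in place of the unpadded string.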

Thank you!

ikwzm commented 1 year ago

Congratulations on getting the issue resolved. Also, thank you for the valuable information.