Closed paul-demo closed 3 years ago
I see this library is a thin wrapper around mmap; in that case perhaps the issue is that line 65 of mmio.py should specify O_DIRECT in addition to O_SYNC. What do you think? It is apparently not available on all platforms. I could check whether this actually makes a difference on the Zynq platform.
line 65: fd = os.open(path, os.O_RDWR | os.O_SYNC | getattr(os, "O_DIRECT", 0))
References: source: https://man7.org/linux/man-pages/man2/open.2.html https://docs.python.org/3/library/os.html#os.O_DIRECT
Interesting. We should try to isolate whether it's happening in Python (I'm a little suspicious of the slice assignments in mmio.py) or some problem with the mapping flags as you brought up. Would you mind trying a write with lua-periphery or c-periphery? Those implementations are essentially the same as the Python one, except the actual reads/writes are simple pointer dereferences.
I've confirmed with a logic analyzer that the write32 and read32 are in fact duplicated when using periphery and they are not duplicated when using devmem. I also tried O_DIRECT but that did not work and I got errors at the file-opening step due to O_DIRECT being an invalid argument. I tried it with O_RDWR always included, but with O_SYNC either included or not included (error in both cases). Here's the logic analyzer screenshots as proof if you want to see what an AXI transaction looks like:
Using devmem to write:
Using periphery to write32:
c-periphery appears to work as intended (no duplicate writes), so that suggests it's some problem with the python version or the mmap module you're using, and not some weirdness with the device itself. I just stripped down your test_mmio and compiled it natively:
#include "../src/mmio.h"
#define CONTROL_MODULE_BASE 0x80000000
#define PAGE_SIZE 4096
int main(void) {
mmio_t *mmio;
mmio = mmio_new();
mmio_open(mmio, CONTROL_MODULE_BASE, PAGE_SIZE);
mmio_write32(mmio, 0x0, 0x0);
mmio_close(mmio);
mmio_free(mmio);
return 0;
}
The version of Yocto [Zeus] that my build is based on includes v1.1.1 of python-periphery; but after reviewing whether that has any important differences compared to v2.2.0 that I get using pip3 on my desktop, I think it does not and they are basically the same implementation.
The underlying mmap usage also duplicates transactions, so maybe it's a lower level thing related to the fd open step.
This produces 2 writes:
mapping.seek(0); mapping.write(bytearray([0x0,]* 4))
This produces 4 writes:
mapping.seek(0); mapping.write(bytearray([0x0,]* 8))
Ok here's what works (as suggested here: https://stackoverflow.com/questions/53492716/python-writing-to-memory-in-a-single-operation). This seems like a bug in mmap, which I've reported here https://bugs.python.org/issue43131).
import os, mmap
fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
mapping = mmap.mmap(fd, 4096, flags=mmap.MAP_SHARED, prot=(mmap.PROT_READ | mmap.PROT_WRITE), offset=0x80000000)
mv = memoryview(mapping)
mv_as_int32 = mv.cast('I') # Note "I" works as intended, "L" still results in duplicate writes; "LL" is not an option
mv_as_int32[0x0 // 4] = 0xDEADBEEF # this works; results in 1 write issued on the AXI bus
I guess I will just not use python-periphery and instead use this lower-level workaround directly.
Thanks for testing and getting to the bottom of it @paul-demo. I'll have to migrate the MMIO module to use these more accurate writes. Unfortunately, it seems that the .cast()
method on memoryview()
isn't available in Python 2, so there will have to be another workaround there.
While you still have your ILA and environment still setup, would you mind checking if writes through the smaller casts, e.g. mv.cast('B')
and mv.cast('H')
, result in smaller single transfers on the AXI bus? e.g. with AWSIZE = 0 and 1, respectively.
Also, another approach I'm curious about testing is dereferencing the memoryview()
through a ctypes pointer, e.g.:
import ctypes
# mapping code as before, or mapping = bytearray(b"\x88\x77\x66\x55\x44\x33\x22\x11") for testing
mv = memoryview(mapping)
p = ctypes.c_uint32.from_buffer(mv, 0)
print(hex(p.value)) # -> 0x55667788
p.value = 0xaabbccdd
p = ctypes.c_uint16.from_buffer(mv, 0)
print(hex(p.value)) # -> 0xccdd
This approach can be made to work with Python 2 as well.
edit: fixing AWSIZE values for 1 and 2 byte transfers
Sure, I don't mind testing some different workarounds in the next day or so. If you could package up all the versions you want tested in a separate function for each, I can re-trigger the ILA and run each one and report the results that I see. You can just read/write any value to 0x80000000; that's a 32-bit GPIO that I used for my testing.
Thanks, I really appreciate it. Whenever you have a chance is fine. You've just got the perfect setup for this at the moment. Here is a test script with every combination (12 total -- 6 for memoryview cast, 6 for ctypes):
import os
import mmap
import ctypes
import sys
def mv_write32(mapping):
mv = memoryview(mapping)
p = mv.cast('I')
p[0] = 0xdeadbeef
def mv_write16(mapping):
mv = memoryview(mapping)
p = mv.cast('H')
p[0] = 0xbeef
def mv_write8(mapping):
mv = memoryview(mapping)
p = mv.cast('B')
p[0] = 0xaa
def mv_read32(mapping):
mv = memoryview(mapping)
p = mv.cast('I')
print(hex(p[0]))
def mv_read16(mapping):
mv = memoryview(mapping)
p = mv.cast('H')
print(hex(p[0]))
def mv_read8(mapping):
mv = memoryview(mapping)
p = mv.cast('B')
print(hex(p[0]))
def ctypes_write32(mapping):
p = ctypes.c_uint32.from_buffer(mapping, 0)
p.value = 0xdeadbeef
def ctypes_write16(mapping):
p = ctypes.c_uint16.from_buffer(mapping, 0)
p.value = 0xbeef
def ctypes_write8(mapping):
p = ctypes.c_uint8.from_buffer(mapping, 0)
p.value = 0xaa
def ctypes_read32(mapping):
p = ctypes.c_uint32.from_buffer(mapping, 0)
print(hex(p.value))
def ctypes_read16(mapping):
p = ctypes.c_uint16.from_buffer(mapping, 0)
print(hex(p.value))
def ctypes_read8(mapping):
p = ctypes.c_uint8.from_buffer(mapping, 0)
print(hex(p.value))
TESTS = [mv_write32, mv_write16, mv_write8, mv_read32, mv_read16, mv_read8,
ctypes_write32, ctypes_write16, ctypes_write8, ctypes_read32, ctypes_read16, ctypes_read8]
if __name__ == "__main__":
if len(sys.argv) < 2:
print("Usage: {} <test>\n".format(sys.argv[0]))
print("Available tests:\n " + "\n ".join([func.__name__ for func in TESTS]))
sys.exit(0)
fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
mapping = mmap.mmap(fd, 4096, flags=mmap.MAP_SHARED, prot=(mmap.PROT_READ | mmap.PROT_WRITE), offset=0x80000000)
globals()[sys.argv[1]](mapping)
If you could run the 6 mv and 6 ctypes ones under Python 3, and then 6 ctypes under Python 2 (the mv ones won't work under Python 2), that would be awesome. You can skip the mv write32 and read32 cases, as it appears you've verified those above, but I included them for completeness.
It looks like all 12 of those methods work in python3; I don't have python2 installed in this build so can't test those (python2 is no longer supported anyway; time to move on). I increased the ILA capture length and put them all in a loop, which also shows that you can do 12 transactions in about 30,000 cycles of the ~100MHz bus, or roughly 25 microseconds per read or write operation. I zoomed in and confirmed that each ready/valid handshake is in fact a single transaction on the bus.
for test in TESTS:
test(mapping)
0xdeadbeaa
0xbeaa
0xaa
0xdeadbeaa
0xbeaa
0xaa
Thanks a lot for testing @paul-demo! I've pushed a fix to the devel, which will be merged to master in the next release (and automatically close this issue).
I found on a Zynq device that python-periphery MMIO duplicated writes compared to using devmem. Any thoughts on why this could be? Does it have under the hood some kind of "retry" where it would perform the write twice? The device has a 64-bit ARM64 architecture (quad core A53 I believe).
https://forums.xilinx.com/t5/Processor-System-Design-and-AXI/axi-fifo-mm-s-AXI-FIFO-duplicates-words-requires-wrong-TX-length/m-p/1200351#M56397