Closed adamsitnik closed 4 years ago
"bad address"
Much more meaningful!
The addresses used here are:
https://github.com/tmds/Tmds.LinuxAsync/blob/0fb0d4fdcab760892db3538002ce74e98cce0ac3/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringExecutionQueue.cs#L56
and
https://github.com/tmds/Tmds.LinuxAsync/blob/0fb0d4fdcab760892db3538002ce74e98cce0ac3/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringThread.cs#L22
_dummyReadBuffer
gets pinned during the read:
https://github.com/tmds/Tmds.LinuxAsync/blob/0fb0d4fdcab760892db3538002ce74e98cce0ac3/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringExecutionQueue.cs#L174-L175
and released when io_uring completes the read:
https://github.com/tmds/Tmds.LinuxAsync/blob/0fb0d4fdcab760892db3538002ce74e98cce0ac3/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringExecutionQueue.cs#L263-L264
This is probably the first operation that io_uring gets used for.
@adamsitnik let's smoke test io_uring on this machine. Can you try running unit tests from https://github.com/tkp1n/IoUring?
git clone https://github.com/tkp1n/IoUring.git
cd IoUring/tests/IoUring.Tests/
dotnet test
@tmds I don't have the access (I simply did not ask for it). @antonfirsov is there any chance that you could ssh to this machine and run these tests?
@antonfirsov, feel free to debug this further when you look at it. We can tackle it together also.
I will look into it. First, i will try to run them on another machine having the same configuration. If I can't reproduce the issue with that approach, will ask for permission from Sebastien to run them on aspnet-perf-lin.
I sneaked into "asp-perf-lin" to run a quick test inside a 3.1 SDK container:
docker run -it --security-opt seccomp=unconfined --network host --rm mcr.microsoft.com/dotnet/core/sdk:3.1
Then I did run both tkp1n/IoUring unit tests and LinuxAsync/test/web, but unfortunately these conditions did not reproduce the issue.
After running:
dotnet run -c Release --urls http://127.0.0.1:23000 https://127.0.0.1:23001 -- -t 1 -s false -r false -c false -a true -w false
curl http://127.0.0.1:23000/json
returns {"message":"Hello, World!"}
as expected.
Now I will try to figure out how to properly test+benchmark this repo against 5.0. @adamsitnik I may need your help with that.
I've removed the aligning logic (Marshal should always return aligned memory), replaced usage of pointers with span and added some "console logging" to my branch and tried to run the json benchmark using iouring.
Code: https://github.com/adamsitnik/Tmds.LinuxAsync/tree/badAddress
Sample output:
Scheduled read 8 bytes to 140382483824976
Submit result: SubmittedSuccessfully
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=8 userData=197568495616
Removed: Read 8 140382483824976
Scheduled read 8 bytes to 140382483824976
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=489626271745
Removed: PollIn 0 0
Submit result: SubmittedSuccessfully
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=8 userData=197568495616
Removed: Read 8 140382483824976
Scheduled read 8 bytes to 140382483824976
Scheduled read 3949 bytes to 140386777555091
Submit result: SubmittedSuccessfully
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=8 userData=197568495616
Removed: Read 8 140382483824976
Scheduled read 8 bytes to 140382483824976
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 3949 140386777555091
Scheduled read 3802 bytes to 140386777555238
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 3802 140386777555238
Scheduled read 3655 bytes to 140386777555385
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 3655 140386777555385
Scheduled read 3508 bytes to 140386777555532
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 3508 140386777555532
Scheduled read 3361 bytes to 140386777555679
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 3361 140386777555679
Scheduled read 3214 bytes to 140386777555826
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 3214 140386777555826
Scheduled read 3067 bytes to 140386777555973
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 3067 140386777555973
Scheduled read 2920 bytes to 140386777556120
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 2920 140386777556120
Scheduled read 2773 bytes to 140386777556267
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 2773 140386777556267
Scheduled read 2626 bytes to 140386777556414
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 2626 140386777556414
Scheduled read 2479 bytes to 140386777556561
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=489626271745
Removed: PollIn 0 0
Submit result: SubmittedSuccessfully
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=8 userData=197568495616
Removed: Read 8 140382483824976
Scheduled read 8 bytes to 140382483824976
Scheduled read 3954 bytes to 140386777481358
Scheduled read 3954 bytes to 140386777485454
Submit result: SubmittedSuccessfully
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=8 userData=197568495616
Removed: Read 8 140382483824976
Scheduled read 8 bytes to 140382483824976
Scheduled read 3954 bytes to 140386777493646
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372586610589697
TryRead returned completion: result=142 userData=549755813889
Scheduled read 3812 bytes to 140386777485596
Submit result: SubmittedSuccessfully
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=8 userData=197568495616
Removed: Read 8 140382483824976
TryRead returned completion: result=1 userData=9223372590905556993
TryRead returned completion: result=142 userData=554050781185
Removed: Read 3954 140386777493646
Scheduled read 8 bytes to 140382483824976
Scheduled read 3812 bytes to 140386777493788
Scheduled read 3954 bytes to 140386777505934
Submit result: SubmittedSuccessfully
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=8 userData=197568495616
Removed: Read 8 140382483824976
Scheduled read 8 bytes to 140382483824976
Scheduled read 3954 bytes to 140386777510030
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372582315622401
TryRead returned completion: result=142 userData=545460846593
Removed: Read 3954 140386777481358
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=-14 userData=197568495616
Removed: Read 8 140382483824976
[STDERR] Process terminated. Tmds.LinuxAsync.PlatformException: Bad address
[STDERR] at Tmds.LinuxAsync.PlatformException.Throw(Int32 errno) in /tmp/benchmarks-agent/benchmarks-server-6/oqugrqci.4hl/Tmds.LinuxAsync/src/Tmds.LinuxAsync/PlatformException.cs:line 33
[STDERR] at Tmds.LinuxAsync.IOUringAsyncEngine.IOUringThread.<>c.<AddReadFromEventFd>b__31_0(AsyncOperationResult asyncResult, Object state, Int32 data) in /tmp/benchmarks-agent/benchmarks-server-6/oqugrqci.4hl/Tmds.LinuxAsync/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringThread.cs:line 234
[STDERR] at Tmds.LinuxAsync.IOUringAsyncEngine.IOUringExecutionQueue.ExecuteCompletions() in /tmp/benchmarks-agent/benchmarks-server-6/oqugrqci.4hl/Tmds.LinuxAsync/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringExecutionQueue.cs:line 297
[STDERR] at Tmds.LinuxAsync.IOUringAsyncEngine.IOUringThread.EventLoop() in /tmp/benchmarks-agent/benchmarks-server-6/oqugrqci.4hl/Tmds.LinuxAsync/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringThread.cs:line 78
[STDERR] at System.Environment.FailFast(System.String)
[STDERR] at Tmds.LinuxAsync.IOUringAsyncEngine+IOUringThread.EventLoop()
[STDERR] at System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
[STDERR] at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
[STDERR] at System.Threading.ThreadHelper.ThreadStart()
We are able to perform few 8 bytes reads to 140382483824976
memory location and after some time it fails with -14
.
@tmds does it ring any bell?
Looks like it's not the first operation, as we assumed initially. I'll try to reproduce it locally with wrk or BenchmarksDriver.
Marshal should always return aligned memory
I think it might be aligned to the native machine size, so on 32-bit it could be 4 instead of 8.
@tmds does it ring any bell?
Next to the address of the read buffer, there is also the address of the iovec. Pre 5.5 kernels required the iovecs to be untouched until the operation completes. With 5.5, the iovecs can be reused after the operation is submitted. That happens here:
@antonfirsov I think your VM had a more recent 5.5 kernel than the asp-perf-lin
machine? Can we install that on asp-perf-lin
machine? io_uring had some issues where EFAULT was returned unexpectedly: https://github.com/axboe/liburing/issues?q=is%3Aissue+EFAULT+is%3Aclosed.
Ok, I'll install 5.5.7 tomorrow.
@adamsitnik I had looked into the Spanification, but I didn't know how to do this (iovec*)Unsafe.AsPointer(ref iovs[iovIndex]
.
I think it might be aligned to the native machine size, so on 32-bit it could be 4 instead of 8.
Actually for iovec
it is ok to align at 4-bytes on 32-bit. For some of the structs that are in AIO implementation, 8-byte alignment is needed even on 32-bit. I don't think AllocHGlobal
will do 8-byte alignment on 32-bit.
Ok, the patching is done, and it looks like the error did go away.
I've no idea know how I ended up installing 5.5.0 when there were newer versions available, sorry for the amateurish mistake :)
No problem. It's seems we're on the bleeding edge :smile:
@antonfirsov big thanks for the update! now it works and I was able to run some benchmarks and share the results https://github.com/tmds/Tmds.LinuxAsync/issues/39#issuecomment-595364901
@tmds with #46 now we get the underlying error which is "bad address"