tmds / Tmds.LinuxAsync

MIT License
25 stars 3 forks source link

Bad address #47

Closed adamsitnik closed 4 years ago

adamsitnik commented 4 years ago

@tmds with #46 now we get the underlying error which is "bad address"

--path "/json" --arg "-e=iouring" --arg "-t=1" --arg "-s=false" --arg "-r=false" --arg "-c=false" --arg "-a=true" --arg "-w=false" --connections 256                         
[STDERR] Process terminated. Tmds.LinuxAsync.PlatformException: Bad address
[STDERR]    at Tmds.LinuxAsync.PlatformException.Throw(Int32 errno) in /tmp/benchmarks-agent/benchmarks-server-6/knbb002p.iic/Tmds.LinuxAsync/src/Tmds.LinuxAsync/PlatformException.cs:line 33
[STDERR]    at Tmds.LinuxAsync.IOUringAsyncEngine.IOUringThread.<>c.<AddReadFromEventFd>b__31_0(AsyncOperationResult asyncResult, Object state, Int32 data) in /tmp/benchmarks-agent/benchmarks-server-6/knbb002p.iic/Tmds.LinuxAsync/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringThread.cs:line 233
[STDERR]    at Tmds.LinuxAsync.IOUringAsyncEngine.IOUringExecutionQueue.ExecuteCompletions() in /tmp/benchmarks-agent/benchmarks-server-6/knbb002p.iic/Tmds.LinuxAsync/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringExecutionQueue.cs:line 275
[STDERR]    at Tmds.LinuxAsync.IOUringAsyncEngine.IOUringThread.EventLoop() in /tmp/benchmarks-agent/benchmarks-server-6/knbb002p.iic/Tmds.LinuxAsync/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringThread.cs:line 78
[STDERR]    at System.Environment.FailFast(System.String)
[STDERR]    at Tmds.LinuxAsync.IOUringAsyncEngine+IOUringThread.EventLoop()
[STDERR]    at System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
[STDERR]    at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
[STDERR]    at System.Threading.ThreadHelper.ThreadStart()
tmds commented 4 years ago

"bad address"

Much more meaningful!

The addresses used here are: https://github.com/tmds/Tmds.LinuxAsync/blob/0fb0d4fdcab760892db3538002ce74e98cce0ac3/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringExecutionQueue.cs#L56 and https://github.com/tmds/Tmds.LinuxAsync/blob/0fb0d4fdcab760892db3538002ce74e98cce0ac3/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringThread.cs#L22 _dummyReadBuffer gets pinned during the read: https://github.com/tmds/Tmds.LinuxAsync/blob/0fb0d4fdcab760892db3538002ce74e98cce0ac3/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringExecutionQueue.cs#L174-L175 and released when io_uring completes the read: https://github.com/tmds/Tmds.LinuxAsync/blob/0fb0d4fdcab760892db3538002ce74e98cce0ac3/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringExecutionQueue.cs#L263-L264

tmds commented 4 years ago

This is probably the first operation that io_uring gets used for.

tmds commented 4 years ago

@adamsitnik let's smoke test io_uring on this machine. Can you try running unit tests from https://github.com/tkp1n/IoUring?

git clone https://github.com/tkp1n/IoUring.git
cd IoUring/tests/IoUring.Tests/
dotnet test
adamsitnik commented 4 years ago

@tmds I don't have the access (I simply did not ask for it). @antonfirsov is there any chance that you could ssh to this machine and run these tests?

tmds commented 4 years ago

@antonfirsov, feel free to debug this further when you look at it. We can tackle it together also.

antonfirsov commented 4 years ago

I will look into it. First, i will try to run them on another machine having the same configuration. If I can't reproduce the issue with that approach, will ask for permission from Sebastien to run them on aspnet-perf-lin.

antonfirsov commented 4 years ago

I sneaked into "asp-perf-lin" to run a quick test inside a 3.1 SDK container:

docker run -it --security-opt seccomp=unconfined --network host --rm mcr.microsoft.com/dotnet/core/sdk:3.1

Then I did run both tkp1n/IoUring unit tests and LinuxAsync/test/web, but unfortunately these conditions did not reproduce the issue.

After running:

dotnet run -c Release --urls http://127.0.0.1:23000 https://127.0.0.1:23001 -- -t 1 -s false -r false -c false -a true -w false

curl http://127.0.0.1:23000/json returns {"message":"Hello, World!"} as expected.

Now I will try to figure out how to properly test+benchmark this repo against 5.0. @adamsitnik I may need your help with that.

adamsitnik commented 4 years ago

I've removed the aligning logic (Marshal should always return aligned memory), replaced usage of pointers with span and added some "console logging" to my branch and tried to run the json benchmark using iouring.

Code: https://github.com/adamsitnik/Tmds.LinuxAsync/tree/badAddress

Sample output:

Scheduled read 8 bytes to 140382483824976
Submit result: SubmittedSuccessfully
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=8 userData=197568495616
Removed: Read 8 140382483824976

Scheduled read 8 bytes to 140382483824976
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=489626271745
Removed: PollIn 0 0

Submit result: SubmittedSuccessfully
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=8 userData=197568495616
Removed: Read 8 140382483824976

Scheduled read 8 bytes to 140382483824976
Scheduled read 3949 bytes to 140386777555091
Submit result: SubmittedSuccessfully
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=8 userData=197568495616
Removed: Read 8 140382483824976

Scheduled read 8 bytes to 140382483824976
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 3949 140386777555091

Scheduled read 3802 bytes to 140386777555238
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 3802 140386777555238

Scheduled read 3655 bytes to 140386777555385
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 3655 140386777555385

Scheduled read 3508 bytes to 140386777555532
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 3508 140386777555532

Scheduled read 3361 bytes to 140386777555679
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 3361 140386777555679

Scheduled read 3214 bytes to 140386777555826
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 3214 140386777555826

Scheduled read 3067 bytes to 140386777555973
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 3067 140386777555973

Scheduled read 2920 bytes to 140386777556120
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 2920 140386777556120

Scheduled read 2773 bytes to 140386777556267
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 2773 140386777556267

Scheduled read 2626 bytes to 140386777556414
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372543660916737
TryRead returned completion: result=147 userData=506806140929
Removed: Read 2626 140386777556414

Scheduled read 2479 bytes to 140386777556561
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=489626271745
Removed: PollIn 0 0

Submit result: SubmittedSuccessfully
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=8 userData=197568495616
Removed: Read 8 140382483824976

Scheduled read 8 bytes to 140382483824976
Scheduled read 3954 bytes to 140386777481358
Scheduled read 3954 bytes to 140386777485454
Submit result: SubmittedSuccessfully
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=8 userData=197568495616
Removed: Read 8 140382483824976

Scheduled read 8 bytes to 140382483824976
Scheduled read 3954 bytes to 140386777493646
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372586610589697
TryRead returned completion: result=142 userData=549755813889

Scheduled read 3812 bytes to 140386777485596
Submit result: SubmittedSuccessfully
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=8 userData=197568495616
Removed: Read 8 140382483824976
TryRead returned completion: result=1 userData=9223372590905556993
TryRead returned completion: result=142 userData=554050781185
Removed: Read 3954 140386777493646

Scheduled read 8 bytes to 140382483824976
Scheduled read 3812 bytes to 140386777493788
Scheduled read 3954 bytes to 140386777505934
Submit result: SubmittedSuccessfully
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=8 userData=197568495616
Removed: Read 8 140382483824976

Scheduled read 8 bytes to 140382483824976
Scheduled read 3954 bytes to 140386777510030
Submit result: SubmittedSuccessfully
TryRead returned completion: result=195 userData=9223372582315622401
TryRead returned completion: result=142 userData=545460846593
Removed: Read 3954 140386777481358
TryRead returned completion: result=1 userData=9223372234423271424
TryRead returned completion: result=-14 userData=197568495616
Removed: Read 8 140382483824976
[STDERR] Process terminated. Tmds.LinuxAsync.PlatformException: Bad address
[STDERR]    at Tmds.LinuxAsync.PlatformException.Throw(Int32 errno) in /tmp/benchmarks-agent/benchmarks-server-6/oqugrqci.4hl/Tmds.LinuxAsync/src/Tmds.LinuxAsync/PlatformException.cs:line 33
[STDERR]    at Tmds.LinuxAsync.IOUringAsyncEngine.IOUringThread.<>c.<AddReadFromEventFd>b__31_0(AsyncOperationResult asyncResult, Object state, Int32 data) in /tmp/benchmarks-agent/benchmarks-server-6/oqugrqci.4hl/Tmds.LinuxAsync/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringThread.cs:line 234
[STDERR]    at Tmds.LinuxAsync.IOUringAsyncEngine.IOUringExecutionQueue.ExecuteCompletions() in /tmp/benchmarks-agent/benchmarks-server-6/oqugrqci.4hl/Tmds.LinuxAsync/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringExecutionQueue.cs:line 297
[STDERR]    at Tmds.LinuxAsync.IOUringAsyncEngine.IOUringThread.EventLoop() in /tmp/benchmarks-agent/benchmarks-server-6/oqugrqci.4hl/Tmds.LinuxAsync/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringThread.cs:line 78
[STDERR]    at System.Environment.FailFast(System.String)
[STDERR]    at Tmds.LinuxAsync.IOUringAsyncEngine+IOUringThread.EventLoop()
[STDERR]    at System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
[STDERR]    at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
[STDERR]    at System.Threading.ThreadHelper.ThreadStart()

We are able to perform few 8 bytes reads to 140382483824976 memory location and after some time it fails with -14 .

@tmds does it ring any bell?

antonfirsov commented 4 years ago

Looks like it's not the first operation, as we assumed initially. I'll try to reproduce it locally with wrk or BenchmarksDriver.

tmds commented 4 years ago

Marshal should always return aligned memory

I think it might be aligned to the native machine size, so on 32-bit it could be 4 instead of 8.

@tmds does it ring any bell?

Next to the address of the read buffer, there is also the address of the iovec. Pre 5.5 kernels required the iovecs to be untouched until the operation completes. With 5.5, the iovecs can be reused after the operation is submitted. That happens here:

https://github.com/tmds/Tmds.LinuxAsync/blob/601c88372d14ef5b784d574ec741d3fb5df6d666/src/Tmds.LinuxAsync/IOUringAsyncEngine.IOUringExecutionQueue.cs#L237-L242

@antonfirsov I think your VM had a more recent 5.5 kernel than the asp-perf-lin machine? Can we install that on asp-perf-lin machine? io_uring had some issues where EFAULT was returned unexpectedly: https://github.com/axboe/liburing/issues?q=is%3Aissue+EFAULT+is%3Aclosed.

antonfirsov commented 4 years ago

Ok, I'll install 5.5.7 tomorrow.

tmds commented 4 years ago

@adamsitnik I had looked into the Spanification, but I didn't know how to do this (iovec*)Unsafe.AsPointer(ref iovs[iovIndex].

I think it might be aligned to the native machine size, so on 32-bit it could be 4 instead of 8.

Actually for iovec it is ok to align at 4-bytes on 32-bit. For some of the structs that are in AIO implementation, 8-byte alignment is needed even on 32-bit. I don't think AllocHGlobal will do 8-byte alignment on 32-bit.

antonfirsov commented 4 years ago

Ok, the patching is done, and it looks like the error did go away.

I've no idea know how I ended up installing 5.5.0 when there were newer versions available, sorry for the amateurish mistake :)

tmds commented 4 years ago

No problem. It's seems we're on the bleeding edge :smile:

adamsitnik commented 4 years ago

@antonfirsov big thanks for the update! now it works and I was able to run some benchmarks and share the results https://github.com/tmds/Tmds.LinuxAsync/issues/39#issuecomment-595364901