Closed: ThePumpingLemma closed this issue 12 months ago
Hi @kohlschuetter, would it be possible to look into this issue?
Your commands to reproduce may not work on all platforms (e.g., Alpine Linux) and may return the wrong process ("grep" itself on macOS).
I'm trying to verify with
mvn clean compile exec:java -Dexec.mainClass="org.example.Main"
and
sudo apk add lsof
lsof -p $(jps -v | grep jetty-concurrency | cut -f1 -d' ') | grep junix-ingress
The commit you recommend to reproduce does not work for me. When I manually remove the monkey-patched files, everything still works OK on macOS, but I see connection errors on Alpine Linux. The monkey-patched code (assuming it's the same as in the PR) indeed seems to fix that bug, but I'm not seeing the timeout error messages.
> The commit you recommend to reproduce does not work for me. When I manually remove the monkey-patched files, everything still works OK on macOS, but I see connection errors on Alpine Linux. The monkey-patched code (assuming it's the same as in the PR) indeed seems to fix that bug, but I'm not seeing the timeout error messages.
I didn't test on Alpine; @kevink-sq, were you able to reproduce on Linux?
Not sure if macOS version matters in this case:
Darwin redacted.local 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct 9 21:27:24 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6000 arm64
@ThePumpingLemma :
lsof -p $(ps aux | grep "org.example.Main" | awk '{print $2}' | head -n 1) | grep junix-ingress.sock | wc -l
may actually get the PID of the grep command, which the jps approach prevents. You could also add another grep for "java", and remove the head -n 1 to raise a red flag should there still be more than one line:
lsof -p $(ps aux | grep "org.example.Main" | grep java | awk '{print $2}') | grep junix-ingress.sock | wc -l
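To illustrate why the extra grep matters, here is a small sketch using a fabricated ps-style listing (the PIDs and lines below are made up for demonstration; this is not real ps output):

```shell
# Fabricated `ps aux`-style listing: the Java process plus the grep that is
# searching for it (grep's own command line also contains the pattern).
ps_output='user  101  0.0  java org.example.Main
user  202  0.0  grep org.example.Main'

# A bare grep matches both lines, including grep itself:
echo "$ps_output" | grep "org.example.Main" | wc -l

# Adding a second grep for "java" filters out the grep entry:
echo "$ps_output" | grep "org.example.Main" | grep java | wc -l
```

The first pipeline reports 2 matches, the second only 1, which is why piping through `head -n 1` on the unfiltered output can silently hand you the wrong PID.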
Also beware that lsof on Alpine is busybox-based, so it will actually show all PIDs unless we install the proper binary via sudo apk add lsof.
I've merged the PR, though I'm not really happy that we don't have a proper test case. That said, I can see these errors in curl go away with the fix.
I'm able to reproduce the leaking junix file descriptor on Linux via multipass against this baseline branch (which is the same as the above 575469b3453e7b09d9d60f6460cdab8117b8e773, but with debug logging and other monkey-patch files removed).
Linux info:
Welcome to Ubuntu 22.04.3 LTS (GNU/Linux 5.15.0-87-generic aarch64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
System information as of Mon Nov 6 09:45:35 PST 2023
System load: 0.1953125
Usage of /: 10.9% of 24.05GB
Memory usage: 8%
Swap usage: 0%
Processes: 127
Users logged in: 1
In one terminal, I started monitoring the fd count of junix-ingress.sock:
while sleep 1; do echo $(lsof -p $(ps aux | grep "org.example.Main" | awk '{print $2}' | head -n 1) | grep junix-ingress.sock | wc -l); done
In another terminal, I started the server against the baseline branch:
mvn compile exec:java -Dexec.mainClass="org.example.Main"
In another terminal, I started calling the junix uds:
seq 1 300000 | xargs -P1500 -I{} curl --unix-socket /tmp/junix-ingress.sock http://localhost/process
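As an aside, on systems where a full lsof isn't available (e.g. stock Alpine before `apk add lsof`), a per-process fd count can be read straight from /proc. A minimal sketch, assuming Linux, using the current shell's PID as a stand-in for the Java PID:

```shell
# Count open file descriptors of a process via /proc (Linux only);
# here we inspect the current shell ($$) instead of the Java process.
pid=$$
fd_count=$(ls "/proc/$pid/fd" | wc -l)
echo "open fds for $pid: $fd_count"
```

Substituting the Java PID and wrapping this in the same `while sleep 1` loop gives an lsof-free monitor that is immune to the busybox quirks mentioned above.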
I could verify that the monitor count started climbing for the junix endpoint, but not for the native endpoint. Screenshots of the above process:
Server started with monitoring. Junix uds not being called yet:
Started calling junix uds
Running against native endpoint does not leak
Repeating the above with the monkey patched fix (baseline branch patched) results in junix behaving the same as native.
Can you please verify with the latest 2.8.3-SNAPSHOT, and all monkey-patched code removed?
Make sure the following section is in your POM:
<repositories>
  <repository>
    <id>snapshots-repo</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>
I can verify the fix works against the snapshot with all of the monkey-patched code removed: baseline branch 2.8.3-SNAPSHOT
> I've merged the PR, though I'm not really happy that we don't have a proper test case. That said, I can see these errors in curl go away with the fix.
It's definitely possible to stress test this, so we could make integration tests to that effect.
However, I don't see why we couldn't write the code in such a way that these routines are unit-testable; then we could mock selection keys and see how the cleanup code handles them. I could take a pass at a refactor to facilitate unit testing, if you'd like?
Yes that would be great, thank you!
@kohlschuetter I think I want to approach it by first making some broader fixes before I break it down into unit testable functions. The first thing I want to attack is locking, where I noticed some inconsistencies in the way locks are managed. Here is a PR where I do some renaming followed by lock normalization: https://github.com/kohlschutter/junixsocket/pull/146
I would like to follow this up with reworking how PollFd objects are managed; I think it's a bit hard to follow, and I also think there is redundant code in initPollFd (which is the code we need to unit test).
Before we continue with improvements to AFSelector, please verify that the updated patch (5e8a6e6) resolves all concurrency issues / FileDescriptor leaks for you.
I reverted the original patch attempts because we really introduced the regression in 2.8.2, and even with the latest changes, a TIPC test would fail on Linux.
Please also verify by running the selftest jar on Linux with the TIPC kernel module loaded, and a bearer enabled:
sudo modprobe tipc
sudo tipc bearer enable media eth device eth0
java -ea -jar junixsocket-selftest-2.8.3-SNAPSHOT-jar-with-dependencies.jar
> Before we continue with improvements to AFSelector, please verify that the updated patch (5e8a6e6) resolves all concurrency issues / FileDescriptor leaks for you.
It seems to be working okay on macOS from a preliminary test.
However, this patch doesn't implement the locking in the way that the documentation suggests we should:
> A selector's selected-key set is not, in general, safe for use by multiple concurrent threads. If such a thread might modify the set directly then access should be controlled by synchronizing on the set itself. The iterators returned by the set's {@link java.util.Set#iterator() iterator} methods are fail-fast: If the set is modified after the iterator is created, in any way except by invoking the iterator's own {@link java.util.Iterator#remove() remove} method, then a {@link java.util.ConcurrentModificationException} will be thrown.
IMO, it would make sense to implement things as similarly to Java's standard library as possible, but I'll leave that up to you.
I don't have a Linux test setup handy, maybe @kevink-sq can try to verify the change?
Yeah, it's really odd. When you replace that ConcurrentHashMap with HashMap, jetty throws a ConcurrentModificationException, which means that either two threads are accessing the structure concurrently, or modifications happen after the iterator has been created. Maybe there's something else we're missing?
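For readers following along, the fail-fast behavior at issue can be reproduced in isolation with a plain HashSet (hypothetical key names; this is a minimal sketch, not jetty's actual code):

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Demonstrates that a fail-fast iterator throws once the backing set is
// structurally modified after the iterator was created -- the situation
// jetty hits when the selected-key set changes underneath its iteration.
public class CmeDemo {
    static String clearDuringIteration() {
        Set<String> selectedKeys = new HashSet<>();
        selectedKeys.add("key1");
        selectedKeys.add("key2");
        Iterator<String> it = selectedKeys.iterator();
        selectedKeys.clear(); // modification after iterator creation
        try {
            it.next();
            return "no exception";
        } catch (ConcurrentModificationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(clearDuringIteration());
    }
}
```

A ConcurrentHashMap-backed set never throws this exception, which is why swapping it for HashMap surfaces the underlying modification pattern.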
So I think it's the .clear() that's throwing off jetty. In the original SelectorImpl, they check if the key already exists (processReadyEvents):
if (selectedKeys.contains(ski)) {
    if (ski.translateAndUpdateReadyOps(rOps)) {
        return 1;
    }
} else {
    ski.translateAndSetReadyOps(rOps);
    if ((ski.nioReadyOps() & ski.nioInterestOps()) != 0) {
        selectedKeys.add(ski);
        return 1;
    }
}
@ThePumpingLemma I was able to remove the second ConcurrentHashMap with 38cda27. Can you please take another look?
(Fixed checkstyle warning)
@kohlschuetter are the latest changes (including 38cda270d3aa7a8d3b3cb1803b2c02e1d33c9101) on 2.8.3-SNAPSHOT?
@kevink-sq Uploaded just now! 20231109.194633-3
Be sure to run mvn with -U to update snapshots.
I reviewed the changes and have some thoughts:

- The locking now involves both this and selectedKeysPublic. My preference would be to keep the locking + data structures simple and try to mirror the reference implementation (i.e. Java standard library) to minimize surprises.
- MapValueSet could be thread-safe, but I'm not convinced without doing a really thorough analysis (I looked pretty closely at the iterator, but still need to chew on the rest). In contrast, it is really easy to look at synchronized blocks and convince yourself that all of the state mutation is under a lock and that there are no races.
- MapValueSet is designed to be very generic (e.g. removedSentinel is optional), but I'm not sure it's worth the additional complexity. If you decide to stick with this approach, consider simplifying where possible.

All in all, I think it is possible to prove your approach safe, but I'm not personally comfortable with it.
MapValueSet wraps the ConcurrentHashMap, so it is as safe as that, and may throw a ConcurrentModificationException only in certain narrow circumstances with iterator(), which is in line with the AFSelector documentation (selectedKeys() sets don't have to be thread-safe). I actually wanted to build this a long time ago, since having a single map for all keys simplifies things quite a bit, so I'm very comfortable with it.
Once we had the ConcurrentHashMap for registered keys, it was clear that simply ignoring the value was a waste. Adding another ConcurrentHashMap resulted in simpler code but higher memory requirements. I'm happy that we got that out of the way.
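The single-map idea can be sketched roughly as follows (class and method names here are illustrative only; this is not the actual MapValueSet implementation, just a simplified model of using the map's value as the "selected" flag):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Rough sketch: one ConcurrentHashMap holds all registered keys, and the
// map value doubles as the "selected" marker, so no second collection is
// allocated in the polling hot path. Sentinel objects stand in for the
// real per-key state.
public class SingleMapSketch {
    private static final Object SELECTED = new Object();
    private static final Object NOT_SELECTED = new Object();

    private final ConcurrentMap<String, Object> registeredKeys = new ConcurrentHashMap<>();

    void register(String key)  { registeredKeys.put(key, NOT_SELECTED); }
    void select(String key)    { registeredKeys.replace(key, SELECTED); }
    void deselect(String key)  { registeredKeys.replace(key, NOT_SELECTED); }

    // The selected-key "set" is derived from the single map on demand.
    Set<String> selectedKeys() {
        Set<String> out = new HashSet<>();
        for (Map.Entry<String, Object> e : registeredKeys.entrySet()) {
            if (e.getValue() == SELECTED) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

The real implementation returns a live view rather than a copy, which is where the iterator subtleties discussed above come in.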
As a next step, could you take a look at creating some unit tests around the selected keys behavior? Once we have good test coverage, we can make more confident optimizations without introducing further regressions.
Thanks for your help, @ThePumpingLemma!
Reran the above tests against the junixsocket-jetty-2.8.3-20231109.194633-3.jar artifact; the concurrency issue and leaking file descriptors are still fixed.
I'm also for @ThePumpingLemma's approach of mirroring the reference implementation in the Java standard library. Issues like the concurrency problem and the file descriptor leaks were resolved by following its examples; straying from them with novel implementations may cause setbacks simply because those implementations haven't been battle-tested as much. Unless there are clear benefits to the approach, I'm not sure what is gained.
Furthermore, JEP 380 implements kqueue/epoll under the hood, which gives it a performance edge over junix. If junix were to follow suit, I think it would be easier to make the changes if references were taken from the Java standard library beforehand.
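For context, the JEP 380 API (Java 16+) that this refers to looks like the following; a minimal round-trip sketch over a temporary socket path:

```java
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal JEP 380 round trip over an AF_UNIX socket: bind a server
// channel, connect a client, write "ping", and read it back.
public class Jep380Demo {
    static String roundTrip() throws Exception {
        Path path = Files.createTempDirectory("jep380").resolve("demo.sock");
        UnixDomainSocketAddress addr = UnixDomainSocketAddress.of(path);
        try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(addr);
            try (SocketChannel client = SocketChannel.open(addr);
                 SocketChannel accepted = server.accept()) {
                client.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));
                ByteBuffer buf = ByteBuffer.allocate(16);
                accepted.read(buf);
                buf.flip();
                return StandardCharsets.UTF_8.decode(buf).toString();
            }
        } finally {
            Files.deleteIfExists(path);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip());
    }
}
```

Note that JEP 380 covers AF_UNIX stream sockets only; it does not address the other socket families (TIPC, VSOCK, etc.) that junixsocket supports.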
Once Java supports all sorts of obscure socket types like AF_UNIX, AF_TIPC, AF_VSOCK, etc., there will probably be little need for junixsocket, which would actually be a good thing.
Until then, junixsocket remains a standalone project, under a different license, so copying code from the standard library just isn't an easy option.
Moreover, unlike commercial endeavors like Square/Cash app, this is my personal project. I'm not being paid to develop it, which means I can deliberately spend some extra time on polishing bits that I would otherwise leave as-is.
I hear your concerns. I don't want to sound harsh. However, please accept my explanation, and decision.
In this instance, removing the second Set is, in my view, the right thing to do, as it removes unnecessary allocation operations in a hot path (polling sockets). It also obviously fixes the issue you initially reported, without introducing any obvious regressions (unlike the earlier attempts).
Since I see you care about improving junixsocket, I hope you continue to contribute by filing tickets and PRs. Abstract, high-level critique as in "not battle tested" or "what about kqueue/epoll" is not very constructive for this present ticket. Please be very specific with your concerns, and, ideally, provide solutions for these concerns. As I said, I'm not being paid for this project, so I have to manage my attention wisely.
If you find that junixsocket is slower/worse than some other approach, please file a bug with some data/code, ideally with a PR that you accept may end up not the way it started.
As I explained earlier, the best way to contribute to the very specific situation of AFSelector is to improve the unit test coverage around selectors. When we have good coverage, we can look into taking more inspiration from the standard library if and only if there is a measurable benefit. With good test coverage, the conversation automatically becomes more detail-oriented, more data-informed, and less opinion-based.
You're very welcome to use JEP-380, just as you're welcome to help improve junixsocket. For what it's worth, I'm seeing higher throughput with junixsocket, compared to JEP-380, and I hope you will find too that junixsocket is the better choice, at least for the foreseeable future.
2.8.3 has just been released, including this fix. Please verify and re-open this ticket if necessary. Thanks again for reporting and working with me on this issue.
Describe the bug
With our PoC at https://github.com/kevink-sq/jetty-concurrency-issue-poc/tree/575469b3453e7b09d9d60f6460cdab8117b8e773, we noticed that there is unbounded growth of file descriptors when running our simple load test, leading to timeouts from curl and timeout exceptions in jetty.

To Reproduce
Steps to reproduce the behavior (assuming macOS):
mvn compile exec:java -Dexec.mainClass="org.example.Main"
seq 1 15000 | xargs -P1500 -I{} curl --unix-socket ./junix-ingress.sock http://localhost/process

Expected behavior
All curl calls should succeed and there should be no timeouts in jetty.

Output/Screenshots
N/A

Please make sure to test the problem still occurs on the latest version of junixsocket
Tested on v2.8.2

Notes
Proposed fix: https://github.com/kohlschutter/junixsocket/pull/144