Closed jmvezic closed 2 years ago
Update: this bug was introduced with version 3.4.0-20210617, and is present in all versions after that
Confirming I can reproduce this in 20210803. Hitting shift-H in top shows it's the dnsjava NIO selector
thread. Here's the stack trace (from jstack <pid>):
"dnsjava NIO selector" #67 daemon prio=4 os_prio=0 cpu=1014221.96ms elapsed=1060.26s tid=0x00007fd000011800 nid=0x90e27 runnable [0x00007fd0e07bf000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPoll.wait(java.base@11.0.12/Native Method)
at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.12/EPollSelectorImpl.java:120)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.12/SelectorImpl.java:124)
- locked <0x00000000f41af7b8> (a sun.nio.ch.Util$2)
- locked <0x00000000f41af558> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(java.base@11.0.12/SelectorImpl.java:136)
at org.xbill.DNS.Client.runSelector(Client.java:67)
at org.xbill.DNS.Client$$Lambda$308/0x00000001004ec840.run(Unknown Source)
at java.lang.Thread.run(java.base@11.0.12/Thread.java:829)
Poking at this with a debugger a bit, it appears select() returns immediately because the thread was interrupted. dnsjava's runSelector() code never clears the interrupted flag, so it just busy-loops calling select(). It looks like the dnsjava NIO selector thread ends up in ToePool.getToes(), which presumably means ToePool.shutdown() is interrupting it.
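For anyone who wants to see the mechanism in isolation, here's a minimal sketch using only the JDK (no dnsjava or Heritrix code). The class and method names are made up for illustration. It relies on the documented behaviour that Selector.select() returns immediately when the calling thread's interrupt status is set, and leaves that status set.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

// Hypothetical demo class; not part of dnsjava or Heritrix.
class InterruptedSelectDemo {

    /**
     * Sets the current thread's interrupt flag, then calls select(timeoutMs)
     * `iterations` times. If clearFlag is true, the flag is cleared after each
     * call. Returns the total elapsed time in milliseconds.
     */
    static long timeSelectCalls(boolean clearFlag, int iterations, long timeoutMs) {
        try (Selector selector = Selector.open()) {
            Thread.currentThread().interrupt();
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                // With the flag set, this returns at once instead of waiting.
                selector.select(timeoutMs);
                if (clearFlag) {
                    Thread.interrupted(); // reads and clears the interrupt flag
                }
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            Thread.interrupted(); // leave the calling thread in a clean state
        }
    }

    public static void main(String[] args) {
        System.out.println("flag left set: " + timeSelectCalls(false, 5, 100) + " ms");
        System.out.println("flag cleared:  " + timeSelectCalls(true, 5, 100) + " ms");
    }
}
```

With the flag left set, the loop should finish almost instantly, which is the busy loop seen in top. With the flag cleared after each call, later iterations should wait for the timeout again. Whether clearing the flag is the right fix inside dnsjava itself is a separate question.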
One workaround might be to have ToePool check the thread name and exclude it from interrupting.
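A rough sketch of what that name check could look like, assuming the interrupting code walks a ThreadGroup. This is not Heritrix's actual ToePool code; the class, the method, and the "ToeThread #1" name are hypothetical. Only the "dnsjava NIO selector" thread name is taken from the stack trace above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the real ToePool implementation.
class SelectiveInterrupt {
    // Thread name as it appears in the jstack output.
    static final String DNSJAVA_SELECTOR_NAME = "dnsjava NIO selector";

    /** Interrupts every live thread in the group except those with the excluded name. */
    static List<String> interruptAllExcept(ThreadGroup group, String excludedName) {
        // activeCount() is only an estimate, so leave some headroom.
        Thread[] threads = new Thread[group.activeCount() + 8];
        int n = group.enumerate(threads);
        List<String> interrupted = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = threads[i];
            if (excludedName.equals(t.getName())) {
                continue; // leave the shared selector thread alone
            }
            t.interrupt();
            interrupted.add(t.getName());
        }
        return interrupted;
    }

    /** Builds a small group with one worker and one selector-named thread. */
    static List<String> runDemo() {
        ThreadGroup group = new ThreadGroup("toe-pool-demo");
        Runnable sleeper = () -> {
            try {
                Thread.sleep(5_000);
            } catch (InterruptedException e) {
                // interrupted: just exit
            }
        };
        Thread worker = new Thread(group, sleeper, "ToeThread #1");
        Thread selector = new Thread(group, sleeper, DNSJAVA_SELECTOR_NAME);
        worker.setDaemon(true);
        selector.setDaemon(true);
        worker.start();
        selector.start();
        List<String> names = interruptAllExcept(group, DNSJAVA_SELECTOR_NAME);
        selector.interrupt(); // demo cleanup only
        return names;
    }

    public static void main(String[] args) {
        System.out.println("interrupted: " + runDemo());
    }
}
```

Matching on the thread name is brittle: it breaks silently if dnsjava ever renames the thread, which is part of why keeping the thread out of the group in the first place looks more attractive.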
As the dnsjava selector thread is global per process, it seems wrong that it ends up in the ToePool thread group at all. So perhaps it'd be better to prevent it from being assigned to the group in the first place. I guess one way to do this would be to do a dummy lookup on startup from a thread that's not in the ToePool group.
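To illustrate why the warm-up idea could work: a thread created without an explicit group inherits the group of whichever thread creates it, so a lazily started worker lands in the group of its first caller. The sketch below uses a made-up LazyWorker class as a stand-in for dnsjava's selector thread. It doesn't call dnsjava at all, and the group names are arbitrary.

```java
// Hypothetical stand-in for a library that lazily creates one process-wide thread.
class LazySelectorGroupDemo {

    static final class LazyWorker {
        private Thread thread;

        synchronized Thread get() {
            if (thread == null) {
                // No ThreadGroup argument: the new thread inherits the
                // group of the thread that happens to run this line first.
                thread = new Thread(() -> { }, "lazy selector");
                thread.setDaemon(true);
            }
            return thread;
        }
    }

    /** Runs the task on a fresh thread inside the given group and waits for it. */
    static void runIn(ThreadGroup group, Runnable task) {
        Thread caller = new Thread(group, task, "caller");
        caller.start();
        try {
            caller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns the name of the group the lazy worker thread ended up in. */
    static String groupOfWorker(boolean warmUpFirst) {
        LazyWorker worker = new LazyWorker();
        ThreadGroup startup = new ThreadGroup("startup");
        ThreadGroup pool = new ThreadGroup("toe-pool");
        if (warmUpFirst) {
            runIn(startup, worker::get); // the "dummy lookup" before any pool thread runs
        }
        runIn(pool, worker::get); // first real use, from inside the pool
        return worker.get().getThreadGroup().getName();
    }

    public static void main(String[] args) {
        System.out.println("without warm-up: " + groupOfWorker(false));
        System.out.println("with warm-up:    " + groupOfWorker(true));
    }
}
```

Without the warm-up, the worker is created by a pool thread and joins the pool's group. With it, the worker already exists by the time a pool thread first asks for it. Whether a single dummy lookup is enough to start dnsjava's selector thread would need checking against the dnsjava version in use.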
With the latest version (20210803), and many versions before it, one CPU thread stays stuck at 100% doing nothing after a job is terminated. It never goes away until I restart Heritrix.
For reference, this doesn't happen with version 20200304, for example. I haven't tried all versions, so I don't know when this problem started. There's also nothing in the logs that would indicate something is wrong.
To reproduce: use the default crawler-beans with an operator URL set and any seed you like.