NPE while putting into Object2IntOpenHashMap, could anyone help?

Closed: tenghuanhe closed this issue 3 years ago.
Hi, @vigna could you please have a look? Thanks very much!!
Check for concurrent access. The maps are fine.
Please describe how to reproduce the issue on the current version of fastutil.
> Check for concurrent access. The maps are fine.

Thanks for the reply, I'll check.
> Please describe how to reproduce the issue on the current version of fastutil.

Thanks for the reply @incaseoftrouble. It's hard to reproduce this issue manually, but it occurs every few days in our prod env. I'll attach the code later.
Hi @vigna @incaseoftrouble, here is the code snippet where `Object2IntOpenHashMap` is used and where the NPE occurred: https://github.com/apache/druid/blob/c2fca2744a5427a823f6521ccdba2d1548ee85d9/processing/src/main/java/io/druid/segment/StringDimensionIndexer.java#L101-L116
The map `valueToId` is properly synchronized, and no other place writes into it.
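(For the `prev >= 0` check below to work, `valueToId` must be constructed with something like `valueToId.defaultReturnValue(-1);`, since fastutil's `getInt` otherwise returns 0 for absent keys.)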
```java
public int add(String originalValue)
{
  String value = Strings.nullToEmpty(originalValue);
  synchronized (lock) {
    int prev = valueToId.getInt(value);
    if (prev >= 0) {
      return prev;
    }
    final int index = size();
    valueToId.put(value, index);
    idToValue.add(value);
    minValue = minValue == null || minValue.compareTo(value) > 0 ? value : minValue;
    maxValue = maxValue == null || maxValue.compareTo(value) < 0 ? value : maxValue;
    return index;
  }
}
```
Detailed stack:
```
java.lang.NullPointerException
    at it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap.rehash(Object2IntOpenHashMap.java:1313) ~[fastutil-8.1.0.jar:?]
    at it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap.insert(Object2IntOpenHashMap.java:322) ~[fastutil-8.1.0.jar:?]
    at it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap.put(Object2IntOpenHashMap.java:331) ~[fastutil-8.1.0.jar:?]
    at io.druid.segment.StringDimensionIndexer$DimensionDictionary.add(StringDimensionIndexer.java:110) ~[druid-processing-0.12.3-iap7-failover.jar:0.12.3-iap7]
    at io.druid.segment.StringDimensionIndexer.processRowValsToUnsortedEncodedKeyComponent(StringDimensionIndexer.java:234) ~[druid-processing-0.12.3-iap7-failover.jar:0.12.3-iap7]
    at io.druid.segment.StringDimensionIndexer.processRowValsToUnsortedEncodedKeyComponent(StringDimensionIndexer.java:59) ~[druid-processing-0.12.3-iap7-failover.jar:0.12.3-iap7]
    at io.druid.segment.incremental.IncrementalIndex.toTimeAndDims(IncrementalIndex.java:632) ~[druid-processing-0.12.3-iap7-failover.jar:0.12.3-iap7]
    at io.druid.segment.incremental.IncrementalIndex.add(IncrementalIndex.java:572) ~[druid-processing-0.12.3-iap7-failover.jar:0.12.3-iap7]
    at io.druid.segment.realtime.plumber.Sink.add(Sink.java:158) ~[druid-server-0.12.3-iap7.jar:0.12.3-iap7]
    at io.druid.segment.realtime.plumber.RealtimePlumber.add(RealtimePlumber.java:228) ~[druid-server-0.12.3-iap7.jar:0.12.3-iap7]
    at io.druid.segment.realtime.plumber.Plumbers.addNextRow(Plumbers.java:69) ~[druid-server-0.12.3-iap7.jar:0.12.3-iap7]
    at io.druid.indexing.common.task.RealtimeIndexTask.run(RealtimeIndexTask.java:397) ~[druid-indexing-service-0.12.3-iap7.jar:0.12.3-iap7]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:457) [druid-indexing-service-0.12.3-iap7.jar:0.12.3-iap7]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:429) [druid-indexing-service-0.12.3-iap7.jar:0.12.3-iap7]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
```
Do you have any ideas? Thanks again.
> It's hard to reproduce this issue manually, but it occurs every few days

This smells a lot like a concurrency issue, but the code really seems to be fine in that regard.
The objects you put in it are immutable (strings / ints) and you never put nulls in there, so I can't really see what is going wrong. For reference, the 8.1.0 rehash code is:
```java
protected void rehash(final int newN) {
    final K key[] = this.key;
    final int value[] = this.value;
    final int mask = newN - 1;
    final K newKey[] = (K[]) new Object[newN + 1];
    final int newValue[] = new int[newN + 1];
    int i = n, pos;
    for (int j = realSize(); j-- != 0;) {
        // scan backwards to the next non-null key; key[i] is non-null after this loop
        while (((key[--i]) == null));
        // NPE happens in this line
        if (!((newKey[pos = (it.unimi.dsi.fastutil.HashCommon.mix((key[i]).hashCode())) & mask]) == null))
            while (!((newKey[pos = (pos + 1) & mask]) == null));
        newKey[pos] = key[i];
        newValue[pos] = value[i];
    }
    newValue[newN] = value[n];
    n = newN;
    this.mask = mask;
    maxFill = maxFill(n, f);
    this.key = newKey;
    this.value = newValue;
}
```
and is effectively the same as the 8.5.6 code.
I can't even see how an NPE could possibly occur there except under concurrent access (`key[i]` and `newKey` have to be non-null at that line unless `key` is modified concurrently). But the reference to the map (`valueToId`) is not being leaked and is always guarded by a lock.
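If there were an unguarded writer somewhere, the failure mode would look exactly like your stack trace. A minimal stress test along these lines (purely hypothetical, not your code) can often reproduce a rehash NPE when a fastutil map is shared across threads without a lock:

```java
import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap;

// Hypothetical repro: two threads put into ONE map with no synchronization.
// Object2IntOpenHashMap is not thread-safe, so a concurrent rehash can observe
// a torn key array and die with an NPE inside rehash(), as in the trace above.
public class ConcurrentPutRepro {
    public static void main(String[] args) throws InterruptedException {
        final Object2IntOpenHashMap<String> map = new Object2IntOpenHashMap<>();
        Runnable writer = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                map.put(Thread.currentThread().getName() + "-" + i, i); // unguarded write
            }
        };
        Thread t1 = new Thread(writer, "w1");
        Thread t2 = new Thread(writer, "w2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```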
> I can't even see how an NPE could possibly occur there except under concurrent access (`key[i]` and `newKey` have to be non-null at that line unless `key` is modified concurrently). But the reference to the map (`valueToId`) is not being leaked and is always guarded by a lock.

@incaseoftrouble thanks for your quick reply. Exactly!! I am also very confused: how could this happen? 😕
How many entries are in your map? Can you share the sequence of keys that is being used?
OK, I see: this old version has a significant problem, as null keys will cause an exception. The current version has a separate test for null keys. I'd suggest trying that first. Still, it is not clear to me why there would be null keys in the map.
@szarnekow Thank you for your reply.

> How many entries are in your map?

The map size can be 200k or more.

> Can you share the sequence of keys that is being used?

Sorry, but the keys were not dumped when the NPE occurred.
@vigna @incaseoftrouble Speaking of the keys being used, there is another confusing thing we encountered.
Taking a closer look at the code snippet above, the variable `List<String> idToValue` should never contain duplicate values, but it did happen. At the same time, the content of the map `valueToId` is even weirder: although `valueToId` is a `String -> Integer` map, we found keys of type `byte[]`, `char[]`, `int[]`, `java.io.DataOutputStream` and others when opening the heap dump file with MAT (Memory Analyzer Tool), as shown below. (Sorry, the String values in the figure are obscured for confidentiality reasons.)

[MAT heap-dump screenshot omitted]
From the logs, there was a major code cleanup in March 2017; that is when this bug was fixed. You should definitely use a more recent version.
Well, you definitely have to look for concurrent access. There is no way the insertion in that piece of code can put in keys of the wrong type, so upgrading will not solve your problem.
@tenghuanhe This really seems like an issue on your side. Possible culprits might be actual hardware problems, a faulty JVM, memory mismanagement at the OS level, or the use of reflection (or other mechanisms that bypass visibility / type safety). These objects cannot be in there, simply because of the type-safety guarantees of the JVM: the only way references get into the map is through the `add` method, and that is typed as `String`.
Are you maybe using JNI / native calls or JVM agents? These can also mess with memory, I think.
Since the maps are so large, it could be a memory issue on OS / JVM level. Which JVM are you using?
> Well, you definitely have to look for concurrent access. There is no way the insertion in that piece of code can put in keys of the wrong type, so upgrading will not solve your problem.

@vigna thank you, I'll check for concurrent access again and consider upgrading if necessary.

@incaseoftrouble Thanks again for your kindness and your reply.
> Are you maybe using JNI / native calls or JVM agents? These can also mess with memory, I think.

I am not sure whether JNI / native calls are used and will check; JVM agents are not used.

The problem started to appear after a feature using file mmap was added to the codebase (first the data is written to a disk file via mmap, then it is read back from the disk file via mmap and added to the `Object2IntOpenHashMap`). Is it possible for that to mess with memory? Or have you encountered a similar situation before?
> Since the maps are so large, it could be a memory issue on OS / JVM level. Which JVM are you using?

The JVM and kernel version:

```
~$ java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
~$ uname -r
4.19.95-13
```
MMAP definitely sounds like something that can mess with memory. I guess you are using Java's memory mapped buffers though? These should be fine.
Maybe you are (de)serializing objects through this memory mapping and this causes issues (objects of the wrong type being created). Maybe add some type-safety assertions along the way from the mmap read to the `add` call (`assert foo instanceof String`). But I'm not sure whether something could actually go wrong in this way.
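A minimal sketch of such a guard; all names here (`MmapReadGuard`, `decodeRecord`, the length-prefixed encoding) are hypothetical, not your actual code:

```java
import java.nio.MappedByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical check on the path from the mmap read to DimensionDictionary.add():
// fail fast if a decoded record is ever not a String, instead of letting it
// silently reach the dictionary.
final class MmapReadGuard {
    static String readDictionaryValue(MappedByteBuffer buffer) {
        Object decoded = decodeRecord(buffer); // stand-in for the real decoder
        if (!(decoded instanceof String)) {
            throw new IllegalStateException("corrupt record: " + decoded.getClass().getName());
        }
        return (String) decoded;
    }

    // Placeholder decoder: assumes length-prefixed UTF-8, one plausible encoding.
    private static Object decodeRecord(MappedByteBuffer buffer) {
        int len = buffer.getInt();
        byte[] bytes = new byte[len];
        buffer.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```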
I think there are instrumentation frameworks that will let you add asserts at load time. You could use one of those to put type asserts in all `put` methods. But mmap sounds like the source of your problems.
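Short of a full instrumentation agent, a hand-written checked subclass gets a similar effect for this one map; a sketch (the class name is made up):

```java
import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap;

// Overriding put() in a String-specialized subclass makes javac emit a bridge
// method containing a checkcast, so a wrong-typed key arriving through a raw
// or erased call site fails with ClassCastException at the put site rather
// than silently corrupting the table.
class CheckedStringToIntMap extends Object2IntOpenHashMap<String> {
    @Override
    public int put(String key, int value) {
        assert key != null : "null key"; // the Druid code never inserts null (nullToEmpty)
        return super.put(key, value);
    }
}
```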
> I guess you are using Java's memory mapped buffers though?

@incaseoftrouble yes, we are using `java.nio.MappedByteBuffer` like this:

```java
FileChannel channel = FileChannel.open(currentWriteFile.toPath(), StandardOpenOption.READ, StandardOpenOption.WRITE);
MappedByteBuffer mappedByteBuffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, fileSize);
```
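(A cheap hardening step we could try, assuming the read path never needs to write: map the file read-only on the reading side, so a stray write through that buffer fails fast instead of corrupting the data:)

```java
// Hypothetical read-side mapping: a READ_ONLY MappedByteBuffer throws
// ReadOnlyBufferException on any put*, so accidental writes cannot corrupt data.
FileChannel readChannel = FileChannel.open(currentWriteFile.toPath(), StandardOpenOption.READ);
MappedByteBuffer readOnlyBuffer = readChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileSize);
```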
> I think there are instrumentation frameworks that will let you add asserts at load time.

Thanks @vigna, I'll search for one.

> But mmap sounds like the source of your problems.

Agreed, I am going to check the mmap usage in our code.
Since it's not an `Object2IntOpenHashMap` problem, I think it's okay to close this issue?

Thanks again for your help @vigna @incaseoftrouble.