qubole / rubix

Cache File System optimized for columnar formats and object stores
Apache License 2.0

Hive queries are failing in Hadoop cluster with an Exception #175

Closed: abhishekdas99 closed this issue 6 years ago

abhishekdas99 commented 6 years ago

Hive queries are failing with the following exception.

Caused by: com.google.common.shaded.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: No-args constructor for class com.qubole.rubix.hadoop2.Hadoop2ClusterManagerUtil$NodesResponse does not exist. Register an InstanceCreator with Gson for this type to fix this problem.
	at com.google.common.shaded.cache.LocalCache$Segment.get(LocalCache.java:2207)
	at com.google.common.shaded.cache.LocalCache.get(LocalCache.java:3953)
	at com.google.common.shaded.cache.LocalCache.getOrLoad(LocalCache.java:3957)
	at com.google.common.shaded.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
	at com.qubole.rubix.hadoop2.Hadoop2ClusterManager.isMaster(Hadoop2ClusterManager.java:112)
	at com.qubole.rubix.core.CachingFileSystem.getFileBlockLocations(CachingFileSystem.java:283)
	at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1749)
	at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1734)
	at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:657)
	at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:361)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:673)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:659)
	... 4 more
Caused by: java.lang.RuntimeException: No-args constructor for class com.qubole.rubix.hadoop2.Hadoop2ClusterManagerUtil$NodesResponse does not exist. Register an InstanceCreator with Gson for this type to fix this problem.
	at com.google.gson.MappedObjectConstructor.constructWithNoArgConstructor(MappedObjectConstructor.java:64)
	at com.google.gson.MappedObjectConstructor.construct(MappedObjectConstructor.java:53)
	at com.google.gson.JsonObjectDeserializationVisitor.constructTarget(JsonObjectDeserializationVisitor.java:41)
	at com.google.gson.JsonDeserializationVisitor.getTarget(JsonDeserializationVisitor.java:54)
	at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:98)
	at com.google.gson.JsonDeserializationContextDefault.fromJsonObject(JsonDeserializationContextDefault.java:73)
	at com.google.gson.JsonDeserializationContextDefault.deserialize(JsonDeserializationContextDefault.java:49)
	at com.google.gson.Gson.fromJson(Gson.java:379)
	at com.google.gson.Gson.fromJson(Gson.java:329)
	at com.qubole.rubix.hadoop2.Hadoop2ClusterManagerUtil.getAllNodes(Hadoop2ClusterManagerUtil.java:83)
	at com.qubole.rubix.hadoop2.Hadoop2ClusterManager$1.load(Hadoop2ClusterManager.java:68)
	at com.qubole.rubix.hadoop2.Hadoop2ClusterManager$1.load(Hadoop2ClusterManager.java:57)
	at com.google.common.shaded.cache.CacheLoader$1.load(CacheLoader.java:185)
	at com.google.common.shaded.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
	at com.google.common.shaded.cache.LocalCache$Segment.loadSync(LocalCache.java:2323)
	at com.google.common.shaded.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
	at com.google.common.shaded.cache.LocalCache$Segment.get(LocalCache.java:2201)
abhishekdas99 commented 6 years ago

This is a regression introduced by 3b2aacbd409cd0b87c6471d11c51f2c8fdd72108, where the static inner classes NodesResponse, Nodes, and Node were moved from the ClusterManager implementation class to a utility class.
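For context, Gson's default reflective deserialization needs a true no-args constructor on the target class. A JDK-only sketch (no Gson dependency; the class names here are illustrative stand-ins, not RubiX's actual DTOs) of one common way this breaks when nested classes are moved around: a nested class that loses its `static` modifier gets a hidden constructor parameter (the enclosing instance), so no no-args constructor exists for reflection to call.

```java
import java.lang.reflect.Constructor;

public class InnerClassReflection {

    // Hypothetical stand-in for a DTO like NodesResponse. Without the
    // static modifier, javac gives its implicit constructor a hidden
    // parameter (the enclosing instance), so no true no-args ctor exists.
    class NonStaticNode {}

    // With static, the implicit constructor really takes no arguments,
    // which is what reflective instantiation needs.
    static class StaticNode {}

    public static void main(String[] args) throws Exception {
        // The inner class's only constructor takes the outer instance.
        Constructor<?> inner = NonStaticNode.class.getDeclaredConstructors()[0];
        System.out.println("inner ctor params: " + inner.getParameterCount());

        // The static nested class has a genuine no-args constructor.
        Constructor<?> nested = StaticNode.class.getDeclaredConstructor();
        System.out.println("static nested ctor params: " + nested.getParameterCount());
    }
}
```

When a no-args constructor genuinely cannot exist, the alternative the exception message itself suggests is registering an `InstanceCreator` for the type with Gson.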

vrajat commented 6 years ago

I looked at the PR and the fix looks fine. Can you give more information on what was missed and how we can avoid it in the future? Is it more unit tests, or a different test framework that is not available right now?

abhishekdas99 commented 6 years ago

I was surprised that the unit tests did not catch this even though they exercise the same code path. We need to investigate further why they did not report this exception.

vrajat commented 6 years ago

Is it possible to do that as part of this issue, or to file a separate issue so that we do not forget about a known gap in testing?

abhishekdas99 commented 6 years ago

We will file a separate issue. We need to find out why the UTs were not able to catch this.
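One possible guard against this class of regression, sketched here with hypothetical stand-in classes (not existing RubiX test code), is a reflection check asserting that every class Gson is expected to deserialize actually has a usable no-args constructor, so that a refactor which moves or de-statics a DTO fails fast in CI:

```java
import java.lang.reflect.Modifier;

public class GsonDtoGuard {
    // Hypothetical stand-ins for the DTO classes Gson deserializes
    // (in RubiX these would be Hadoop2ClusterManagerUtil.NodesResponse etc.).
    static class NodesResponse {}
    static class Node {}

    // Deliberately broken example: a non-static inner class has no
    // true no-args constructor, so the guard flags it.
    class BadNode {}

    /** Returns true iff the class can be instantiated reflectively with no args. */
    static boolean hasUsableNoArgsCtor(Class<?> c) {
        try {
            // Throws NoSuchMethodException for non-static inner classes,
            // whose only constructor takes the enclosing instance.
            c.getDeclaredConstructor();
            return !Modifier.isAbstract(c.getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (Class<?> dto : new Class<?>[] {NodesResponse.class, Node.class, BadNode.class}) {
            System.out.println(dto.getSimpleName() + ": " + hasUsableNoArgsCtor(dto));
        }
    }
}
```

In a real suite the list of DTO classes would be maintained alongside the code that calls `Gson.fromJson`, so the check and the deserialization targets cannot drift apart.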