linkedin / openhouse

Open Control Plane for Tables in Data Lakehouse
https://www.openhousedb.org/
BSD 2-Clause "Simplified" License

Exclude metrics-core lib pulled in by hadoop yarn node manager #126

Closed abhisheknath2011 closed 2 weeks ago

abhisheknath2011 commented 3 weeks ago

Summary

Hadoop 2.10.0 (specifically the hadoop-yarn libraries) pulls in a very old version (3.0.1) of the com.codahale.metrics:metrics-core library, which ends up bundled in the tables.jar and jobs.jar fat jars. Newer MetricRegistry methods such as gauge were only added starting with metrics-core 3.2.0. When tables.jar or jobs.jar coexists on the classpath with a higher version of metrics-core and some code calls one of these newer MetricRegistry APIs, the call can resolve against the bundled 3.0.1 classes and fail with a method-not-found error. This PR therefore excludes metrics-core: the library is not used in the OSS codebase, and a higher version can always be pinned later if needed.

|    |    |    +--- org.apache.hadoop:hadoop-yarn-server-nodemanager:2.10.0
|    |    |    |    +--- org.apache.hadoop:hadoop-yarn-common:2.10.0 (*)
|    |    |    |    +--- org.apache.hadoop:hadoop-yarn-api:2.10.0 (*)
|    |    |    |    +--- org.apache.hadoop:hadoop-yarn-registry:2.10.0 (*)
|    |    |    |    +--- javax.xml.bind:jaxb-api:2.2.2 (*)
|    |    |    |    +--- org.codehaus.jettison:jettison:1.1
|    |    |    |    +--- commons-lang:commons-lang:2.6
|    |    |    |    +--- javax.servlet:servlet-api:2.5
|    |    |    |    +--- commons-codec:commons-codec:1.4 -> 1.9
|    |    |    |    +--- com.sun.jersey:jersey-core:1.9
|    |    |    |    +--- com.sun.jersey:jersey-client:1.9 (*)
|    |    |    |    +--- org.mortbay.jetty:jetty-util:6.1.26
|    |    |    |    +--- com.google.guava:guava:11.0.2 -> 31.1-jre (*)
|    |    |    |    +--- commons-logging:commons-logging:1.1.3 -> 1.2
|    |    |    |    +--- org.slf4j:slf4j-api:1.7.25 -> 1.7.36
|    |    |    |    +--- com.google.protobuf:protobuf-java:2.5.0
|    |    |    |    +--- com.codahale.metrics:metrics-core:3.0.1

Method not found error:

com.codahale.metrics.MetricRegistry.gauge(Ljava/lang/String;Lcom/codahale/metrics/MetricRegistry$MetricSupplier;)Lcom/codahale/metrics/Gauge;
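
To check which metrics-core a given classpath actually resolves, a small reflection probe can help. The following is a minimal sketch and not part of this PR: the class name MetricsCoreProbe and its helper methods are hypothetical, and it relies only on JDK reflection, so it also runs when metrics-core is absent. It reports where MetricRegistry was loaded from and whether any public method named gauge is visible.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of OpenHouse or metrics-core).
final class MetricsCoreProbe {

    /** Returns true if the class can be loaded and exposes a public method with this name. */
    public static boolean hasPublicMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class (or one of the types it references) is missing from the classpath.
            return false;
        }
    }

    /** Returns the jar or directory the class was loaded from, as a string. */
    public static String loadedFrom(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source.
            return src == null ? "bootstrap/unknown" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String registry = "com.codahale.metrics.MetricRegistry";
        System.out.println("loaded from: " + loadedFrom(registry));
        System.out.println("has gauge:   " + hasPublicMethod(registry, "gauge"));
    }
}
```

Run with the same classpath as the failing application. If the reported location is tables.jar or jobs.jar and no gauge method is found, the bundled 3.0.1 classes are likely shadowing the newer library.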

Testing Done

./gradlew clean build passed.

Tested using a local Docker setup.

Create table:

anath1@anath1-mn1 oh-hadoop-spark % curl "${curlArgs[@]}" -XPOST http://localhost:8000/v1/databases/d3/tables/ \
--data-raw '{
  "tableId": "t1",
  "databaseId": "d3",
  "baseTableVersion": "INITIAL_VERSION",
  "clusterId": "LocalHadoopCluster",
  "schema": "{\"type\": \"struct\", \"fields\": [{\"id\": 1,\"required\": true,\"name\": \"id\",\"type\": \"string\"},{\"id\": 2,\"required\": true,\"name\": \"name\",\"type\": \"string\"},{\"id\": 3,\"required\": true,\"name\": \"ts\",\"type\": \"timestamp\"}]}",
  "timePartitioning": {
    "columnName": "ts",
    "granularity": "HOUR"
  },
  "clustering": [
    {
      "columnName": "name"
    }
  ],
  "tableProperties": {
    "key": "value"
  }
}'

{"tableId":"t1","databaseId":"d3","clusterId":"LocalHadoopCluster","tableUri":"LocalHadoopCluster.d3.t1","tableUUID":"12b090ff-0dce-487f-8e74-5d18c55c68da","tableLocation":"hdfs://namenode:9000/data/openhouse/d3/t1-12b090ff-0dce-487f-8e74-5d18c55c68da/00000-9a3a852b-26d7-43f6-8340-c0687466c3f5.metadata.json","tableVersion":"INITIAL_VERSION","tableCreator":"DUMMY_ANONYMOUS_USER","schema":"{\"type\":\"struct\",\"schema-id\":0,\"fields\":[{\"id\":1,\"name\":\"id\",\"required\":true,\"type\":\"string\"},{\"id\":2,\"name\":\"name\",\"required\":true,\"type\":\"string\"},{\"id\":3,\"name\":\"ts\",\"required\":true,\"type\":\"timestamp\"}]}","lastModifiedTime":1718344489040,"creationTime":1718344489040,"tableProperties":{"policies":"","write.metadata.delete-after-commit.enabled":"true","openhouse.tableId":"t1","openhouse.clusterId":"LocalHadoopCluster","openhouse.lastModifiedTime":"1718344489040","openhouse.tableVersion":"INITIAL_VERSION","openhouse.creationTime":"1718344489040","openhouse.tableUri":"LocalHadoopCluster.d3.t1","write.format.default":"orc","write.metadata.previous-versions-max":"28","openhouse.databaseId":"d3","openhouse.tableType":"PRIMARY_TABLE","openhouse.tableLocation":"/data/openhouse/d3/t1-12b090ff-0dce-487f-8e74-5d18c55c68da/00000-9a3a852b-26d7-43f6-8340-c0687466c3f5.metadata.json","openhouse.tableUUID":"12b090ff-0dce-487f-8e74-5d18c55c68da","key":"value","openhouse.tableCreator":"DUMMY_ANONYMOUS_USER"},"timePartitioning":{"columnName":"ts","granularity":"HOUR"},"clustering":[{"columnName":"name","transform":null}],"policies":null,"tableType":"PRIMARY_TABLE"}

List table:

anath1@anath1-mn1 oh-hadoop-spark % curl "${curlArgs[@]}" -XGET http://localhost:8000/v1/databases/d3/tables/
{"results":[{"tableId":"t1","databaseId":"d3","clusterId":"LocalHadoopCluster","tableUri":"LocalHadoopCluster.d3.t1","tableUUID":"12b090ff-0dce-487f-8e74-5d18c55c68da","tableLocation":"hdfs://namenode:9000/data/openhouse/d3/t1-12b090ff-0dce-487f-8e74-5d18c55c68da/00000-9a3a852b-26d7-43f6-8340-c0687466c3f5.metadata.json","tableVersion":"INITIAL_VERSION","tableCreator":"DUMMY_ANONYMOUS_USER","schema":"{\"type\":\"struct\",\"schema-id\":0,\"fields\":[{\"id\":1,\"name\":\"id\",\"required\":true,\"type\":\"string\"},{\"id\":2,\"name\":\"name\",\"required\":true,\"type\":\"string\"},{\"id\":3,\"name\":\"ts\",\"required\":true,\"type\":\"timestamp\"}]}","lastModifiedTime":1718344489040,"creationTime":1718344489040,"tableProperties":{"policies":"","write.metadata.delete-after-commit.enabled":"true","openhouse.tableId":"t1","openhouse.clusterId":"LocalHadoopCluster","openhouse.lastModifiedTime":"1718344489040","openhouse.tableVersion":"INITIAL_VERSION","openhouse.creationTime":"1718344489040","openhouse.tableUri":"LocalHadoopCluster.d3.t1","write.format.default":"orc","write.metadata.previous-versions-max":"28","openhouse.databaseId":"d3","openhouse.tableType":"PRIMARY_TABLE","openhouse.tableLocation":"/data/openhouse/d3/t1-12b090ff-0dce-487f-8e74-5d18c55c68da/00000-9a3a852b-26d7-43f6-8340-c0687466c3f5.metadata.json","openhouse.tableUUID":"12b090ff-0dce-487f-8e74-5d18c55c68da","key":"value","openhouse.tableCreator":"DUMMY_ANONYMOUS_USER"},"timePartitioning":{"columnName":"ts","granularity":"HOUR"},"clustering":[{"columnName":"name","transform":null}],"policies":null,"tableType":"PRIMARY_TABLE"}]}
