rework-space-com / ambari

Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.
https://ambari.apache.org
Apache License 2.0

Fix Ambari Metrics installation #14

Closed Nazarii-Melnyk closed 2 months ago

Nazarii-Melnyk commented 4 months ago

After the Hadoop components are installed, the restart task Metrics Monitor Stop fails with the following error:

resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf stop' returned 127. -bash: line 1: /usr/sbin/ambari-metrics-monitor: No such file or directory

Full error log:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/BGTP/1.0/services/AMBARI_METRICS/package/scripts/metrics_monitor.py", line 78, in <module>
    AmsMonitor().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/BGTP/1.0/services/AMBARI_METRICS/package/scripts/metrics_monitor.py", line 51, in stop
    action = 'stop'
  File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/stacks/BGTP/1.0/services/AMBARI_METRICS/package/scripts/ams_service.py", line 120, in ams_service
    user=params.ams_user
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
    returns=self.resource.returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf stop' returned 127. -bash: line 1: /usr/sbin/ambari-metrics-monitor: No such file or directory
Nazarii-Melnyk commented 4 months ago

This behavior can be caused by the absence of the necessary deb packages in Ambari's apt repository. The CI/CD pipeline needs to be extended to build and upload all deb packages to the Nexus repository (not only the ambari-server and ambari-agent packages).

Related issue - https://stackoverflow.com/questions/62265082/ambari-metrics-collector-service-not-starting
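As a quick sanity check, one can list which ambari packages the configured apt repositories actually provide (a hedged sketch; package names may differ between builds):

```shell
# List every ambari-related package the configured apt repositories offer.
# If ambari-metrics is missing from this list, the monitor binary cannot be
# installed and the exit-code-127 error above is expected.
if command -v apt-cache >/dev/null 2>&1; then
  pkgs=$(apt-cache search --names-only '^ambari' 2>/dev/null || true)
  echo "$pkgs"
else
  pkgs=""
  echo "apt-cache not available on this host"
fi
```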

Nazarii-Melnyk commented 4 months ago

After uploading the necessary deb package to Nexus, it currently has to be installed manually: Ambari does not install the package automatically and fails with the errors above. Commands to install the ambari-metrics package:

sudo apt update
sudo apt install ambari-metrics -y
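After the manual install, it is worth confirming that the binary the agent invokes actually exists (path taken from the error message above):

```shell
# Verify the monitor binary that the stop/start tasks invoke is present
# and executable after installing the ambari-metrics package.
monitor=/usr/sbin/ambari-metrics-monitor
if [ -x "$monitor" ]; then
  status="installed"
else
  status="missing"
fi
echo "ambari-metrics-monitor: $status"
```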
Nazarii-Melnyk commented 4 months ago

There is one more problem when the Metrics Collector service starts. Ambari returns the following error:

resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh cp /usr/lib/ambari-server/htrace-core-3*.jar /usr/lib/ams-hbase/lib/client-facing-thirdparty/' returned 1. cp: cannot stat '/usr/lib/ambari-server/htrace-core-3*.jar': No such file or directory

Full error log:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/BGTP/1.0/services/AMBARI_METRICS/package/scripts/metrics_collector.py", line 90, in <module>
    AmsCollector().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/BGTP/1.0/services/AMBARI_METRICS/package/scripts/metrics_collector.py", line 48, in start
    self.configure(env, action = 'start') # for security
  File "/var/lib/ambari-agent/cache/stacks/BGTP/1.0/services/AMBARI_METRICS/package/scripts/metrics_collector.py", line 45, in configure
    ams(name='collector')
  File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/stacks/BGTP/1.0/services/AMBARI_METRICS/package/scripts/ams.py", line 349, in ams
    hbase_service('depnd_cyclic', action = 'metricsFIX')
  File "/var/lib/ambari-agent/cache/stacks/BGTP/1.0/services/AMBARI_METRICS/package/scripts/hbase_service.py", line 57, in hbase_service
    Execute(format("{sudo} cp /usr/lib/ambari-server/htrace-core-3*.jar /usr/lib/ams-hbase/lib/client-facing-thirdparty/")
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
    returns=self.resource.returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh cp /usr/lib/ambari-server/htrace-core-3*.jar /usr/lib/ams-hbase/lib/client-facing-thirdparty/' returned 1. cp: cannot stat '/usr/lib/ambari-server/htrace-core-3*.jar': No such file or directory
Nazarii-Melnyk commented 4 months ago

The failing command tries to copy files from the ambari-server installation, so to complete the installation successfully, Ambari Metrics Collector and Grafana have to be installed on the edge node (where ambari-server is present).
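A quick way to confirm this diagnosis on the node running the collector (path taken from the cp error above):

```shell
# The collector start script copies htrace-core-3*.jar out of the
# ambari-server install tree; check whether that jar is present on
# this node at all.
found=$(ls /usr/lib/ambari-server/htrace-core-3*.jar 2>/dev/null || true)
if [ -n "$found" ]; then
  echo "htrace jar found: $found"
else
  echo "htrace jar missing; the collector must run where ambari-server is installed"
fi
```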

Nazarii-Melnyk commented 4 months ago

While the Metrics Collector is starting, the following error occurs:

resource_management.core.exceptions.Fail: Pid file /var/run/ambari-metrics-collector/hbase-ams-master.pid doesn't exist after starting of the component.
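When the start task fails this way, one can check by hand whether the embedded HBase master wrote its pid file and whether that process is still alive (pid-file path taken from the error message above):

```shell
# Check the AMS HBase master pid file and whether the recorded pid
# corresponds to a live process (kill -0 probes without signalling).
pid_file=/var/run/ambari-metrics-collector/hbase-ams-master.pid
if [ -f "$pid_file" ] && kill -0 "$(cat "$pid_file")" 2>/dev/null; then
  state="running"
else
  state="not running"
fi
echo "hbase-ams-master: $state"
```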
Nazarii-Melnyk commented 4 months ago

Logs provided by Ambari Metrics Collector:

Nazarii-Melnyk commented 3 months ago

According to the logs, ams-hbase is trying to load the required class by the path org/apache/hadoop/thirdparty/com/google/common/base/Preconditions, but running jar tf /usr/lib/ams-hbase/lib/guava-27.0-jre.jar | grep Preconditions returns only:

com/google/common/base/Preconditions.class
com/google/common/collect/CollectPreconditions.class
com/google/common/math/MathPreconditions.class

What remains to be understood is which artifact is supposed to provide this class:

Nazarii-Melnyk commented 3 months ago

ams-hbase is trying to load the Preconditions class from the hadoop-shaded-guava dependency:
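The org/apache/hadoop/thirdparty/... prefix in the logs is guava relocated into hadoop-shaded-guava, not the plain guava-27.0-jre.jar. One can check which jar under the ams-hbase lib tree actually carries the shaded class (jar names and paths are assumptions; adjust for your build):

```shell
# Look for the relocated Preconditions class inside any
# hadoop-shaded-guava jar shipped with ams-hbase.
shaded=""
for jar in /usr/lib/ams-hbase/lib/hadoop-shaded-guava-*.jar; do
  if [ -f "$jar" ] && unzip -l "$jar" 2>/dev/null \
       | grep -q 'thirdparty/com/google/common/base/Preconditions'; then
    shaded="$jar"
  fi
done
if [ -n "$shaded" ]; then
  echo "shaded Preconditions found in $shaded"
else
  echo "shaded Preconditions not found under /usr/lib/ams-hbase/lib"
fi
```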

Nazarii-Melnyk commented 3 months ago

The dependency problem can be fixed by adding the hadoop-shaded-*.jar dependencies, but ams-hbase still fails to start.
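A minimal sketch of that workaround, assuming the shaded jars are already available from a Hadoop install on the same node (the source directory below is an assumption; point it at wherever your distribution keeps hadoop-shaded-guava-*.jar):

```shell
# Copy the hadoop-shaded-* jars into the ams-hbase classpath so the
# relocated Preconditions class can be resolved at startup.
src=/usr/lib/hadoop/lib   # assumption: adjust for your layout
dst=/usr/lib/ams-hbase/lib
if ls "$src"/hadoop-shaded-*.jar >/dev/null 2>&1; then
  sudo cp "$src"/hadoop-shaded-*.jar "$dst"/
  echo "copied hadoop-shaded-* jars to $dst"
else
  echo "no hadoop-shaded-*.jar found under $src"
fi
```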

Log file ambari-metrics-collector.out:

Jun 27, 2024 10:12:14 AM java.util.logging.LogManager$RootLogger log
SEVERE: Failed to resolve default logging config file: config/java.util.logging.properties

Log file ambari-metrics-collector.log:

2024-06-27 10:18:07,094 WARN org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient: 0x55562aa9 to isv-s-hdp-mng-01.example.com:61181 failed for get of /ams-hbase-unsecure/hbaseid, code = CONNECTIONLOSS, retries = 30
2024-06-27 10:18:08,094 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server isv-s-hdp-mng-01.example.com/<IP>:61181. Will not attempt to authenticate using SASL (unknown error)
2024-06-27 10:18:08,095 INFO org.apache.zookeeper.ClientCnxn: Socket error occurred: isv-s-hdp-mng-01.example.com/<IP>:61181: Connection refused
2024-06-27 10:18:08,196 WARN org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient: 0x55562aa9 to isv-s-hdp-mng-01.example.com:61181 failed for get of /ams-hbase-unsecure/hbaseid, code = CONNECTIONLOSS, retries = 30, give up
2024-06-27 10:18:08,212 WARN org.apache.hadoop.hbase.client.ConnectionImplementation: Retrieve cluster id failed
java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure/hbaseid
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:666)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:325)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:272)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.client.ConnectionFactory.lambda$null$0(ConnectionFactory.java:233)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
    at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:320)
    at org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$1(ConnectionFactory.java:232)
    at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:131)
    at org.apache.phoenix.query.HConnectionFactory$HConnectionFactoryImpl.createConnection(HConnectionFactory.java:47)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:480)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$400(ConnectionQueryServicesImpl.java:312)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:3254)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:3230)
    at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:3230)
    at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:255)
    at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:144)
    at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:221)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:270)
    at org.apache.ambari.metrics.core.timeline.query.DefaultPhoenixDataSource.getConnection(DefaultPhoenixDataSource.java:84)
    at org.apache.ambari.metrics.core.timeline.PhoenixHBaseAccessor.getConnection(PhoenixHBaseAccessor.java:502)
    at org.apache.ambari.metrics.core.timeline.PhoenixHBaseAccessor.getConnectionRetryingOnException(PhoenixHBaseAccessor.java:480)
    at org.apache.ambari.metrics.core.timeline.discovery.TimelineMetricMetadataManager.initializeMetadata(TimelineMetricMetadataManager.java:150)
    at org.apache.ambari.metrics.core.timeline.discovery.TimelineMetricMetadataManager.initializeMetadata(TimelineMetricMetadataManager.java:133)
    at org.apache.ambari.metrics.core.timeline.HBaseTimelineMetricsService.initializeSubsystem(HBaseTimelineMetricsService.java:121)
    at org.apache.ambari.metrics.core.timeline.HBaseTimelineMetricsService.serviceInit(HBaseTimelineMetricsService.java:102)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
    at org.apache.ambari.metrics.AMSApplicationServer.serviceInit(AMSApplicationServer.java:65)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.ambari.metrics.AMSApplicationServer.launchAMSApplicationServer(AMSApplicationServer.java:97)
    at org.apache.ambari.metrics.AMSApplicationServer.main(AMSApplicationServer.java:107)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure/hbaseid
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
    at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:197)
    at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:344)
    at java.lang.Thread.run(Thread.java:750)
2024-06-27 10:18:08,377 INFO org.apache.phoenix.query.ConnectionQueryServicesImpl: HConnection established. Stacktrace for informational purposes: hconnection-0x5860f3d7 java.lang.Thread.getStackTrace(Thread.java:1564)
org.apache.phoenix.util.LogUtil.getCallerStackTrace(LogUtil.java:55)
org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:483)
org.apache.phoenix.query.ConnectionQueryServicesImpl.access$400(ConnectionQueryServicesImpl.java:312)
org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:3254)
org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:3230)
org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:3230)
org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:255)
org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:144)
org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:221)
java.sql.DriverManager.getConnection(DriverManager.java:664)
java.sql.DriverManager.getConnection(DriverManager.java:270)
org.apache.ambari.metrics.core.timeline.query.DefaultPhoenixDataSource.getConnection(DefaultPhoenixDataSource.java:84)
org.apache.ambari.metrics.core.timeline.PhoenixHBaseAccessor.getConnection(PhoenixHBaseAccessor.java:502)
org.apache.ambari.metrics.core.timeline.PhoenixHBaseAccessor.getConnectionRetryingOnException(PhoenixHBaseAccessor.java:480)
org.apache.ambari.metrics.core.timeline.discovery.TimelineMetricMetadataManager.initializeMetadata(TimelineMetricMetadataManager.java:150)
org.apache.ambari.metrics.core.timeline.discovery.TimelineMetricMetadataManager.initializeMetadata(TimelineMetricMetadataManager.java:133)
org.apache.ambari.metrics.core.timeline.HBaseTimelineMetricsService.initializeSubsystem(HBaseTimelineMetricsService.java:121)
org.apache.ambari.metrics.core.timeline.HBaseTimelineMetricsService.serviceInit(HBaseTimelineMetricsService.java:102)
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
org.apache.ambari.metrics.AMSApplicationServer.serviceInit(AMSApplicationServer.java:65)
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
org.apache.ambari.metrics.AMSApplicationServer.launchAMSApplicationServer(AMSApplicationServer.java:97)
org.apache.ambari.metrics.AMSApplicationServer.main(AMSApplicationServer.java:107)

Log file hbase-ams-master-isv-s-hdp-mng-01.example.com.log:

2024-06-27T10:23:09,311 INFO  [main-SendThread(isv-s-hdp-mng-01.example.com:61181)] zookeeper.ClientCnxn: Socket error occurred: isv-s-hdp-mng-01.example.com/<IP>:61181: Connection refused
2024-06-27T10:23:09,412 ERROR [main] zookeeper.RecoverableZooKeeper: ZooKeeper create failed after 4 attempts
2024-06-27T10:23:10,412 INFO  [main-SendThread(isv-s-hdp-mng-01.example.com:61181)] zookeeper.ClientCnxn: Opening socket connection to server isv-s-hdp-mng-01.example.com/<IP>:61181. Will not attempt to authenticate using SASL (unknown error)
2024-06-27T10:23:10,523 INFO  [main] zookeeper.ZooKeeper: Session: 0x0 closed
2024-06-27T10:23:10,528 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x0
2024-06-27T10:23:10,524 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
org.apache.hadoop.hbase.ZooKeeperConnectionException: master:613000x0, quorum=isv-s-hdp-mng-01.example.com:61181, baseZNode=/ams-hbase-unsecure Unexpected KeeperException creating base node
    at org.apache.hadoop.hbase.zookeeper.ZKWatcher.createBaseZNodes(ZKWatcher.java:260) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.zookeeper.ZKWatcher.<init>(ZKWatcher.java:184) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.zookeeper.ZKWatcher.<init>(ZKWatcher.java:136) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:682) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:460) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.<init>(HMasterCommandLine.java:322) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_412]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_412]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_412]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_412]
    at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:124) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:222) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:169) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:112) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:241) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:147) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81) ~[hadoop-common-3.3.4.1.2.1.0-134.jar:?]
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:140) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3291) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[zookeeper-3.5.6.1.2.1.0-134.jar:3.5.6-134-f5aa3fd3e117b47079c9b4eee1f7177c2142211c]
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[zookeeper-3.5.6.1.2.1.0-134.jar:3.5.6-134-f5aa3fd3e117b47079c9b4eee1f7177c2142211c]
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1538) ~[zookeeper-3.5.6.1.2.1.0-134.jar:3.5.6-134-f5aa3fd3e117b47079c9b4eee1f7177c2142211c]
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:649) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:623) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:823) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:805) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.zookeeper.ZKWatcher.createBaseZNodes(ZKWatcher.java:251) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    ... 18 more
2024-06-27T10:23:10,539 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMasterKeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure
    at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:128) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:222) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:169) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:112) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:241) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:147) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81) ~[hadoop-common-3.3.4.1.2.1.0-134.jar:?]
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:140) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3291) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: master:613000x0, quorum=isv-s-hdp-mng-01.example.com:61181, baseZNode=/ams-hbase-unsecure Unexpected KeeperException creating base node
    at org.apache.hadoop.hbase.zookeeper.ZKWatcher.createBaseZNodes(ZKWatcher.java:260) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.zookeeper.ZKWatcher.<init>(ZKWatcher.java:184) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.zookeeper.ZKWatcher.<init>(ZKWatcher.java:136) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:682) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:460) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.<init>(HMasterCommandLine.java:322) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_412]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_412]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_412]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_412]
    at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:124) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    ... 8 more
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[zookeeper-3.5.6.1.2.1.0-134.jar:3.5.6-134-f5aa3fd3e117b47079c9b4eee1f7177c2142211c]
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[zookeeper-3.5.6.1.2.1.0-134.jar:3.5.6-134-f5aa3fd3e117b47079c9b4eee1f7177c2142211c]
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1538) ~[zookeeper-3.5.6.1.2.1.0-134.jar:3.5.6-134-f5aa3fd3e117b47079c9b4eee1f7177c2142211c]
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:649) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:623) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:823) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:805) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.zookeeper.ZKWatcher.createBaseZNodes(ZKWatcher.java:251) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.zookeeper.ZKWatcher.<init>(ZKWatcher.java:184) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.zookeeper.ZKWatcher.<init>(ZKWatcher.java:136) ~[hbase-zookeeper-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:682) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:460) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.<init>(HMasterCommandLine.java:322) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_412]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_412]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_412]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_412]
    at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:124) ~[hbase-server-2.5.3.1.2.1.0-134.jar:2.5.3.1.2.1.0-134]
    ... 8 more

Log file hbase-ams-master-isv-s-hdp-mng-01.example.com.out:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/ams-hbase/lib/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/ams-hbase/lib/client-facing-thirdparty/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
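The repeated "Connection refused" on port 61181 in the logs above suggests the embedded AMS ZooKeeper never came up. It can be probed directly with ZooKeeper's four-letter "ruok" command (host and port taken from the logs; adjust for your cluster, and note that 3.5+ servers may require ruok in the four-letter-word whitelist):

```shell
# Probe the AMS embedded ZooKeeper; a healthy server replies "imok".
zk_host=isv-s-hdp-mng-01.example.com
zk_port=61181
reply=$(printf 'ruok' | timeout 5 nc "$zk_host" "$zk_port" 2>/dev/null || true)
if [ "$reply" = "imok" ]; then
  echo "AMS ZooKeeper is healthy"
else
  echo "AMS ZooKeeper not reachable on $zk_host:$zk_port"
fi
```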
Nazarii-Melnyk commented 2 months ago

Ambari Metrics was successfully started using the ODP stack:

(screenshots attached)

Ambari logs:

2024-07-18 19:06:23,830 - Stack Feature Version Info: Cluster Stack=1.2, Command Stack=None, Command Version=1.2.2.0-138 -> 1.2.2.0-138
2024-07-18 19:06:23,839 - Using hadoop conf dir: /usr/odp/1.2.2.0-138/hadoop/conf
2024-07-18 19:06:23,934 - Stack Feature Version Info: Cluster Stack=1.2, Command Stack=None, Command Version=1.2.2.0-138 -> 1.2.2.0-138
2024-07-18 19:06:23,936 - Using hadoop conf dir: /usr/odp/1.2.2.0-138/hadoop/conf
2024-07-18 19:06:23,937 - Group['hdfs'] {}
2024-07-18 19:06:23,937 - Group['hadoop'] {}
2024-07-18 19:06:23,938 - Group['users'] {}
2024-07-18 19:06:23,938 - User['yarn-ats'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2024-07-18 19:06:23,939 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2024-07-18 19:06:23,940 - User['ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2024-07-18 19:06:23,941 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'users'], 'uid': None}
2024-07-18 19:06:23,942 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hdfs', 'hadoop'], 'uid': None}
2024-07-18 19:06:23,943 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2024-07-18 19:06:23,943 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2024-07-18 19:06:23,944 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2024-07-18 19:06:23,945 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2024-07-18 19:06:23,955 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] due to not_if
2024-07-18 19:06:23,955 - Group['hdfs'] {}
2024-07-18 19:06:23,955 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': ['hdfs', 'hadoop', u'hdfs']}
2024-07-18 19:06:23,956 - FS Type: HDFS
2024-07-18 19:06:23,956 - Directory['/etc/hadoop'] {'mode': 0755}
2024-07-18 19:06:23,966 - File['/usr/odp/1.2.2.0-138/hadoop/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2024-07-18 19:06:23,967 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 01777}
2024-07-18 19:06:23,981 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2024-07-18 19:06:23,991 - Skipping Execute[('setenforce', '0')] due to not_if
2024-07-18 19:06:23,992 - Directory['/var/log/hadoop'] {'owner': 'root', 'create_parents': True, 'group': 'hadoop', 'mode': 0775, 'cd_access': 'a'}
2024-07-18 19:06:23,995 - Directory['/var/run/hadoop'] {'owner': 'root', 'create_parents': True, 'group': 'root', 'cd_access': 'a'}
2024-07-18 19:06:23,995 - Directory['/var/run/hadoop/hdfs'] {'owner': 'hdfs', 'cd_access': 'a'}
2024-07-18 19:06:23,996 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'create_parents': True, 'cd_access': 'a'}
2024-07-18 19:06:24,000 - File['/usr/odp/1.2.2.0-138/hadoop/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'hdfs'}
2024-07-18 19:06:24,002 - File['/usr/odp/1.2.2.0-138/hadoop/conf/health_check'] {'content': Template('health_check.j2'), 'owner': 'hdfs'}
2024-07-18 19:06:24,008 - File['/usr/odp/1.2.2.0-138/hadoop/conf/log4j.properties'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2024-07-18 19:06:24,014 - File['/usr/odp/1.2.2.0-138/hadoop/conf/hadoop-metrics2.properties'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2024-07-18 19:06:24,015 - File['/usr/odp/1.2.2.0-138/hadoop/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
2024-07-18 19:06:24,017 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop', 'mode': 0644}
2024-07-18 19:06:24,023 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2024-07-18 19:06:24,028 - Skipping unlimited key JCE policy check and setup since the Java VM is not managed by Ambari
2024-07-18 19:06:24,032 - Skipping stack-select on AMBARI_METRICS because it does not exist in the stack-select package structure.
2024-07-18 19:06:24,156 - Using hadoop conf dir: /usr/odp/1.2.2.0-138/hadoop/conf
2024-07-18 19:06:24,157 - checked_call['hostid'] {}
2024-07-18 19:06:24,164 - checked_call returned (0, 'd90a160e')
2024-07-18 19:06:24,165 - Directory['/etc/ams-hbase/conf'] {'owner': 'ams', 'group': 'hadoop', 'create_parents': True, 'recursive_ownership': True}
2024-07-18 19:06:24,166 - Directory['/var/lib/ambari-metrics-collector/hbase-tmp'] {'owner': 'ams', 'create_parents': True, 'recursive_ownership': True, 'cd_access': 'a'}
2024-07-18 19:06:24,167 - Creating directory Directory['/var/lib/ambari-metrics-collector/hbase-tmp'] since it doesn't exist.
2024-07-18 19:06:24,167 - Changing owner for /var/lib/ambari-metrics-collector/hbase-tmp from 0 to ams
2024-07-18 19:06:24,167 - Directory['/var/lib/ambari-metrics-collector/hbase-tmp/local/jars'] {'owner': 'ams', 'group': 'hadoop', 'create_parents': True, 'mode': 0775, 'cd_access': 'a'}
2024-07-18 19:06:24,167 - Creating directory Directory['/var/lib/ambari-metrics-collector/hbase-tmp/local/jars'] since it doesn't exist.
2024-07-18 19:06:24,167 - Changing owner for /var/lib/ambari-metrics-collector/hbase-tmp/local/jars from 0 to ams
2024-07-18 19:06:24,167 - Changing group for /var/lib/ambari-metrics-collector/hbase-tmp/local/jars from 0 to hadoop
2024-07-18 19:06:24,168 - Changing permission for /var/lib/ambari-metrics-collector/hbase-tmp/local/jars from 755 to 775
2024-07-18 19:06:24,168 - File['/etc/ams-hbase/conf/core-site.xml'] {'owner': 'ams', 'action': ['delete']}
2024-07-18 19:06:24,168 - File['/etc/ams-hbase/conf/hdfs-site.xml'] {'owner': 'ams', 'action': ['delete']}
2024-07-18 19:06:24,168 - XmlConfig['hbase-site.xml'] {'owner': 'ams', 'group': 'hadoop', 'conf_dir': '/etc/ams-hbase/conf', 'configuration_attributes': {u'final': {u'hbase.zookeeper.quorum': u'true'}}, 'configurations': ...}
2024-07-18 19:06:24,174 - Generating config: /etc/ams-hbase/conf/hbase-site.xml
2024-07-18 19:06:24,174 - File['/etc/ams-hbase/conf/hbase-site.xml'] {'owner': 'ams', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2024-07-18 19:06:24,195 - Writing File['/etc/ams-hbase/conf/hbase-site.xml'] because contents don't match
2024-07-18 19:06:24,195 - Directory['/var/lib/ambari-metrics-collector/hbase-tmp/phoenix-spool'] {'owner': 'ams', 'group': 'hadoop', 'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
2024-07-18 19:06:24,196 - Creating directory Directory['/var/lib/ambari-metrics-collector/hbase-tmp/phoenix-spool'] since it doesn't exist.
2024-07-18 19:06:24,196 - Changing owner for /var/lib/ambari-metrics-collector/hbase-tmp/phoenix-spool from 0 to ams
2024-07-18 19:06:24,196 - Changing group for /var/lib/ambari-metrics-collector/hbase-tmp/phoenix-spool from 0 to hadoop
2024-07-18 19:06:24,196 - XmlConfig['hbase-policy.xml'] {'owner': 'ams', 'group': 'hadoop', 'conf_dir': '/etc/ams-hbase/conf', 'configuration_attributes': {}, 'configurations': {u'security.admin.protocol.acl': u'*', u'security.masterregion.protocol.acl': u'*', u'security.client.protocol.acl': u'*'}}
2024-07-18 19:06:24,201 - Generating config: /etc/ams-hbase/conf/hbase-policy.xml
2024-07-18 19:06:24,201 - File['/etc/ams-hbase/conf/hbase-policy.xml'] {'owner': 'ams', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2024-07-18 19:06:24,202 - Writing File['/etc/ams-hbase/conf/hbase-policy.xml'] because contents don't match
2024-07-18 19:06:24,207 - File['/etc/ams-hbase/conf/hbase-env.sh'] {'content': InlineTemplate(...), 'owner': 'ams'}
2024-07-18 19:06:24,207 - Writing File['/etc/ams-hbase/conf/hbase-env.sh'] because contents don't match
2024-07-18 19:06:24,210 - File['/etc/ams-hbase/conf/hadoop-metrics2-hbase.properties'] {'content': Template('hadoop-metrics2-hbase.properties.j2'), 'owner': 'ams', 'group': 'hadoop'}
2024-07-18 19:06:24,211 - Writing File['/etc/ams-hbase/conf/hadoop-metrics2-hbase.properties'] because contents don't match
2024-07-18 19:06:24,211 - TemplateConfig['/etc/ams-hbase/conf/regionservers'] {'owner': 'ams', 'template_tag': None}
2024-07-18 19:06:24,212 - File['/etc/ams-hbase/conf/regionservers'] {'content': Template('regionservers.j2'), 'owner': 'ams', 'group': None, 'mode': None}
2024-07-18 19:06:24,212 - Writing File['/etc/ams-hbase/conf/regionservers'] because contents don't match
2024-07-18 19:06:24,212 - Directory['/var/run/ambari-metrics-collector/'] {'owner': 'ams', 'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
2024-07-18 19:06:24,212 - Creating directory Directory['/var/run/ambari-metrics-collector/'] since it doesn't exist.
2024-07-18 19:06:24,213 - Changing owner for /var/run/ambari-metrics-collector/ from 0 to ams
2024-07-18 19:06:24,213 - Directory['/var/log/ambari-metrics-collector'] {'owner': 'ams', 'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
2024-07-18 19:06:24,213 - Creating directory Directory['/var/log/ambari-metrics-collector'] since it doesn't exist.
2024-07-18 19:06:24,213 - Changing owner for /var/log/ambari-metrics-collector from 0 to ams
2024-07-18 19:06:24,213 - Directory['/var/lib/ambari-metrics-collector/hbase'] {'owner': 'ams', 'create_parents': True, 'recursive_ownership': True, 'cd_access': 'a'}
2024-07-18 19:06:24,213 - Creating directory Directory['/var/lib/ambari-metrics-collector/hbase'] since it doesn't exist.
2024-07-18 19:06:24,213 - Changing owner for /var/lib/ambari-metrics-collector/hbase from 0 to ams
2024-07-18 19:06:24,214 - File['/var/run/ambari-metrics-collector//distributed_mode'] {'owner': 'ams', 'action': ['delete']}
2024-07-18 19:06:24,215 - File['/etc/ams-hbase/conf/log4j.properties'] {'content': InlineTemplate(...), 'owner': 'ams', 'group': 'hadoop', 'mode': 0644}
2024-07-18 19:06:24,219 - Directory['/usr/lib/ambari-logsearch-logfeeder/conf'] {'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
2024-07-18 19:06:24,219 - Generate Log Feeder config file: /usr/lib/ambari-logsearch-logfeeder/conf/input.config-ambari-metrics.json
2024-07-18 19:06:24,219 - File['/usr/lib/ambari-logsearch-logfeeder/conf/input.config-ambari-metrics.json'] {'content': Template('input.config-ambari-metrics.json.j2'), 'mode': 0644}
2024-07-18 19:06:24,220 - Directory['/etc/ams-hbase/conf'] {'owner': 'ams', 'group': 'hadoop', 'create_parents': True, 'recursive_ownership': True}
2024-07-18 19:06:24,220 - Directory['/var/lib/ambari-metrics-collector/hbase-tmp'] {'owner': 'ams', 'create_parents': True, 'recursive_ownership': True, 'cd_access': 'a'}
2024-07-18 19:06:24,220 - Directory['/var/lib/ambari-metrics-collector/hbase-tmp/local/jars'] {'owner': 'ams', 'group': 'hadoop', 'create_parents': True, 'mode': 0775, 'cd_access': 'a'}
2024-07-18 19:06:24,221 - File['/etc/ams-hbase/conf/core-site.xml'] {'owner': 'ams', 'action': ['delete']}
2024-07-18 19:06:24,221 - File['/etc/ams-hbase/conf/hdfs-site.xml'] {'owner': 'ams', 'action': ['delete']}
2024-07-18 19:06:24,221 - XmlConfig['hbase-site.xml'] {'owner': 'ams', 'group': 'hadoop', 'conf_dir': '/etc/ams-hbase/conf', 'configuration_attributes': {u'final': {u'hbase.zookeeper.quorum': u'true'}}, 'configurations': ...}
2024-07-18 19:06:24,225 - Generating config: /etc/ams-hbase/conf/hbase-site.xml
2024-07-18 19:06:24,225 - File['/etc/ams-hbase/conf/hbase-site.xml'] {'owner': 'ams', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2024-07-18 19:06:24,246 - XmlConfig['hbase-policy.xml'] {'owner': 'ams', 'group': 'hadoop', 'conf_dir': '/etc/ams-hbase/conf', 'configuration_attributes': {}, 'configurations': {u'security.admin.protocol.acl': u'*', u'security.masterregion.protocol.acl': u'*', u'security.client.protocol.acl': u'*'}}
2024-07-18 19:06:24,250 - Generating config: /etc/ams-hbase/conf/hbase-policy.xml
2024-07-18 19:06:24,250 - File['/etc/ams-hbase/conf/hbase-policy.xml'] {'owner': 'ams', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2024-07-18 19:06:24,256 - File['/etc/ams-hbase/conf/hbase-env.sh'] {'content': InlineTemplate(...), 'owner': 'ams'}
2024-07-18 19:06:24,258 - File['/etc/ams-hbase/conf/hadoop-metrics2-hbase.properties'] {'content': Template('hadoop-metrics2-hbase.properties.j2'), 'owner': 'ams', 'group': 'hadoop'}
2024-07-18 19:06:24,259 - TemplateConfig['/etc/ams-hbase/conf/regionservers'] {'owner': 'ams', 'template_tag': None}
2024-07-18 19:06:24,260 - File['/etc/ams-hbase/conf/regionservers'] {'content': Template('regionservers.j2'), 'owner': 'ams', 'group': None, 'mode': None}
2024-07-18 19:06:24,260 - Directory['/var/run/ambari-metrics-collector/'] {'owner': 'ams', 'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
2024-07-18 19:06:24,260 - Directory['/var/log/ambari-metrics-collector'] {'owner': 'ams', 'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
2024-07-18 19:06:24,262 - File['/etc/ams-hbase/conf/log4j.properties'] {'content': InlineTemplate(...), 'owner': 'ams', 'group': 'hadoop', 'mode': 0644}
2024-07-18 19:06:24,265 - Directory['/usr/lib/ambari-logsearch-logfeeder/conf'] {'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
2024-07-18 19:06:24,265 - Generate Log Feeder config file: /usr/lib/ambari-logsearch-logfeeder/conf/input.config-ambari-metrics.json
2024-07-18 19:06:24,265 - File['/usr/lib/ambari-logsearch-logfeeder/conf/input.config-ambari-metrics.json'] {'content': Template('input.config-ambari-metrics.json.j2'), 'mode': 0644}
2024-07-18 19:06:24,266 - Directory['/etc/ambari-metrics-collector/conf'] {'owner': 'ams', 'group': 'hadoop', 'create_parents': True, 'recursive_ownership': True}
2024-07-18 19:06:24,266 - Changing owner for /etc/ambari-metrics-collector/conf from 0 to ams
2024-07-18 19:06:24,266 - Changing group for /etc/ambari-metrics-collector/conf from 0 to hadoop
2024-07-18 19:06:24,266 - Directory['/var/lib/ambari-metrics-collector/checkpoint'] {'owner': 'ams', 'group': 'hadoop', 'create_parents': True, 'recursive_ownership': True, 'cd_access': 'a'}
2024-07-18 19:06:24,266 - Creating directory Directory['/var/lib/ambari-metrics-collector/checkpoint'] since it doesn't exist.
2024-07-18 19:06:24,266 - Changing owner for /var/lib/ambari-metrics-collector/checkpoint from 0 to ams
2024-07-18 19:06:24,266 - Changing group for /var/lib/ambari-metrics-collector/checkpoint from 0 to hadoop
2024-07-18 19:06:24,267 - XmlConfig['ams-site.xml'] {'owner': 'ams', 'group': 'hadoop', 'conf_dir': '/etc/ambari-metrics-collector/conf', 'configuration_attributes': {}, 'configurations': ...}
2024-07-18 19:06:24,271 - Generating config: /etc/ambari-metrics-collector/conf/ams-site.xml
2024-07-18 19:06:24,271 - File['/etc/ambari-metrics-collector/conf/ams-site.xml'] {'owner': 'ams', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2024-07-18 19:06:24,299 - Writing File['/etc/ambari-metrics-collector/conf/ams-site.xml'] because contents don't match
2024-07-18 19:06:24,299 - XmlConfig['ssl-server.xml'] {'owner': 'ams', 'group': 'hadoop', 'conf_dir': '/etc/ambari-metrics-collector/conf', 'configuration_attributes': {}, 'configurations': ...}
2024-07-18 19:06:24,304 - Generating config: /etc/ambari-metrics-collector/conf/ssl-server.xml
2024-07-18 19:06:24,304 - File['/etc/ambari-metrics-collector/conf/ssl-server.xml'] {'owner': 'ams', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2024-07-18 19:06:24,307 - Writing File['/etc/ambari-metrics-collector/conf/ssl-server.xml'] because it doesn't exist
2024-07-18 19:06:24,307 - Changing owner for /etc/ambari-metrics-collector/conf/ssl-server.xml from 0 to ams
2024-07-18 19:06:24,307 - Changing group for /etc/ambari-metrics-collector/conf/ssl-server.xml from 0 to hadoop
2024-07-18 19:06:24,308 - XmlConfig['hbase-site.xml'] {'owner': 'ams', 'group': 'hadoop', 'conf_dir': '/etc/ambari-metrics-collector/conf', 'configuration_attributes': {u'final': {u'hbase.zookeeper.quorum': u'true'}}, 'configurations': ...}
2024-07-18 19:06:24,312 - Generating config: /etc/ambari-metrics-collector/conf/hbase-site.xml
2024-07-18 19:06:24,312 - File['/etc/ambari-metrics-collector/conf/hbase-site.xml'] {'owner': 'ams', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2024-07-18 19:06:24,333 - Writing File['/etc/ambari-metrics-collector/conf/hbase-site.xml'] because contents don't match
2024-07-18 19:06:24,334 - File['/etc/ambari-metrics-collector/conf/log4j.properties'] {'content': InlineTemplate(...), 'owner': 'ams', 'group': 'hadoop', 'mode': 0644}
2024-07-18 19:06:24,334 - Writing File['/etc/ambari-metrics-collector/conf/log4j.properties'] because contents don't match
2024-07-18 19:06:24,337 - File['/etc/ambari-metrics-collector/conf/ams-env.sh'] {'content': InlineTemplate(...), 'owner': 'ams'}
2024-07-18 19:06:24,337 - Writing File['/etc/ambari-metrics-collector/conf/ams-env.sh'] because contents don't match
2024-07-18 19:06:24,337 - Directory['/var/log/ambari-metrics-collector'] {'owner': 'ams', 'group': 'hadoop', 'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
2024-07-18 19:06:24,337 - Changing group for /var/log/ambari-metrics-collector from 0 to hadoop
2024-07-18 19:06:24,337 - Directory['/var/run/ambari-metrics-collector'] {'owner': 'ams', 'group': 'hadoop', 'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
2024-07-18 19:06:24,338 - Changing group for /var/run/ambari-metrics-collector from 0 to hadoop
2024-07-18 19:06:24,338 - File['/usr/lib/ams-hbase/bin/hadoop'] {'owner': 'ams', 'mode': 0755}
2024-07-18 19:06:24,338 - Writing File['/usr/lib/ams-hbase/bin/hadoop'] because it doesn't exist
2024-07-18 19:06:24,338 - Changing owner for /usr/lib/ams-hbase/bin/hadoop from 0 to ams
2024-07-18 19:06:24,338 - Changing permission for /usr/lib/ams-hbase/bin/hadoop from 644 to 755
2024-07-18 19:06:24,338 - Directory['/etc/security/limits.d'] {'owner': 'root', 'create_parents': True, 'group': 'root'}
2024-07-18 19:06:24,339 - File['/etc/security/limits.d/ams.conf'] {'content': Template('ams.conf.j2'), 'owner': 'root', 'group': 'root', 'mode': 0644}
2024-07-18 19:06:24,342 - Directory['/usr/lib/ambari-logsearch-logfeeder/conf'] {'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
2024-07-18 19:06:24,342 - Generate Log Feeder config file: /usr/lib/ambari-logsearch-logfeeder/conf/input.config-ambari-metrics.json
2024-07-18 19:06:24,342 - File['/usr/lib/ambari-logsearch-logfeeder/conf/input.config-ambari-metrics.json'] {'content': Template('input.config-ambari-metrics.json.j2'), 'mode': 0644}
2024-07-18 19:06:24,343 - Execute['/usr/lib/ams-hbase/bin/hbase-daemon.sh --config /etc/ams-hbase/conf stop regionserver'] {'on_timeout': 'ls /var/run/ambari-metrics-collector//hbase-ams-regionserver.pid >/dev/null 2>&1 && ps `cat /var/run/ambari-metrics-collector//hbase-ams-regionserver.pid` >/dev/null 2>&1 && ambari-sudo.sh -H -E kill -9 `ambari-sudo.sh cat /var/run/ambari-metrics-collector//hbase-ams-regionserver.pid`', 'timeout': 30, 'user': 'ams'}
2024-07-18 19:06:24,426 - File['/var/run/ambari-metrics-collector//hbase-ams-regionserver.pid'] {'action': ['delete']}
2024-07-18 19:06:24,428 - Execute['/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf stop'] {'user': 'ams'}
2024-07-18 19:06:24,539 - Execute['ambari-sudo.sh rm -rf /var/lib/ambari-metrics-collector/hbase-tmp/*.tmp'] {}
2024-07-18 19:06:24,553 - File['/etc/ambari-metrics-collector/conf/core-site.xml'] {'owner': 'ams', 'action': ['delete']}
2024-07-18 19:06:24,554 - File['/etc/ambari-metrics-collector/conf/hdfs-site.xml'] {'owner': 'ams', 'action': ['delete']}
2024-07-18 19:06:24,555 - Execute['/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf start'] {'user': 'ams'}
2024-07-18 19:07:09,733 - Component has started with pid(s): 7874, 7801
2024-07-18 19:07:09,746 - Skipping stack-select on AMBARI_METRICS because it does not exist in the stack-select package structure.

Command completed successfully!
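For reference, the original `No such file or directory` failure in this issue (the stop task calling a daemon script that was never installed) can be caught before Ambari runs the command. A minimal pre-flight sketch — `check_daemon_scripts` is a hypothetical helper, and the paths are the ones the Metrics stop tasks invoke in the logs above:

```python
import os

def check_daemon_scripts(paths):
    """Return the subset of paths that are missing or not executable.

    The Ambari Metrics stop/restart tasks exec these scripts directly,
    so a missing script fails with exit code 127 ("No such file or directory").
    """
    return [p for p in paths if not os.access(p, os.X_OK)]

missing = check_daemon_scripts([
    "/usr/sbin/ambari-metrics-monitor",    # script missing in the reported failure
    "/usr/sbin/ambari-metrics-collector",  # invoked by the stop/start tasks above
])
for path in missing:
    print(f"missing or not executable: {path}")
```

Running this on each host before a restart surfaces the broken installation up front instead of mid-way through the stop task.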
Nazarii-Melnyk commented 2 months ago

Differences between the Ambari Metrics configurations of the BGTP-1.0 MPack and the ODP-1.2 MPack:

| Section | Property | BGTP | ODP |
| --- | --- | --- | --- |
| General | Metrics Service operation mode | distributed | embedded |
| Advanced ams-hbase-env | `hbase_classpath_additional` | `/usr/lib/ams-hbase/lib/*` | NONE |
| Advanced ams-hbase-env | HBase Master Maximum Memory | 640 | 768 |
| Advanced ams-hbase-env | HBase RegionServer Maximum Memory | 768 | 512 |
| Advanced ams-hbase-site | `dfs.client.read.shortcircuit` | false | true |
| Advanced ams-hbase-site | `hbase.zookeeper.leaderport` | 3888 | 61388 |
| Advanced ams-hbase-site | `hbase.zookeeper.peerport` | 2888 | 61288 |
| Advanced ams-site | `timeline.metrics.cache.commit.interval` | 3 | 10 |
| Advanced ams-site | `timeline.metrics.cache.size` | 150 | 200 |
| Advanced ams-site | `timeline.metrics.downsampler.event.metric.patterns` | NONE | `topology\.%` |
| Advanced ams-site | `timeline.metrics.downsampler.topn.metric.patterns` | NONE | `dfs.NNTopUserOpCounts.windowMs=60000.op=__%.user=%,dfs.NNTopUserOpCounts.windowMs=300000.op=__%.user=%,dfs.NNTopUserOpCounts.windowMs=1500000.op=__%.user=%` |
| Advanced ams-site | `timeline.metrics.service.webapp.address` | isv-s-hdp-edge-01.dmp-insight.com:6188 | 0.0.0.0:6188 |
| Advanced ams-site | `timeline.metrics.transient.metric.patterns` | `topology\.%,dfs.NNTopUserOpCounts.windowMs=60000.op=__%.user=%,dfs.NNTopUserOpCounts.windowMs=300000.op=__%.user=%,dfs.NNTopUserOpCounts.windowMs=1500000.op=__%.user=%` | `topology\.%` |
| Custom ams-site | `timeline.metrics.cluster.aggregate.splitpoints` | `kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile` | NO RECORD |
| Custom ams-site | `timeline.metrics.host.aggregate.splitpoints` | `kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile` | NO RECORD |
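The ams-site differences above can also be checked programmatically before and after swapping MPacks. A minimal sketch — `diff_configs` is a hypothetical helper, and only a representative subset of the values listed above is transcribed:

```python
# Subset of the ams-site values from the two MPacks (transcribed from this comment).
# NONE in the Ambari UI is represented here as Python None.
bgtp = {
    "timeline.metrics.cache.commit.interval": "3",
    "timeline.metrics.cache.size": "150",
    "timeline.metrics.downsampler.event.metric.patterns": None,
    "timeline.metrics.service.webapp.address": "isv-s-hdp-edge-01.dmp-insight.com:6188",
}
odp = {
    "timeline.metrics.cache.commit.interval": "10",
    "timeline.metrics.cache.size": "200",
    "timeline.metrics.downsampler.event.metric.patterns": r"topology\.%",
    "timeline.metrics.service.webapp.address": "0.0.0.0:6188",
}

def diff_configs(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}

for key, (old, new) in diff_configs(bgtp, odp).items():
    print(f"{key}: {old!r} -> {new!r}")
```

The same diff helper works on full config dictionaries fetched from the Ambari REST API, which makes it easy to confirm a cluster actually carries the ODP values.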
Nazarii-Melnyk commented 2 months ago

The AMS Collector startup was fixed by updating the BGTP Ambari MPack to support the ODP Ambari Metrics artifact.

Linked issue - https://github.com/rework-space-com/bigtop/issues/26
Linked PR - https://github.com/rework-space-com/bigtop/pull/27