hortonworks / streamline

StreamLine - Streaming Analytics
Apache License 2.0

Shows issue while refreshing service pool cluster #1126

Closed shahsank3t closed 6 years ago

shahsank3t commented 6 years ago

refreshing cluster fails

HeartSaVioR commented 6 years ago

Is it showing the same behavior as #1104? If so, let's merge https://github.com/hortonworks/streamline/pull/1120 to apply https://github.com/hortonworks/registry/pull/350, and then try again to see the actual error log.

If you can see the actual error in streamline log even now, please share.

shahsank3t commented 6 years ago

@HeartSaVioR I am not really aware of the behavior of #1104, so I can't comment on that.

Actual error from the streamline log file:

ERROR [11:20:30.616] [dw-3379 - POST /api/v1/catalog/cluster/import/ambari] c.h.s.s.s.GenericExceptionMapper - Got exception: [RuntimeException] / message [java.lang.RuntimeException: com.hortonworks.registries.storage.exception.StorageException: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction]
java.lang.RuntimeException: java.lang.RuntimeException: com.hortonworks.registries.storage.exception.StorageException: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:593)
    at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
    at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at com.hortonworks.streamline.streams.cluster.ClusterImporter.handleServices(ClusterImporter.java:74)
    at com.hortonworks.streamline.streams.cluster.ClusterImporter.lambda$importCluster$0(ClusterImporter.java:65)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
    at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: java.lang.RuntimeException: com.hortonworks.registries.storage.exception.StorageException: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
    at com.hortonworks.streamline.streams.cluster.ClusterImporter.lambda$handleConfigurations$3(ClusterImporter.java:124)
    at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1691)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
    at com.hortonworks.streamline.streams.cluster.ClusterImporter.handleConfigurations(ClusterImporter.java:110)
    at com.hortonworks.streamline.streams.cluster.ClusterImporter.lambda$handleServices$1(ClusterImporter.java:83)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
    at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
    ... 4 common frames omitted

HeartSaVioR commented 6 years ago

Sorry, I guess it is not related. By the way, I guess you are testing it over a VPN or against a separate cluster: importing an Ambari cluster should take only a few seconds, so it shouldn't hit the lock wait timeout.

HeartSaVioR commented 6 years ago

I'm able to reproduce the issue. It looks OK when importing the same cluster again, but it gets stuck when refreshing the cluster.

HeartSaVioR commented 6 years ago

Looks like an odd transaction issue. I'm not an expert on MySQL and I haven't encountered this before, so it may take some time to address.

---TRANSACTION 0, not started
MySQL thread id 60007, OS thread handle 0x7f485a464700, query id 1074574 localhost root
show engine innodb status
---TRANSACTION A899, ACTIVE 32 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 6 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 2
MySQL thread id 59971, OS thread handle 0x7f485a568700, query id 1074570 <host> <ip> streamline2 update
INSERT INTO service_configuration (`filename`, `configuration`, `name`, `description`, `id`, `serviceId`, `timestamp`) VALUES( null,'{"content":"\\n#!/bin/bash\\n\\n# Set Storm specific environment variables here.\\n\\n# The java implementation to use.\\nexport JAVA_HOME={{java64_home}}\\n\\nexport STORM_CONF_DIR={{conf_dir}}\\nexport STORM_HOME={{storm_component_home_dir}}\\nexport STORM_JAR_JVM_OPTS={{jar_jvm_opts}}\\n\\n#set storm-auto creds\\n# check if storm_jaas.conf in config , only enable storm_auto_creds in secure mode.\\nSTORM_JAAS_CONF=$STORM_HOME/conf/storm_jaas.conf\\nSTORM_AUTOCR
------- TRX HAS BEEN WAITING 32 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 958 n bits 96 index `fk_service_configuration_service` of table `streamline2`.`service_configuration` trx id A899 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

------------------
---TRANSACTION A898, ACTIVE 32 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 6 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 2
MySQL thread id 59970, OS thread handle 0x7f485a0d6700, query id 1074572 <host> <ip> streamline2 update
INSERT INTO service_configuration (`filename`, `configuration`, `name`, `description`, `id`, `serviceId`, `timestamp`) VALUES( null,'{"ambari_metrics_user":"ams","ams_classpath_additional":"","content":"\\n# Set environment variables here.\\n\\n# AMS instance name\\nexport AMS_INSTANCE_NAME={{hostname}}\\n\\n# The java implementation to use. Java 1.6 required.\\nexport JAVA_HOME={{java64_home}}\\n\\n# Collector Log directory for log4j\\nexport AMS_COLLECTOR_LOG_DIR={{ams_collector_log_dir}}\\n\\n# Monitor Log directory for outfile\\nexport AMS_MONITOR_LOG_DIR={{ams_monitor_log_dir}}\\n\\n# Col
------- TRX HAS BEEN WAITING 32 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 958 n bits 96 index `fk_service_configuration_service` of table `streamline2`.`service_configuration` trx id A898 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

------------------
---TRANSACTION A897, ACTIVE 32 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 6 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 2
MySQL thread id 59967, OS thread handle 0x7f4870153700, query id 1074569 <host> <ip> streamline2 update
INSERT INTO service_configuration (`filename`, `configuration`, `name`, `description`, `id`, `serviceId`, `timestamp`) VALUES( null,'{"auto.create.topics.enable":"true","auto.leader.rebalance.enable":"true","compression.type":"producer","controlled.shutdown.enable":"true","controlled.shutdown.max.retries":"3","controlled.shutdown.retry.backoff.ms":"5000","controller.message.queue.size":"10","controller.socket.timeout.ms":"30000","default.replication.factor":"1","delete.topic.enable":"false","external.kafka.metrics.exclude.prefix":"kafka.network.RequestMetrics,kafka.server.DelayedOperationPurga
------- TRX HAS BEEN WAITING 32 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 958 n bits 96 index `fk_service_configuration_service` of table `streamline2`.`service_configuration` trx id A897 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-insert-intention-locks
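
The "insert intention waiting" state above can be reproduced in isolation with two MySQL sessions. This is a simplified sketch, not the actual Streamline code path: it assumes a `service_configuration` table with an FK index on `serviceId` and InnoDB's default REPEATABLE READ isolation, under which a `DELETE` takes next-key (gap) locks on the index it scans, and a concurrent insert into the locked gap needs an insert intention lock that must wait:

```sql
-- Session 1: a cleanup transaction deletes child rows and stays open;
-- the DELETE takes next-key (gap) locks on the fk_service_configuration_service index.
START TRANSACTION;
DELETE FROM service_configuration WHERE serviceId = 1;
-- (transaction intentionally left open while the import proceeds)

-- Session 2: a concurrent insert into the same index range needs an
-- insert intention lock, which conflicts with Session 1's gap lock and
-- waits until innodb_lock_wait_timeout (default 50s) expires:
INSERT INTO service_configuration
    (`filename`, `configuration`, `name`, `description`, `id`, `serviceId`, `timestamp`)
VALUES (NULL, '{}', 'zoo.cfg', '', NULL, 1, 1515057252254);
-- ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
```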

HeartSaVioR commented 6 years ago
SELECT straight_join
 w.trx_mysql_thread_id waiting_thread,
     w.trx_id waiting_trx_id,
     w.trx_query waiting_query,
     b.trx_mysql_thread_id blocking_thread,
     b.trx_id blocking_trx_id,
     b.trx_query blocking_query,
     bl.lock_id blocking_lock_id,
     bl.lock_mode blocking_lock_mode,
     bl.lock_type blocking_lock_type,
     bl.lock_table blocking_lock_table,
     bl.lock_index blocking_lock_index,
     wl.lock_id waiting_lock_id,
     wl.lock_mode waiting_lock_mode,
     wl.lock_type waiting_lock_type,
     wl.lock_table waiting_lock_table,
     wl.lock_index waiting_lock_index
 FROM
     information_schema.INNODB_LOCK_WAITS ilw ,
     information_schema.INNODB_TRX b , 
     information_schema.INNODB_TRX w , 
     information_schema.INNODB_LOCKS bl , 
     information_schema.INNODB_LOCKS wl
 WHERE
 b.trx_id = ilw.blocking_trx_id
     AND w.trx_id = ilw.requesting_trx_id
     AND bl.lock_id = ilw.blocking_lock_id
     AND wl.lock_id = ilw.requested_lock_id;

It shows a result consistent with `show engine innodb status;`:

+----------------+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-----------------+----------------+------------------+--------------------+--------------------+---------------------------------------+------------------------------------+-----------------+-------------------+-------------------+---------------------------------------+------------------------------------+
| waiting_thread | waiting_trx_id | waiting_query                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | blocking_thread | blocking_trx_id | blocking_query | blocking_lock_id | blocking_lock_mode | blocking_lock_type | blocking_lock_table                   | blocking_lock_index                | waiting_lock_id | waiting_lock_mode | waiting_lock_type | waiting_lock_table                    | waiting_lock_index                 |
+----------------+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-----------------+----------------+------------------+--------------------+--------------------+---------------------------------------+------------------------------------+-----------------+-------------------+-------------------+---------------------------------------+------------------------------------+
|          60069 | A8A5           | INSERT INTO service_configuration (`filename`, `configuration`, `name`, `description`, `id`, `serviceId`, `timestamp`) VALUES( null,'{"ambari_metrics_user":"ams","ams_classpath_additional":"","content":"\\n# Set environment variables here.\\n\\n# AMS instance name\\nexport AMS_INSTANCE_NAME={{hostname}}\\n\\n# The java implementation to use. Java 1.6 required.\\nexport JAVA_HOME={{java64_home}}\\n\\n# Collector Log directory for log4j\\nexport AMS_COLLECTOR_LOG_DIR={{ams_collector_log_dir}}\\n\\n# Monitor Log directory for outfile\\nexport AMS_MONITOR_LOG_DIR={{ams_monitor_log_dir}}\\n\\n# Collector pid directory\\nexport AMS_COLLECTOR_PID_DIR={{ams_collector_pid_dir}}\\n\\n# Monitor pid directory\\nexport AMS_MONITOR_PID_DIR={{ams_monitor_pid_dir}}\\n\\n# AMS HBase pid directory\\nexport AMS_HBASE_PID_DIR={{hbase_pid_dir}}\\n\\n# AMS Collector heapsize\\nexport AMS_COLLECTOR_HEAPSIZE={{metrics_collector_heapsize}}\\n\\n# HBase Tables Initialization check enabled\\nexport AMS_HBASE_INIT_CHECK_ENABLED={{ams_hb |           60065 | A8A0            | NULL           | A8A0:0:958:1     | S                  | RECORD             | `streamline2`.`service_configuration` | `fk_service_configuration_service` | A8A5:0:958:1    | X                 | RECORD            | `streamline2`.`service_configuration` | `fk_service_configuration_service` |
|          60070 | A8A4           | INSERT INTO service_configuration (`filename`, `configuration`, `name`, `description`, `id`, `serviceId`, `timestamp`) VALUES( null,'{"auto.create.topics.enable":"true","auto.leader.rebalance.enable":"true","compression.type":"producer","controlled.shutdown.enable":"true","controlled.shutdown.max.retries":"3","controlled.shutdown.retry.backoff.ms":"5000","controller.message.queue.size":"10","controller.socket.timeout.ms":"30000","default.replication.factor":"1","delete.topic.enable":"false","external.kafka.metrics.exclude.prefix":"kafka.network.RequestMetrics,kafka.server.DelayedOperationPurgatory,kafka.server.BrokerTopicMetrics.BytesRejectedPerSec,kafka.server.KafkaServer.ClusterId","external.kafka.metrics.include.prefix":"kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetCommit.98percentile,kafka.network.RequestMetrics.ResponseQueueTimeMs.request.Offsets.95percentile,kafka.network.RequestMetrics.ResponseSendTimeMs.request.Fetch.95percentile,kafka.network.RequestMetrics.RequestsPerSec.request"," |           60065 | A8A0            | NULL           | A8A0:0:958:1     | S                  | RECORD             | `streamline2`.`service_configuration` | `fk_service_configuration_service` | A8A4:0:958:1    | X                 | RECORD            | `streamline2`.`service_configuration` | `fk_service_configuration_service` |
|          60066 | A8A3           | INSERT INTO service_configuration (`filename`, `configuration`, `name`, `description`, `id`, `serviceId`, `timestamp`) VALUES( null,'{"content":"\\n#!/bin/bash\\n\\n# Set Storm specific environment variables here.\\n\\n# The java implementation to use.\\nexport JAVA_HOME={{java64_home}}\\n\\nexport STORM_CONF_DIR={{conf_dir}}\\nexport STORM_HOME={{storm_component_home_dir}}\\nexport STORM_JAR_JVM_OPTS={{jar_jvm_opts}}\\n\\n#set storm-auto creds\\n# check if storm_jaas.conf in config , only enable storm_auto_creds in secure mode.\\nSTORM_JAAS_CONF=$STORM_HOME/conf/storm_jaas.conf\\nSTORM_AUTOCREDS_LIB_DIR=$STORM_HOME/external/storm-autocreds\\n\\nif [ -f $STORM_JAAS_CONF ] && [ -d $STORM_AUTOCREDS_LIB_DIR ]; then\\n    export STORM_EXT_CLASSPATH=$STORM_AUTOCREDS_LIB_DIR\\nfi","jmxremote_port":"56431","nimbus_seeds_supported":"true","storm.atlas.hook":"false","storm_log_dir":"/var/log/storm","storm_logs_supported":"true","storm_pid_dir":"/var/run/storm","storm_user":"storm","storm_user_nofile_limit":"128000","s |           60065 | A8A0            | NULL           | A8A0:0:958:1     | S                  | RECORD             | `streamline2`.`service_configuration` | `fk_service_configuration_service` | A8A3:0:958:1    | X                 | RECORD            | `streamline2`.`service_configuration` | `fk_service_configuration_service` |
|          60067 | A8A2           | INSERT INTO service_configuration (`filename`, `configuration`, `name`, `description`, `id`, `serviceId`, `timestamp`) VALUES( null,'{"content":"#!/bin/bash\\n# Licensed to the Apache Software Foundation (ASF) under one or more\\n# contributor license agreements. See the NOTICE file distributed with\\n# this work for additional information regarding copyright ownership.\\n# The ASF licenses this file to You under the Apache License, Version 2.0\\n# (the \\"License\\"); you may not use this file except in compliance with\\n# the License. You may obtain a copy of the License at\\n#\\n# http://www.apache.org/licenses/LICENSE-2.0\\n#\\n# Unless required by applicable law or agreed to in writing, software\\n# distributed under the License is distributed on an \\"AS IS\\" BASIS,\\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\\n# See the License for the specific language governing permissions and\\n# limitations under the License.\\n\\n# By default the script will use JAVA_HOME to deter |           60065 | A8A0            | NULL           | A8A0:0:958:1     | S                  | RECORD             | `streamline2`.`service_configuration` | `fk_service_configuration_service` | A8A2:0:958:1    | X                 | RECORD            | `streamline2`.`service_configuration` | `fk_service_configuration_service` |
|          60068 | A8A1           | INSERT INTO service_configuration (`filename`, `configuration`, `name`, `description`, `id`, `serviceId`, `timestamp`) VALUES( null,'{"autopurge.purgeInterval":"24","autopurge.snapRetainCount":"30","clientPort":"2181","dataDir":"/hadoop/zookeeper","initLimit":"10","syncLimit":"5","tickTime":"3000"}','zoo.cfg','',null,124,1515057252254)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |           60065 | A8A0            | NULL           | A8A0:0:958:1     | S                  | RECORD             | `streamline2`.`service_configuration` | `fk_service_configuration_service` | A8A1:0:958:1    | X                 | RECORD            | `streamline2`.`service_configuration` | `fk_service_configuration_service` |
+----------------+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-----------------+----------------+------------------+--------------------+--------------------+---------------------------------------+------------------------------------+-----------------+-------------------+-------------------+---------------------------------------+------------------------------------+
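
Side note: on MySQL 5.7 essentially the same join is available pre-built as a `sys` schema view, which is shorter to type (and on 8.0 the `INFORMATION_SCHEMA.INNODB_LOCKS`/`INNODB_LOCK_WAITS` tables are removed in favor of `performance_schema.data_locks`, so the view is the more portable option). A minimal equivalent of the query above would be:

```sql
-- MySQL 5.7+: pre-joined view over the same lock-wait information
SELECT waiting_pid, waiting_query, blocking_pid, blocking_query, wait_age
FROM sys.innodb_lock_waits;
```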
HeartSaVioR commented 6 years ago

I guess I found the reason. The transaction that runs the delete queries used for cleaning up before importing blocks the inserts. I don't know the details yet, though: why it blocks, and why the lock shows up on the FK index.

For a PoC, I manually deleted the rows in service/service_configuration/component/component_process for the cluster before refreshing, and then did the refresh: it works. (Though the UI then shows no services, which originates from the backend API result. I have encountered DAO cache inconsistency before when touching the DB manually, so I suspect that, but it may also be an intended limitation; after refreshing it shows up properly, so I don't mind for now.)

Maybe we should make the logic sequential (so that it can be done within a single transaction), or split the transaction scope into two parts: cleaning up and importing.
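
The first option can be sketched as follows. This is a hedged illustration, not the actual `ClusterImporter` code: `importService` is a hypothetical stand-in for the per-service DB work, and the point is that funneling all of it through a single-threaded executor makes the operations strictly sequential, so they can share one connection and one transaction instead of racing from a ForkJoinPool.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class SequentialImportSketch {

    // Hypothetical stand-in for the per-service DB work done during import.
    // The counter verifies that no two invocations overlap in time.
    static void importService(String service, AtomicInteger inFlight) {
        if (inFlight.incrementAndGet() > 1) {
            throw new IllegalStateException("DB operations overlapped");
        }
        // ... run the DELETE/INSERT statements on the shared connection here ...
        inFlight.decrementAndGet();
    }

    public static void main(String[] args) {
        List<String> services = List.of("STORM", "KAFKA", "ZOOKEEPER");
        AtomicInteger inFlight = new AtomicInteger();

        // A single-threaded executor guarantees the per-service operations
        // never run concurrently, so one transaction can span all of them.
        ExecutorService dbExecutor = Executors.newSingleThreadExecutor();
        CompletableFuture<?>[] futures = services.stream()
                .map(s -> CompletableFuture.runAsync(
                        () -> importService(s, inFlight), dbExecutor))
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(futures).join();
        dbExecutor.shutdown();

        System.out.println("imported " + services.size() + " services sequentially");
    }
}
```

The tasks still look asynchronous to the caller (the import endpoint stays non-blocking), but the executor serializes the actual DB access.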

HeartSaVioR commented 6 years ago

#1133 addresses this by grouping the DB operations for importing a cluster into a single thread (meaning a single transaction).