kgneng2 / blokg


CDH NameNode Shutdown #2

Open kgneng2 opened 4 years ago

kgneng2 commented 4 years ago

Symptom

2017-05-25 11:12:21,218 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNode: registered UNIX signal handlers for [TERM, HUP, INT]
2017-05-25 11:12:21,440 WARN org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-journalnode.properties,hadoop-metrics2.properties
2017-05-25 11:12:21,506 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-05-25 11:12:21,506 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: JournalNode metrics system started
2017-05-25 11:12:21,715 INFO org.apache.hadoop.hdfs.DFSUtil: Starting Web-server for journal at: http://xcv020.sa.nhnsystem.com:8480
2017-05-25 11:12:21,756 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2017-05-25 11:12:21,762 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2017-05-25 11:12:21,768 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.journal is not defined
2017-05-25 11:12:21,802 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2017-05-25 11:12:21,804 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context journal
2017-05-25 11:12:21,804 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2017-05-25 11:12:21,804 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2017-05-25 11:12:21,819 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 8480
2017-05-25 11:12:21,819 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
2017-05-25 11:12:22,014 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@xcv020.sa.nhnsystem.com:8480
2017-05-25 11:12:22,045 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2017-05-25 11:12:22,052 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8485
2017-05-25 11:12:22,066 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2017-05-25 11:12:22,066 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8485: starting

Cause

2017-05-29 22:56:53,275 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNode: Initializing journal in directory /data1/dfs/jn/NCC
2017-05-29 22:56:53,320 WARN org.apache.hadoop.hdfs.server.common.Storage: Storage directory /data1/dfs/jn/NCC does not exist
2017-05-29 22:56:53,349 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal Storage Directory /data1/dfs/jn/NCC not formatted
2017-05-29 22:56:53,349 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.getEditLogManifest from 10.96.16.85:58357 Call#0 Retry#0
org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal Storage Directory /data1/dfs/jn/NCC not formatted
        at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:472)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:655)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:186)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:236)
        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25431)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
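
The exception above is raised because /data1/dfs/jn/NCC does not exist and was therefore never formatted. As a rough pre-check before starting a JournalNode, the sketch below looks for a VERSION file under the journal directory. It assumes the usual Hadoop storage layout `<journal dir>/current/VERSION`. The function name `journal_looks_formatted` is made up for illustration, and this is not the check Hadoop itself performs in `Journal.checkFormatted`.

```python
from pathlib import Path


def journal_looks_formatted(journal_dir: str) -> bool:
    """Return True if <journal_dir>/current/VERSION exists as a regular file.

    Assumption: a formatted journal directory contains current/VERSION.
    This is only a heuristic, not Hadoop's own validation.
    """
    return (Path(journal_dir) / "current" / "VERSION").is_file()


if __name__ == "__main__":
    # Path taken from the log above.
    print(journal_looks_formatted("/data1/dfs/jn/NCC"))
```

A missing directory and an empty directory both return False here, which matches the "does not exist" and "not formatted" pair in the log.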

Resolution

What is a JournalNode

Why the role assignment failed

#Tue May 30 14:57:13 KST 2017 VERSION FILE
namespaceID=1190465890
clusterID=CID-b428f794-25ad-4e6b-bbf9-832cac66d61e
cTime=0
storageType=JOURNAL_NODE
layoutVersion=-60
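
The VERSION file shown above is a plain `key=value` properties file. When comparing the file on one JournalNode with the one on another, a small parser can help. The sketch below is only an illustration: `parse_version_file` and `same_cluster` are hypothetical helper names, and treating matching `namespaceID` and `clusterID` as the relevant check is an assumption, not something confirmed in this issue.

```python
def parse_version_file(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and '#' comment lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def same_cluster(a: dict, b: dict) -> bool:
    """Assumed check: both files carry the same namespaceID and clusterID."""
    return all(a.get(k) == b.get(k) for k in ("namespaceID", "clusterID"))


# Values copied from the VERSION file in this issue.
SAMPLE = """#Tue May 30 14:57:13 KST 2017
namespaceID=1190465890
clusterID=CID-b428f794-25ad-4e6b-bbf9-832cac66d61e
cTime=0
storageType=JOURNAL_NODE
layoutVersion=-60
"""

if __name__ == "__main__":
    print(parse_version_file(SAMPLE))
```

All values are kept as strings, so `layoutVersion` comes back as `"-60"` rather than an integer.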

Service impact of the NameNode failure and recovery procedure

Service impact

Recovery procedure

Guide to assigning the JournalNode role

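https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cm_mc_jn.html The link above covers moving the JournalNode edits directory within the same node. When moving to a different server, the existing edit logs are hard to carry over, so copying only the VERSION file and then proceeding should probably work. (To be tested in the test zone and written up again later.)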


Result from the test zone: after a JournalNode is moved, the NameNode has to be restarted before everything works normally again.
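
The VERSION-only move described above could be sketched roughly as follows, once the VERSION file has been copied over to the new server. This is an untested illustration, not a verified procedure: it assumes the journal directory layout `<journal dir>/current/VERSION`, the helper name `seed_journal_dir` is made up, and it does not handle file ownership (for example, for the `hdfs` user) or the NameNode restart noted above.

```python
import shutil
from pathlib import Path


def seed_journal_dir(src_version: str, dest_journal_dir: str) -> Path:
    """Copy a VERSION file into <dest_journal_dir>/current/VERSION.

    Assumption: placing VERSION under current/ is enough for the new
    JournalNode to treat the directory as formatted.
    """
    dest_current = Path(dest_journal_dir) / "current"
    dest_current.mkdir(parents=True, exist_ok=True)
    dest = dest_current / "VERSION"
    if dest.exists():
        # Refuse to clobber an existing journal's metadata.
        raise FileExistsError(f"{dest} already exists; refusing to overwrite")
    shutil.copy2(src_version, dest)
    return dest
```

Refusing to overwrite an existing VERSION file is a deliberate choice here, so that a directory that already holds a journal is never silently re-labelled.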