Check
sudo -u hbase hbase hbck -details 'table_name' | tee /tmp/kent.hbase.02192.log
grep -i error /tmp/kent.hbase.02192.log
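Counting the error lines in the saved log makes it easy to compare one hbck run with the next. A minimal sketch, assuming error lines start with `ERROR:` as in the messages quoted in these notes; the function name `count_hbck_errors` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the ERROR lines in a saved hbck log.
# Usage: count_hbck_errors /tmp/kent.hbase.02192.log
count_hbck_errors() {
  local log_file="$1"
  # grep -c prints the number of matching lines. It exits with status 1
  # when the count is 0, so "|| true" keeps the function safe under "set -e".
  grep -c '^ERROR:' "$log_file" || true
}
```

A count of 0 after a repair is a quick hint that the table is consistent again; the full hbck output remains the authoritative source.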
Repair
Note: after a repair, run the check again. If the table is still inconsistent, repair again.
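The check, repair, check-again cycle from the note can be sketched as a small loop. This is an illustration under assumptions, not a tested operational script: the function name `repair_until_consistent` and the round limit are hypothetical, and "inconsistent" is taken to mean that the check output still contains a line starting with `ERROR:`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check command; while its output still has
# ERROR lines, run a repair command and check again.
# Arguments: check command, repair command, max repair rounds (default 3).
repair_until_consistent() {
  local check_cmd="$1" repair_cmd="$2" max_rounds="${3:-3}"
  local round=0 out
  while true; do
    # Capture the check output; ignore the command's own exit status.
    out=$(eval "$check_cmd" 2>&1) || true
    if ! grep -q '^ERROR:' <<<"$out"; then
      echo "consistent after $round repair round(s)"
      return 0
    fi
    if [ "$round" -ge "$max_rounds" ]; then
      echo "still inconsistent after $round repair round(s)" >&2
      return 1
    fi
    round=$((round + 1))
    eval "$repair_cmd" >/dev/null 2>&1 || true
  done
}

# Example call, using the commands from these notes:
#   repair_until_consistent \
#     "sudo -u hbase hbase hbck -details 'table_name'" \
#     "sudo -u hbase hbase hbck -details 'table_name' -repair"
```

The round limit is there so that a problem the repair cannot fix does not loop forever; at that point the remaining errors need to be read by hand.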
# Error message
ERROR: Empty REGIONINFO_QUALIFIER found in hbase:meta
ERROR: Region { meta => null, hdfs => hdfs://nameservice1/hbase/data/default/table_name/e40656c062b18495b6086c5ca54baea7, deployed => e0ecmrhdp0,60020,1550775277680;table_name,3b0449eadf7378fb154e71ab084301c3fb603b451366069d00d963460b1982e7,1528423666709.e40656c062b18495b6086c5ca54baea7., replicaId => 0 } not in META, but deployed on e4ecmrhdp20.mercury.corp,60020,1550775277680
# Repair command
sudo -u hbase hbase hbck -details 'table_name' -repair
# After the repair, flush the table (in the hbase shell) to persist the fix
flush 'table_name'
# Error message
ERROR: Table table_name not found in hbase:meta. Orphaned table ZNode found.
# Repair command
sudo -u hbase hbase hbck -fixOrphanedTableZnodes
# Error message
ERROR: Region { meta => null, hdfs => hdfs://nameservice1/hbase/data/ecitem/table_name/35a4cb54b876ffd8743fd763a6b305d4, deployed => , replicaId => 0 } on HDFS, but not listed in hbase:meta or deployed on any region server
ERROR: There is a hole in the region chain between 417e38554344f3ae62e51cb64cc047e37794f1a8a040f5b8298b7b5d28cde1d2 and 46765edfa473f52525ab8f8cd9ff5a8baaaad0b55af6929cf6b852c1d5903b26. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
# Repair command
sudo -u hbase hbase hbck -fixEmptyMetaCells -details 'table_name'
One master and two regionservers are all in the stopped state.
GC pauses are too long; check the master log:
2019-07-31 18:26:32,042 WARN org.apache.hadoop.hbase.ipc.RpcServer: (responseTooSlow): {"processingtimems":3718,"call":"GetCompletedSnapshots(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$GetCompletedSnapshotsRequest)","client":"172.16.59.15:56694","starttimems":1564622788323,"queuetimems":0,"class":"HMaster","responsesize":7388,"method":"GetCompletedSnapshots"}
Cause: snapshots are maintained periodically. After some old snapshots were deleted, the problem went away.
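Deleting old snapshots can be done from the hbase shell with `list_snapshots` and `delete_snapshot`. The sketch below only generates the `delete_snapshot` commands so that they can be reviewed before anything is removed. The function name `snapshot_delete_commands` and the file name `old_snapshots.txt` are hypothetical, and piping commands into `hbase shell` on stdin is assumed to work on the installed version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read snapshot names (one per line) from stdin and
# print one hbase shell "delete_snapshot" command per name.
snapshot_delete_commands() {
  local name
  # "|| [ -n "$name" ]" also handles a last line without a trailing newline.
  while IFS= read -r name || [ -n "$name" ]; do
    # Skip blank lines.
    [ -n "$name" ] || continue
    printf "delete_snapshot '%s'\n" "$name"
  done
}

# Possible workflow (assumptions as stated above):
#   echo "list_snapshots" | sudo -u hbase hbase shell
#   # put the names to remove into old_snapshots.txt, then review:
#   snapshot_delete_commands < old_snapshots.txt
#   # and only then apply:
#   snapshot_delete_commands < old_snapshots.txt | sudo -u hbase hbase shell
```

Keeping the generation step separate from the apply step leaves a chance to catch a snapshot that is still needed.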
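Logic can be divided into four kinds:
Playbook: the methodology, paths, and experience we use to solve problems. For example, the 5W2H method: Why, Who, When, Where, What, How, and How much. Any problem can be examined from these seven angles.
The clearest and most practical structured presentation is "raise the problem, define the problem, analyze the problem, solve the problem, and finally look ahead".
Another useful thinking framework is "zoom in/zoom out".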
-XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:G1HeapWastePercent=20 -XX:InitiatingHeapOccupancyPercent=75 -XX:ConcGCThreads=16 -XX:ParallelGCThreads=23 -XX:-ResizePLAB -XX:+ParallelRefProcEnabled -XX:+PrintAdaptiveSizePolicy -XX:+PrintFlagsFinal -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=10021 -verbose:gc -verbose:sizes -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Xloggc:/var/log/hbase/hregion-gc.log
-XX:+UseG1GC -XX:MaxNewSize=256m -XX:NewSize=120m -XX:MaxGCPauseMillis=100 -XX:G1HeapWastePercent=20 -XX:InitiatingHeapOccupancyPercent=75 -XX:ConcGCThreads=16 -XX:ParallelGCThreads=23 -XX:-ResizePLAB -XX:+ParallelRefProcEnabled -XX:+PrintAdaptiveSizePolicy -XX:+PrintFlagsFinal -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=10021 -verbose:gc -verbose:sizes -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -Xloggc:/var/log/hbase/hmaster-gc.log
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Djava.net.preferIPv4Stack=true
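The option lists above do not say where they are applied. Judging only from the GC log file names (`hregion-gc.log` and `hmaster-gc.log`), the first G1 list appears to belong to the RegionServer and the second to the Master. A minimal sketch of how such options could be wired in, assuming they are set through `conf/hbase-env.sh` and its `HBASE_REGIONSERVER_OPTS` and `HBASE_MASTER_OPTS` variables; the variable `G1_COMMON` is hypothetical and carries only a few of the flags for brevity.

```shell
# Sketch for conf/hbase-env.sh (assumption: JVM options are set in this file).
# G1_COMMON is a hypothetical variable holding a subset of the shared G1 flags.
G1_COMMON="-XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=75 -XX:+ParallelRefProcEnabled"

# RegionServer JVM: GC log goes to hregion-gc.log, as in the first list.
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS:-} $G1_COMMON -Xloggc:/var/log/hbase/hregion-gc.log"

# Master JVM: GC log goes to hmaster-gc.log, as in the second list.
export HBASE_MASTER_OPTS="${HBASE_MASTER_OPTS:-} $G1_COMMON -Xloggc:/var/log/hbase/hmaster-gc.log"
```

Separate GC log files per role keep the Master and RegionServer pauses distinguishable when both run on the same host.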
Bye Newegg.