paypal / NNAnalytics

NameNodeAnalytics is a self-help utility for scouting and maintaining the namespace of an HDFS instance.
Apache License 2.0

Security is enabled but block access tokens (via dfs.block.access.token.enable) aren't enabled #299

Closed: silvermissile closed this issue 2 years ago

silvermissile commented 4 years ago

Hadoop version: Hadoop 3.0.0-cdh6.3.2, secured with Kerberos. NNA version: nn-analytics-1.6.6.3.0.0.

NameNodeLoader overrides some HDFS settings, including setting "dfs.block.access.token.enable to: false", and this leads to "Suggestion reload failed!":

[main] INFO org.apache.hadoop.hdfs.server.namenode.NameNodeLoader - Starting with QueryEngine implementation: org.apache.hadoop.hdfs.server.namenode.JavaStreamQueryEngine
[main] INFO org.apache.hadoop.hdfs.server.namenode.NameNodeLoader - Setting: dfs.block.access.token.enable to: false
[main] INFO org.apache.hadoop.hdfs.server.namenode.NameNodeLoader - Setting: dfs.ha.log-roll.period to: -1
[main] INFO org.apache.hadoop.hdfs.server.namenode.NameNodeLoader - Setting: dfs.ha.standby.checkpoints to: false
[main] INFO org.apache.hadoop.hdfs.server.namenode.NameNodeLoader - Setting: dfs.content-summary.limit to: 0
[main] INFO org.apache.hadoop.hdfs.server.namenode.NameNodeLoader - Setting: dfs.namenode.name.dir to: /usr/local/nn-analytics/dfs/name
[main] INFO org.apache.hadoop.hdfs.server.namenode.NameNodeLoader - Unsetting: dfs.namenode.inode.attributes.provider.class
[main] INFO org.apache.hadoop.security.UserGroupInformation - Login successful for user hdfs/cdhmanager202.dataplat.aku@DATAPLAT.AKU using keytab file /usr/local/nn-analytics/config/nn.service.keytab
[main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[main] INFO org.apache.hadoop.security.UserGroupInformation - Login successful for user hdfs/cdhmanager202.dataplat.aku@DATAPLAT.AKU using keytab file /usr/local/nn-analytics/config/nn.service.keytab
[main] INFO org.apache.hadoop.hdfs.server.namenode.NameNodeLoader - FSImage auto-fetched in: 1318 ms.
[main] INFO org.apache.hadoop.hdfs.server.namenode.NameNodeLoader - Loading with configuration: Configuration: core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml, hdfs-default.xml, hdfs-site.xml
[main] INFO org.apache.hadoop.hdfs.server.namenode.NameNodeLoader - FileSystem seen as: hdfs://xdatahdfs
[main] INFO org.apache.hadoop.hdfs.server.namenode.NameNodeLoader - Loading image from: /usr/local/nn-analytics/dfs/name
[main] INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager - dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
[main] INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager - The block deletion will start around 2020 Jul 21 08:16:27
[main] INFO org.apache.hadoop.util.GSet - Computing capacity for map BlocksMap
[main] INFO org.apache.hadoop.util.GSet - VM type       = 64-bit
[main] INFO org.apache.hadoop.util.GSet - 2.0% max memory 2.4 GB = 48.4 MB
[main] INFO org.apache.hadoop.util.GSet - capacity      = 2^23 = 8388608 entries
[main] INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager - dfs.block.access.token.enable = false
[main] ERROR org.apache.hadoop.hdfs.server.namenode.NameNodeLoader - Failed to load namesystem.
java.io.IOException: Security is enabled but block access tokens (via dfs.block.access.token.enable) aren't enabled. This may cause issues when clients attempt to connect to a DataNode. Aborting NameNode
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createBlockTokenSecretManager(BlockManager.java:590)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.<init>(BlockManager.java:468)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:781)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:696)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeLoader.load(NameNodeLoader.java:361)
    at org.apache.hadoop.hdfs.server.namenode.analytics.WebServerMain.init(WebServerMain.java:2003)
    at org.apache.hadoop.hdfs.server.namenode.analytics.WebServerMain.init(WebServerMain.java:190)
    at org.apache.hadoop.hdfs.server.namenode.analytics.WebServerMain.init(WebServerMain.java:183)
    at org.apache.hadoop.hdfs.server.namenode.analytics.WebServerMain.main(WebServerMain.java:169)
[pool-2-thread-1] INFO org.apache.hadoop.hdfs.server.namenode.cache.SuggestionsReloader - Suggestion reload failed!
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.AbstractQueryEngine.getINodeSet(AbstractQueryEngine.java:126)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeLoader.getINodeSet(NameNodeLoader.java:544)
    at org.apache.hadoop.hdfs.server.namenode.cache.SuggestionsEngine.reloadSuggestions(SuggestionsEngine.java:138)
    at org.apache.hadoop.hdfs.server.namenode.cache.SuggestionsReloader.run(SuggestionsReloader.java:63)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[pool-2-thread-1] INFO org.apache.hadoop.hdfs.server.namenode.cache.SuggestionsReloader - org.apache.hadoop.hdfs.server.namenode.AbstractQueryEngine.getINodeSet(AbstractQueryEngine.java:126)
[pool-2-thread-1] INFO org.apache.hadoop.hdfs.server.namenode.cache.SuggestionsReloader - org.apache.hadoop.hdfs.server.namenode.NameNodeLoader.getINodeSet(NameNodeLoader.java:544)
[pool-2-thread-1] INFO org.apache.hadoop.hdfs.server.namenode.cache.SuggestionsReloader - org.apache.hadoop.hdfs.server.namenode.cache.SuggestionsEngine.reloadSuggestions(SuggestionsEngine.java:138)
[pool-2-thread-1] INFO org.apache.hadoop.hdfs.server.namenode.cache.SuggestionsReloader - org.apache.hadoop.hdfs.server.namenode.cache.SuggestionsReloader.run(SuggestionsReloader.java:63)
[pool-2-thread-1] INFO org.apache.hadoop.hdfs.server.namenode.cache.SuggestionsReloader - java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[pool-2-thread-1] INFO org.apache.hadoop.hdfs.server.namenode.cache.SuggestionsReloader - java.util.concurrent.FutureTask.run(FutureTask.java:266)
[pool-2-thread-1] INFO org.apache.hadoop.hdfs.server.namenode.cache.SuggestionsReloader - java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[pool-2-thread-1] INFO org.apache.hadoop.hdfs.server.namenode.cache.SuggestionsReloader - java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[pool-2-thread-1] INFO org.apache.hadoop.hdfs.server.namenode.cache.SuggestionsReloader - java.lang.Thread.run(Thread.java:748)
silvermissile commented 4 years ago

dfs.block.access.token.enable should default to enabled when security is not simple.
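
For reference, this is the stock HDFS property being discussed; on a Kerberos-secured cluster it would normally be set as in the illustrative snippet below, while NNA's bootstrap overrides flip it to false only in NNA's own local copy of the configuration:

<!-- hdfs-site.xml on a secured cluster (illustrative snippet) -->
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>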

pjeli commented 3 years ago

NNA tries to disable the block tokens on itself (just itself...) in order to make sure your production cluster is untouched by NNA. Technically speaking, as long as NNA does not communicate with your DataNodes there is no issue, but this was always done as a preventive measure.

If you want to take NNA configuration into your own hands, that has always been possible. Look here:

https://nnanalytics.readthedocs.io/en/latest/Getting_Started/How_To_Configure/

nna.support.bootstrap.overrides=<true | false> - Default is true. True will override certain hdfs-site.xml configurations to prevent NNA from communicating with the active cluster. False means it will use configurations as-is. Recommended true in production.

The above setting is part of the application.properties file.
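
A minimal sketch of how that might look in application.properties, assuming you accept the risk of NNA using the cluster's hdfs-site.xml as-is (the property name is from the documentation linked above; the comments are illustrative):

# application.properties (illustrative)
# false = use hdfs-site.xml configurations as-is instead of overriding settings
# such as dfs.block.access.token.enable; true (the default) is recommended in production.
nna.support.bootstrap.overrides=false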

pjeli commented 2 years ago

I just realized - this only happens at the very start of NNA. The error isn't actually because of the block tokens - those are fine. The real issue is that the INode set is not initialized - it is null at the start. It will get initialized when the FsImage is loaded. So this issue should just be temporary.
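
In other words, the scheduled reload races the FsImage load at startup. A hypothetical Java sketch of the pattern described above (class and method names are illustrative, not NNA's actual API):

import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class ReloadSketch {
  // null until the (slow) initial image load finishes, mirroring the NPE window above
  private final AtomicReference<Set<String>> inodeSet = new AtomicReference<>();

  void reloadSuggestions() {
    Set<String> inodes = inodeSet.get();
    if (inodes == null) {
      // Image not loaded yet; the periodically scheduled reload simply tries again later.
      return;
    }
    // ... analyze inodes ...
  }

  void onImageLoaded(Set<String> loaded) {
    inodeSet.set(loaded); // subsequent reloads now succeed
  }
}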