Open · fengnanli opened this issue 5 years ago
Thanks for reporting this @fengnanli! I think I asked before, but I don't remember your answer: was this running within a secure environment using LinuxContainerExecutor / cgroups? I think that is what prevents such things from occurring in our environment.
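For reference, a minimal yarn-site.xml sketch of the LinuxContainerExecutor / cgroups setup referred to above; the group name and cgroup hierarchy values here are assumptions and will differ per cluster:

```xml
<!-- Sketch only: run containers through LinuxContainerExecutor with
     cgroup-based resource handling, so container processes stay tracked
     and can be reliably torn down when the application is killed. -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <!-- Assumed group; must match the group configured for container-executor.cfg -->
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <!-- Assumed cgroup hierarchy path -->
  <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
  <value>/hadoop-yarn</value>
</property>
```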
After running start-dynamometer-cluster.sh and replaying the prod audit log for some time, some simulated DataNodes (containers) lost their connection to the RM. When the YARN application is killed, these containers keep running and keep reporting their blocks to the NameNode. Since the simulated DataNodes' state has changed during the replay while the NameNode is restarted from a fresh fsimage, the errors below show up on the webhdfs page after the NameNode starts up.
Also, checking the Datanodes tab on the webhdfs page shows a list of a couple of datanodes.
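As a rough sketch of how the leftover containers might be found and cleaned up by hand after the YARN application has been killed (this assumes the simulated DataNodes run as ordinary DataNode JVMs inside the containers; class names and host access will vary per setup):

```sh
# Sketch only: manual cleanup of containers that outlive the killed YARN app.

# 1. Confirm the Dynamometer application is no longer running.
yarn application -list -appStates RUNNING

# 2. On each NodeManager host, look for DataNode JVMs that survived the kill.
jps -l | grep org.apache.hadoop.hdfs.server.datanode.DataNode

# 3. Kill any leftover processes so they stop reporting stale blocks
#    to the freshly started NameNode.
jps -l | grep org.apache.hadoop.hdfs.server.datanode.DataNode \
  | awk '{print $1}' | xargs -r kill
```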