linkedin / dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

some questions about the auditlog #93

Closed · hexiangheng closed this issue 5 years ago

hexiangheng commented 5 years ago

There is such a description in the README document: "The audit trace replay accepts one input file per mapper, and currently supports two input formats, configurable via the auditreplay.command-parser.class configuration." Where do we need to configure auditreplay.command-parser.class?

It also says "When using this format you must also specify auditreplay.log-start-time.ms." How should we specify auditreplay.log-start-time.ms?

xkrogen commented 5 years ago

When launching the replay job directly, you can configure these by passing them as Hadoop configurations on the command line:

./bin/start-workload.sh \
    -Dauditreplay.command-parser.class=... \
    -Dauditreplay.log-start-time.ms=...

To support this from the integrated workload replay (start-dynamometer-cluster.sh), you can use the -workload_config argument, like:

./bin/start-dynamometer-cluster.sh \
    -workload_config auditreplay.command-parser.class=... \
    -workload_config auditreplay.log-start-time.ms=...
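
For example, a hypothetical invocation (only the workload-related arguments are shown; the parser class is the one discussed later in this thread, and the timestamp value is just a placeholder for the start time of your audit trace in milliseconds since the Unix epoch):

# Sketch only: other required start-dynamometer-cluster.sh arguments are omitted,
# and 1554984334712 is a placeholder start time (ms since the Unix epoch).
./bin/start-dynamometer-cluster.sh \
    -workload_config auditreplay.command-parser.class=com.linkedin.dynamometer.workloadgenerator.audit.AuditLogDirectParser \
    -workload_config auditreplay.log-start-time.ms=1554984334712
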
hexiangheng commented 5 years ago

Thanks @xkrogen. I have a problem when I use the workload to replay the audit log. The replay command is as follows:

./bin/start-workload.sh \
    -Dauditreplay.command-parser.class=com.linkedin.dynamometer.workloadgenerator.audit.AuditLogDirectParser \
    -Dauditreplay.log-start-time.ms=36033 \
    -Dauditreplay.input-path=hdfs:///dyno/audit.log/ \
    -Dauditreplay.output-path=///dyno/out \
    -Dauditreplay.num-threads=3 \
    -nn_uri hdfs://EC132:9002/ \
    -start_time_offset 5m \
    -mapper_class_name AuditReplayMapper

The problem is that the MapReduce progress has remained at 0%. The details are as follows:

EC132:/home/hxh/hadoop/dynamometer-0.1.5 # ./bin/start-workload.sh -Dauditreplay.command-parser.class=com.linkedin.dynamometer.workloadgenerator.audit.AuditLogDirectParser -Dauditreplay.log-start-time.ms=36033 -Dauditreplay.input-path=hdfs:///dyno/audit.log/ -Dauditreplay.output-path=///dyno/out -Dauditreplay.num-threads=3 -nn_uri hdfs://EC132:9002/ -start_time_offset 5m -mapper_class_name AuditReplayMapper
2019-04-11 20:00:34,729 INFO workloadgenerator.WorkloadDriver: The workload will start at 1554984334712 ms (2019/04/11 20:05:34 CST)
2019-04-11 20:00:34,984 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-04-11 20:00:36,567 INFO input.FileInputFormat: Total input paths to process : 1
2019-04-11 20:00:36,756 INFO mapreduce.JobSubmitter: number of splits:1
2019-04-11 20:00:36,940 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1554983803645_0002
2019-04-11 20:00:37,340 INFO impl.YarnClientImpl: Submitted application application_1554983803645_0002
2019-04-11 20:00:37,379 INFO mapreduce.Job: The url to track the job: http://EC132:8088/proxy/application_1554983803645_0002/
2019-04-11 20:00:37,379 INFO mapreduce.Job: Running job: job_1554983803645_0002
2019-04-11 20:00:45,498 INFO mapreduce.Job: Job job_1554983803645_0002 running in uber mode : false
2019-04-11 20:00:45,500 INFO mapreduce.Job:  map 0% reduce 0%

The mapreduce.Job logs contain only a small amount of information.
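
Note that the first log line shows the workload was scheduled to start at 20:05:34, five minutes after submission (matching -start_time_offset 5m). To get more detail than the client-side mapreduce.Job output, the YARN container logs for the application can be pulled, for example (a sketch; depending on the Hadoop version and whether log aggregation is enabled, logs may only be available after the application finishes):

# Sketch: fetch the YARN container logs for the replay job
# (application ID taken from the job submission output above).
yarn logs -applicationId application_1554983803645_0002 | less
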

hexiangheng commented 5 years ago

The NodeManager log is:

2019-04-16 20:37:45,719 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 14134 for container-id container_e109_1555417499409_0001_01_000001: 204.5 MB of 2 GB physical memory used; 3.3 GB of 10 GB virtual memory used
2019-04-16 20:37:45,734 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 14424 for container-id container_e109_1555417499409_0002_01_000002: 276.5 MB of 1 GB physical memory used; 2.6 GB of 5 GB virtual memory used

The memory configuration does not seem to be abnormal.

hexiangheng commented 5 years ago

@xkrogen, @csgregorian, please take a look at this problem. When I delete some records from the audit log and keep only a few records, then execute the workload replay command, it fails. The errors are as follows:

2019-04-25 22:11:55,739 INFO workloadgenerator.WorkloadDriver: The workload will start at 1556201695729 ms (2019/04/25 22:14:55 CST)
2019-04-25 22:11:56,007 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-04-25 22:11:58,173 INFO input.FileInputFormat: Total input paths to process : 1
2019-04-25 22:11:58,593 INFO mapreduce.JobSubmitter: number of splits:1
2019-04-25 22:11:58,976 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1556196809972_0003
2019-04-25 22:11:59,409 INFO impl.YarnClientImpl: Submitted application application_1556196809972_0003
2019-04-25 22:11:59,466 INFO mapreduce.Job: The url to track the job: http://EC130:8088/proxy/application_1556196809972_0003/
2019-04-25 22:11:59,467 INFO mapreduce.Job: Running job: job_1556196809972_0003
2019-04-25 22:12:09,611 INFO mapreduce.Job: Job job_1556196809972_0003 running in uber mode : false
2019-04-25 22:12:09,612 INFO mapreduce.Job:  map 0% reduce 0%
2019-04-25 22:12:20,773 INFO mapreduce.Job:  map 22% reduce 0%
2019-04-25 22:14:56,093 INFO mapreduce.Job: Task Id : attempt_1556196809972_0003_m_000000_0, Status : FAILED
Error: java.lang.IllegalArgumentException: Chunk [allowed=true    ugi=root (auth:SIMPLE)  ip=/10.120.155.130      cmd=getfileinfo src=/tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1556164126996_0001_conf.xml_tmp dst=null        perm=null       proto=rpc] is not a valid entry
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
        at com.google.common.base.Splitter$MapSplitter.split(Splitter.java:456)
        at com.linkedin.dynamometer.workloadgenerator.audit.AuditLogDirectParser.parse(AuditLogDirectParser.java:65)
        at com.linkedin.dynamometer.workloadgenerator.audit.AuditReplayMapper.map(AuditReplayMapper.java:218)
        at com.linkedin.dynamometer.workloadgenerator.audit.AuditReplayMapper.map(AuditReplayMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2019-04-25 22:14:57,134 INFO mapreduce.Job:  map 0% reduce 0%
2019-04-25 22:15:01,173 INFO mapreduce.Job: Task Id : attempt_1556196809972_0003_m_000000_1, Status : FAILED
Error: java.lang.IllegalArgumentException: Chunk [allowed=true    ugi=root (auth:SIMPLE)  ip=/10.120.155.130      cmd=getfileinfo src=/tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1556164126996_0001_conf.xml_tmp dst=null        perm=null       proto=rpc] is not a valid entry
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
        at com.google.common.base.Splitter$MapSplitter.split(Splitter.java:456)
        at com.linkedin.dynamometer.workloadgenerator.audit.AuditLogDirectParser.parse(AuditLogDirectParser.java:65)
        at com.linkedin.dynamometer.workloadgenerator.audit.AuditReplayMapper.map(AuditReplayMapper.java:218)
        at com.linkedin.dynamometer.workloadgenerator.audit.AuditReplayMapper.map(AuditReplayMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2019-04-25 22:15:06,213 INFO mapreduce.Job: Task Id : attempt_1556196809972_0003_m_000000_2, Status : FAILED
Error: java.lang.IllegalArgumentException: Chunk [allowed=true    ugi=root (auth:SIMPLE)  ip=/10.120.155.130      cmd=getfileinfo src=/tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1556164126996_0001_conf.xml_tmp dst=null        perm=null       proto=rpc] is not a valid entry
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
        at com.google.common.base.Splitter$MapSplitter.split(Splitter.java:456)
        at com.linkedin.dynamometer.workloadgenerator.audit.AuditLogDirectParser.parse(AuditLogDirectParser.java:65)
        at com.linkedin.dynamometer.workloadgenerator.audit.AuditReplayMapper.map(AuditReplayMapper.java:218)
        at com.linkedin.dynamometer.workloadgenerator.audit.AuditReplayMapper.map(AuditReplayMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2019-04-25 22:15:13,260 INFO mapreduce.Job:  map 100% reduce 100%
2019-04-25 22:15:15,297 INFO mapreduce.Job: Job job_1556196809972_0003 failed with state FAILED due to: Task failed task_1556196809972_0003_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

2019-04-25 22:15:15,443 INFO mapreduce.Job: Counters: 13
        Job Counters
                Failed map tasks=4
                Killed reduce tasks=1
                Launched map tasks=4
                Other local map tasks=3
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=176267
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=176267
                Total time spent by all reduce tasks (ms)=0
                Total vcore-milliseconds taken by all map tasks=176267
                Total vcore-milliseconds taken by all reduce tasks=0
                Total megabyte-milliseconds taken by all map tasks=90248704
                Total megabyte-milliseconds taken by all reduce tasks=0
xkrogen commented 5 years ago

Hey @hexiangheng, sorry for the delay in my response. When you edited the file, did your editor replace tabs (\t) with spaces ( )? The parser expects tabs, which are the separator used in the original audit log file.

hexiangheng commented 5 years ago

Thanks @xkrogen. When I use the workload to replay the audit log file, I didn't change the audit log format; I just took part of the original audit log file. My audit log file is as follows:

2019-05-05 12:51:58,051 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    ugi=root (auth:SIMPLE)  ip=/10.120.155.130      cmd=getfileinfo src=/dyno/audit.log._COPYING_   dst=null        perm=null       proto=rpc
2019-05-05 12:51:58,209 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    ugi=root (auth:SIMPLE)  ip=/10.120.155.130      cmd=create      src=/dyno/audit.log._COPYING_   dst=null        perm=root:supergroup:rw-r--r--  proto=rpc
2019-05-05 12:51:58,244 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    ugi=root (auth:SIMPLE)  ip=/10.120.155.130      cmd=getfileinfo src=/dyno/audit.log._COPYING_   dst=null        perm=null       proto=rpc
2019-05-05 12:51:59,183 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    ugi=root (auth:SIMPLE)  ip=/10.120.155.130      cmd=rename      src=/dyno/audit.log._COPYING_   dst=/dyno/audit.log     perm=root:supergroup:rw-r--r--  proto=rpc
2019-05-05 12:52:32,177 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    ugi=root (auth:SIMPLE)  ip=/10.120.155.130      cmd=listStatus  src=/tmp/hadoop-yarn/staging/history/done_intermediate  dst=null        perm=null       proto=rpc
2019-05-05 12:52:36,449 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    ugi=root (auth:SIMPLE)  ip=/10.120.155.131      cmd=listStatus  src=/tmp/hadoop-yarn/staging/history/done_intermediate  dst=null        perm=null       proto=rpc
xkrogen commented 5 years ago

The text appearing in your comment makes use of spaces instead of tabs. I'm not sure if this conversion happened within your local file, or when you copied it onto GitHub. Can you confirm that the file contains tabs and not spaces? Maybe you can directly upload a copy of the file for inspection instead of copying the text?
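
One quick way to check, assuming the original file is still available locally before uploading it to HDFS (the path below is hypothetical):

# Sketch: tabs are rendered as "^I" by cat -A (GNU coreutils), and grep -P '\t'
# (GNU grep) counts lines containing a literal tab character.
cat -A /path/to/audit.log | head -n 3
grep -cP '\t' /path/to/audit.log
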

xkrogen commented 5 years ago

@hexiangheng is this still an issue for you?

hexiangheng commented 5 years ago

@xkrogen, thanks, and I'm sorry; this problem has been solved.

ArthurSXL8 commented 3 years ago

> @xkrogen, thanks, and I'm sorry; this problem has been solved.

How did you resolve it? I am encountering the same problem.