twitter / hraven

hRaven collects run time data and statistics from MapReduce jobs in an easily queryable format
https://twitter.com/twitterhadoop
Apache License 2.0

jobFileProcessor.sh complains about missing arguments. #87

Open hcoyote opened 10 years ago

hcoyote commented 10 years ago

Running from origin/master.

I have things patched up enough to get the jobFilePreprocessor.sh and jobFileLoader.sh connecting to our Hadoop environment. The last step in hraven-etl.sh invokes jobFileProcessor.sh, but this throws errors about missing arguments.

I poked around in the code and it's not really clear what these should be. machinetype looks like it should default to "default" when not explicitly set, but the argument parser marks it as required. Additionally, I can't find much discussion of what's supposed to go in the cost properties file.

ERROR: Missing required options: z, m

usage: JobFileProcessor  [-b <batch-size>] -c <cluster> [-d] -m
       <machinetype> [-p <processFileSubstring>] [-r] [-t <thread-count>]
       -z <costfile>
 -b,--batchSize <batch-size>                        The number of files to
                                                    process in one batch.
                                                    Default 100
 -c,--cluster <cluster>                             cluster for which jobs
                                                    are processed
 -d,--debug                                         switch on DEBUG log
                                                    level
 -m,--machineType <machinetype>                     The type of machine
                                                    this job ran on
 -p,--processFileSubstring <processFileSubstring>   use only those process
                                                    records where the
                                                    process file path
                                                    contains the provided
                                                    string. Useful when
                                                    processing production
                                                    jobs in parallel to
                                                    historic loads.
 -r,--reprocess                                     Reprocess only those
                                                    records that have been
                                                    marked to be
                                                    reprocessed. Otherwise
                                                    process all rows
                                                    indicated in the
                                                    processing records,
                                                    but successfully
                                                    processed job files
                                                    are skipped.
 -t,--threads <thread-count>                        Number of parallel
                                                    threads to use to run
                                                    Hadoop jobs
                                                    simultaneously.
                                                    Default = 1
 -z,--costFile <costfile>                           The cost properties
                                                    file on local disk
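For reference, a working invocation needs at least `-c`, `-m`, and `-z` per the usage above. A minimal sketch follows; the script path, cluster name, and machine type value here are placeholders for illustration, not values confirmed in this thread:

```shell
# Create a cost properties file; an empty file is accepted
# (cost simply won't be calculated in that case).
touch /tmp/hraven-cost.properties

# Hypothetical invocation -- adjust the script path, cluster name,
# and machine type to match your deployment.
./bin/jobFileProcessor.sh \
  -c mycluster@dc1 \
  -m default \
  -z /tmp/hraven-cost.properties \
  -b 100 \
  -t 4
```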
vrushalic commented 10 years ago

Hi Travis,

Yes, I can see that the jobFilePreprocessor.sh was not updated. Give me a few mins to update it now. I will add some more documentation to a sample cost file in the conf dir. The job cost will be stored as a column in hbase.

The cost properties file can even be an empty file; in that case the cost simply won't be calculated.
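To illustrate, the cost file is a standard Java `.properties` file on local disk. The key name below is a placeholder for illustration only, not hRaven's actual property name — check the sample file added under the conf dir for the real keys:

```properties
# Illustrative only -- this key name is hypothetical; see the
# sample cost file in the conf dir for the actual property names.
# Per-machine-type rate used when computing job cost.
default.cost=0.35

# An empty file is also valid: job cost is then simply not calculated.
```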

vrushalic commented 10 years ago

Updated the script and added a sample file. Please give this a try and let me know.

hcoyote commented 10 years ago

Thanks, I'll see if I can get it working tomorrow.

vrushalic commented 10 years ago

Hi, Did this work for you?

thanks Vrushali