WeatherPipe in the latest aws #5

Open aloukianov opened 6 years ago

aloukianov commented 6 years ago

Hi, could you tell me whether you have had a chance to support the project yet? Kind regards, Andrei

PS. I'm trying to write my own analysis in Eclipse Oxygen and have an issue with "hadoop home". Please find the error message:

1 [main] DEBUG org.apache.hadoop.util.Shell - Failed to detect a valid hadoop home directory
HADOOP_HOME or hadoop.home.dir are not set.
    at org.apache.hadoop.util.Shell.checkHadoopHome(
    at org.apache.hadoop.util.Shell.(
    at org.apache.hadoop.util.StringUtils.(
    at org.apache.hadoop.conf.Configuration.setStrings(
    at edu.purdue.eaps.weatherpipe.weatherpipemapreduce.WeatherPipeMapReduce.main(
11 [main] ERROR org.apache.hadoop.util.Shell - Failed to locate the winutils binary in the hadoop binary path
Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(
    at org.apache.hadoop.util.Shell.getWinUtilsPath(
    at org.apache.hadoop.util.Shell.(
    at org.apache.hadoop.util.StringUtils.(
    at org.apache.hadoop.conf.Configuration.setStrings(
    at edu.purdue.eaps.weatherpipe.weatherpipemapreduce.WeatherPipeMapReduce.main(
673 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, always=false, sampleName=Ops, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)], valueName=Time)
686 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, always=false, sampleName=Ops, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)], valueName=Time)
686 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, always=false, sampleName=Ops, type=DEFAULT, value=[GetGroups], valueName=Time)
688 [main] DEBUG org.apache.hadoop.metrics2.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics
786 [main] DEBUG - Kerberos krb5 configuration not found, setting default realm to empty
790 [main] DEBUG - Creating new Groups object
794 [main] DEBUG org.apache.hadoop.util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
797 [main] DEBUG org.apache.hadoop.util.NativeCodeLoader - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
797 [main] DEBUG org.apache.hadoop.util.NativeCodeLoader - java.library.path=C:\Program Files\Java\jre1.8.0_161\bin;C:\WINDOWS\Sun\Java\bin;C:\WINDOWS\system32;C:\WINDOWS;C:/Program Files/Java/jre1.8.0_161/bin/server;C:/Program Files/Java/jre1.8.0_161/bin;C:/Program Files/Java/jre1.8.0_161/lib/amd64;C:\Program Files\Microsoft MPI\Bin\;C:\ProgramData\Oracle\Java\javapath;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Program Files\Common Files\Autodesk Shared\;C:\Python35\Scripts;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files\dotnet\;C:\Program Files\Anaconda3;C:\Program Files\Anaconda3\Scripts;C:\Program Files\Anaconda3\Library\bin;C:\Program Files\Java\jdk1.8.0_121\bin;C:\tools\apache-maven-3.5.2\bin;C:\WINDOWS\system32\config\systemprofile\MawsonKeyStorage;C:\tools\apache-maven-3.5.2\bin;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\;C:\Program Files\Microsoft SQL Server\140\Tools\Binn\;C:\Program Files\Microsoft SQL Server\140\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio\;C:\Program Files\Amazon\AWSCLI\;C:\Gradle\gradle-4.6\bin;C:\Users\papa\AppData\Local\Programs\Python\Python36\Scripts\;C:\Users\papa\AppData\Local\Programs\Python\Python36\;C:\Users\papa\AppData\Local\Microsoft\WindowsApps;;C:\Users\papa\Downloads\eclipse-jee-oxygen-3-win32-x86_64\eclipse;;.
797 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
797 [main] DEBUG - Falling back to shell based
797 [main] DEBUG - Group mapping
798 [main] DEBUG - Group mapping; cacheTimeout=300000; warningDeltaMs=5000
805 [main] DEBUG - hadoop login
805 [main] DEBUG - hadoop login commit
811 [main] DEBUG - using local user:NTUserPrincipal: papa
812 [main] DEBUG - UGI loginUser:papa (auth:SIMPLE)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at edu.purdue.eaps.weatherpipe.weatherpipemapreduce.WeatherPipeMapReduce.main(

stephenlienharrell commented 6 years ago

Hello Andrei, I have never tried to run this on Windows, but it looks like you have not set your Hadoop home. This page has an example of how to set it:

System.setProperty("hadoop.home.dir", "C:\\winutil\\");
reference : – Himanshu Bhandari Jan 6 '16 at 7:01

Hope this helps. That page probably has a lot of good information about running Hadoop on Windows.
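A minimal sketch of how that property could be set and checked before any Hadoop class is used. The class name `HadoopHomeSetup`, the method `configureHadoopHome`, and the default directory are hypothetical; the only things taken from the log are the property name `hadoop.home.dir` and the fact that Hadoop looks for `bin\winutils.exe` under it.

```java
import java.io.File;

public class HadoopHomeSetup {

    /**
     * Sets hadoop.home.dir and reports whether bin/winutils.exe exists under it.
     * Call this before touching any Hadoop class: the stack trace in the log shows
     * the lookup happening while org.apache.hadoop.util.Shell is being initialized.
     */
    public static boolean configureHadoopHome(String hadoopHome) {
        System.setProperty("hadoop.home.dir", hadoopHome);
        File winutils = new File(new File(hadoopHome, "bin"), "winutils.exe");
        return winutils.isFile();
    }

    public static void main(String[] args) {
        // Hypothetical default location; backslashes must be doubled in a Java string literal.
        String home = args.length > 0 ? args[0] : "C:\\winutil\\";
        boolean found = configureHadoopHome(home);
        System.out.println("hadoop.home.dir=" + System.getProperty("hadoop.home.dir"));
        System.out.println("winutils.exe found: " + found);
    }
}
```

If the check reports that winutils.exe is missing, the "Could not locate executable null\bin\winutils.exe" error is likely to persist even with the property set, so the binary would also need to be placed in the `bin` subdirectory.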

Good luck! -stephen

aloukianov commented 6 years ago

Hi Stephen,

Thank you for the reference (not resolved yet). I have run into all sorts of things with Win 10, Java 6 Hadoop, Gradle 5, Maven repo, Java 9, Eclipse Oxygen, Java 8, etc. There is a log4j error which I can't figure out. It seems to be related to a previous version of log4j.

Could you find a minute to advise?

Kind regards,
Andrei

log4j:ERROR Could not read configuration file [/C:/WeatherEclipse/WeatherPipe/WeatherPipeMapReduce/bin/main/].
C:\WeatherEclipse\WeatherPipe\WeatherPipeMapReduce\bin\main\ (The system cannot find the file specified)
    at Method)
    at Source)
    at Source)
    at Source)
    at org.apache.log4j.PropertyConfigurator.doConfigure(
    at org.apache.log4j.PropertyConfigurator.configure(
    at edu.purdue.eaps.weatherpipe.AWSAnonInterface.(
    at edu.purdue.eaps.weatherpipe.WeatherPipe.(
log4j:ERROR Ignoring configuration file [/C:/WeatherEclipse/WeatherPipe/WeatherPipeMapReduce/bin/main/].
log4j:WARN No appenders could be found for logger (com.amazonaws.AmazonWebServiceClient).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See for more info.
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
Missing Option Error: org.apache.commons.cli.MissingOptionException: start_time is a required flag or setting in the config file
usage: WeatherPipe [-b ] [-c ] [-e ] [-h] [-i ] [-id ] [-s ] [-st ] [-t ]
-b,--bucket_name Bucket name in S3 to place input and output data. Will be auto-generated if not given
-c,--config_file Location of config file
-e,--end_time End search boundary for NEXRAD data search. Date Format is dd/MM/yyyy HH:mm:ss
-h,--help Print this help message
-i,--instance_count The amount of instances to run the analysis on. Default is 1.
-id,--job_id Name of this particular job, a random one will be generated if not given. This must be unique in reference to other jobs.
-s,--start_time Start search boundary for NEXRAD data search. Date Format is dd/MM/yyyy HH:mm:ss
-st,--station Radar station abbreviation ex. "KIND"
-t,--instance_type Instance type for EMR job. Default is c3.xlarge. See options here: ng/
Please report issues at
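
For reference, the usage text above gives the pattern dd/MM/yyyy HH:mm:ss for the required start_time and end_time values. A minimal sketch of checking a value against that pattern before passing it to WeatherPipe, assuming Java 8 or later for java.time. The class `NexradTimeFormat` and its methods are hypothetical and are not part of WeatherPipe, which may parse dates differently internally.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class NexradTimeFormat {

    // Pattern copied from the usage text: dd/MM/yyyy HH:mm:ss
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss");

    /** Parses a value such as "01/03/2018 12:00:00"; throws if the shape does not match. */
    public static LocalDateTime parse(String value) {
        return LocalDateTime.parse(value, FORMAT);
    }

    /** Returns true when the value matches the day/month/year layout of the pattern. */
    public static boolean isValid(String value) {
        try {
            parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("01/03/2018 12:00:00")); // day first, then month
        System.out.println(isValid("2018-03-01 12:00:00")); // ISO-style layout, not the expected one
    }
}
```

Note that the day comes before the month in this pattern, so "01/03/2018" means 1 March 2018.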