pranab / chombo

Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm
http://pkghosh.wordpress.com/

Added loading configuration from S3 #1

Open ayee opened 11 years ago

ayee commented 11 years ago

Hi Pranab, this is Anthony from Algorithms IO. For the job to run on Amazon Elastic MapReduce, we had to allow the configuration to be loaded from S3. I added the code and thought it would be a good thing to contribute.

pranab commented 11 years ago

Anthony

I saw the change. My suggestion is to add the S3-related code to the existing setConfiguration() method, as below, instead of a separate method. That way the job driver needs to call just one setConfiguration() method.

This is how I would like to integrate the change. Let me know if there is any issue. You could send another pull request, or I could change the code directly.

public static void setConfiguration(Configuration conf, String project) throws Exception {
    boolean found = false;
    String confFilePath = conf.get(CONF_FILE_PROP_NAME);

    //user provided config file path
    if (null != confFilePath) {
        if (confFilePath.startsWith(S3_PREFIX)) {
            //s3pattern and s3 are assumed to be fields defined elsewhere in this class
            //split the S3 path into bucket and key; fail early on a malformed path
            Matcher matcher = s3pattern.matcher(confFilePath);
            if (!matcher.matches()) {
                throw new IllegalArgumentException("invalid S3 config file path: " + confFilePath);
            }
            String bucket = matcher.group(1);
            String key = matcher.group(2);
            S3Object object = s3.getObject(new GetObjectRequest(bucket, key));
            Properties configProps = new Properties();
            InputStream is = object.getObjectContent();
            try {
                configProps.load(is);
            } finally {
                is.close();
            }

            //copy each property into the job configuration
            for (Object propKey : configProps.keySet()) {
                String keySt = propKey.toString();
                conf.set(keySt, configProps.getProperty(keySt));
            }
            System.out.println("config found in user specified S3 file");
        } else if (confFilePath.startsWith(HDFS_PREFIX)) {
            loadConfigHdfs(conf, confFilePath.substring(HDFS_PREFIX_LEN));
            System.out.println("config found in user specified HDFS file");
        } else {
            loadConfig(conf, confFilePath, false);
            System.out.println("config found in user specified FS file");
        }
    } else {
        //default file system path
        confFilePath = FS_DEF_CONFIG_DIR + project + PROP_FILE_EXT;
        found = loadConfig(conf, confFilePath, true);

        //default HDFS path
        if (!found) {
            confFilePath = HDFS_DEF_CONFIG_DIR + project + PROP_FILE_EXT;
            loadConfigHdfs(conf, confFilePath);
            System.out.println("config found in default HDFS location");
        } else {
            System.out.println("config found in default FS location");
        }
    }
}
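
The S3 branch above depends on two steps that can be tried without the AWS SDK: splitting the path into bucket and key, and copying loaded properties into the configuration. Below is a minimal sketch of both. The class name S3ConfigPath, the regular expression (the thread does not show how s3pattern is defined), the sample path, and the property names are all assumptions for illustration. A plain Map stands in for Hadoop's Configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of chombo.
class S3ConfigPath {
    // Assumed path shape: s3://bucket/key or s3n://bucket/key
    private static final Pattern S3_PATTERN = Pattern.compile("s3n?://([^/]+)/(.+)");

    /** Returns {bucket, key}; throws if the path does not have the assumed shape. */
    static String[] parse(String path) {
        Matcher matcher = S3_PATTERN.matcher(path);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("not a valid S3 path: " + path);
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    /** Copies every property into the target map, standing in for conf.set(name, value). */
    static void copyInto(Properties props, Map<String, String> target) {
        for (String name : props.stringPropertyNames()) {
            target.put(name, props.getProperty(name));
        }
    }

    public static void main(String[] args) throws IOException {
        String[] parts = parse("s3://my-bucket/conf/reco.properties");
        System.out.println("bucket=" + parts[0] + " key=" + parts[1]);

        // Made-up property names, loaded from a string instead of an S3 object stream
        Properties props = new Properties();
        props.load(new StringReader("field.delim=,\nnum.reducer=2\n"));
        Map<String, String> conf = new HashMap<String, String>();
        copyInto(props, conf);
        System.out.println("num.reducer=" + conf.get("num.reducer"));
    }
}
```

Checking the result of matches() before calling group() matters here: group() throws IllegalStateException if the match did not succeed, which would hide the real cause (a malformed path) from whoever runs the job.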
ayee commented 11 years ago

Pranab, I didn't add the setConfiguration(Configuration) method. It was the one our job used, and it ran into an exception because reco.properties is in S3. I noticed setConfiguration(Configuration, String) was similar, but since we didn't see errors there I didn't change it. I can still make similar modifications to that method if that's OK with you.

On Nov 2, 2012, at 11:05 PM, Pranab Ghosh notifications@github.com wrote:

[Quoted copy of the previous comment, including the setConfiguration() listing, omitted.]

pranab commented 11 years ago

Anthony

The method setConfiguration(Configuration) is deprecated. I will mark it so. Please add your code to the other setConfiguration() method, as I suggested.
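
One common way to mark an overload as deprecated in Java is the @Deprecated annotation together with a Javadoc @deprecated tag that points callers to the replacement. The sketch below only illustrates that pattern and is not the actual chombo code: the class name ConfigLoader, the DEFAULT_PROJECT constant, the "loaded.project" key, and the delegation itself are assumptions. A plain Map stands in for Hadoop's Configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the utility class discussed in this thread.
class ConfigLoader {
    // Assumed default project name, for illustration only
    static final String DEFAULT_PROJECT = "default";

    /**
     * @deprecated use {@link #setConfiguration(Map, String)} instead.
     */
    @Deprecated
    static void setConfiguration(Map<String, String> conf) {
        // Delegate so that all loading logic lives in one method
        setConfiguration(conf, DEFAULT_PROJECT);
    }

    static void setConfiguration(Map<String, String> conf, String project) {
        // Placeholder for the real loading logic (S3, HDFS, local file system)
        conf.put("loaded.project", project);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<String, String>();
        setConfiguration(conf, "reco");
        System.out.println(conf.get("loaded.project"));
    }
}
```

With the annotation in place, the compiler warns at every call site of the old overload, so job drivers still using it are easy to find.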

pranab commented 11 years ago

I have made the changes based on the pull request and my suggestion, and committed them.