Open ayee opened 11 years ago
Anthony
I saw the change. My suggestion will be to add S3 related code to the existing setConfiguration() method as below instead of a separate method. That way the job driver could call just one setConfiguration() method
This how I would like to integrate the change. Let me know if there is any issue. You could send another pull request. Or I could directly change the code
public static void setConfiguration(Configuration conf, String project) throws Exception{
boolean found = false;
String confFilePath = conf.get(CONF_FILE_PROP_NAME);
//user provided config file path
if (null != confFilePath){
if (confFilePath.startsWith(S3_PREFIX)) {
Matcher matcher = s3pattern.matcher(confFilePath);
matcher.matches();
String bucket = matcher.group(1);
String key = matcher.group(2);
S3Object object = s3.getObject(new GetObjectRequest(bucket, key));
is = object.getObjectContent();
Properties configProps = new Properties();
configProps.load(is);
for (Object key : configProps.keySet()){
String keySt = key.toString();
conf.set(keySt, configProps.getProperty(keySt));
}
} else if (confFilePath.startsWith(HDFS_PREFIX)) {
loadConfigHdfs( conf, confFilePath.substring(HDFS_PREFIX_LEN));
System.out.println("config found in user specified HDFS file");
} else {
loadConfig( conf, confFilePath, false);
System.out.println("config found in user specified FS file");
}
} else {
//default file system path
confFilePath = FS_DEF_CONFIG_DIR + project + PROP_FILE_EXT;
found = loadConfig( conf, confFilePath, true);
//default HDFS path
if (!found) {
confFilePath = HDFS_DEF_CONFIG_DIR + project + PROP_FILE_EXT;
loadConfigHdfs( conf, confFilePath);
System.out.println("config found in default HDFS location");
} else {
System.out.println("config found in default FS location");
}
}
}
Pranab, I didn't add the setConfiguration(Configuration) method. That was the one used and ran into an exception because of reco.properties in S3. I noticed setConfiguration(Configuration, String) was similar but since we didn't see errors I didn't change that. I can still go and make the similar modifications to this method if that's ok by you.
On Nov 2, 2012, at 11:05 PM, Pranab Ghosh notifications@github.com wrote:
Anthony
I saw the change. My suggestion will be to add S3 related code to the existing setConfiguration() method as below instead of a separate method. That way the job driver could call just one setConfiguration() method
This how I would like to integrate the change. Let me know if there is any issue. You could send another pull request. Or I could directly change the code
public static void setConfiguration(Configuration conf, String project) throws Exception{ boolean found = false; String confFilePath = conf.get(CONF_FILE_PROP_NAME);
//user provided config file path if (null != confFilePath){ if (confFilePath.startsWith(S3_PREFIX)) { Matcher matcher = s3pattern.matcher(confFilePath); matcher.matches(); String bucket = matcher.group(1); String key = matcher.group(2); S3Object object = s3.getObject(new GetObjectRequest(bucket, key)); is = object.getObjectContent(); Properties configProps = new Properties(); configProps.load(is); for (Object key : configProps.keySet()){ String keySt = key.toString(); conf.set(keySt, configProps.getProperty(keySt)); } } else if (confFilePath.startsWith(HDFS_PREFIX)) { loadConfigHdfs( conf, confFilePath.substring(HDFS_PREFIX_LEN)); System.out.println("config found in user specified HDFS file"); } else { loadConfig( conf, confFilePath, false); System.out.println("config found in user specified FS file"); } } else { //default file system path confFilePath = FS_DEF_CONFIG_DIR + project + PROP_FILE_EXT; found = loadConfig( conf, confFilePath, true); //default HDFS path if (!found) { confFilePath = HDFS_DEF_CONFIG_DIR + project + PROP_FILE_EXT; loadConfigHdfs( conf, confFilePath); System.out.println("config found in default HDFS location"); } else { System.out.println("config found in default FS location"); } }
} — Reply to this email directly or view it on GitHub.
Anthony
The method setConfiguration(Configuration) is deprecated. I will mark it so. Please add your code to the other setConfiguration() method as I suggested
I have made the changes and based on pull request and my suggestion and committed the changes
Hi Pranab, this is Anthony from Algorithms IO. In order for the job to run in the Amazon MapReduce, we had to allow the configuration to be loaded from S3. Added the code and thought it would be a good thing to contribute.