xstevens closed this issue 11 years ago.
Help, I have this too.
This seems to be a problem with using the Hadoop distributed cache to ship the configuration file.
This is still an issue in trunk, and across Pig versions (I've tried Pig 0.8, 0.9, 0.10, and trunk). The issue seems to be that the Configuration object passed to ElasticSearchStorage.setStoreLocation (in elasticSearchSetup) is a copy of the one in JobControlCompiler.getJob. As a result, nothing added to the Configuration object in setStoreLocation, including mapred.cache.files, makes it into the Job that Pig submits. So elasticsearch.yml is never localized, and the job fails with:
ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backed error: org.elasticsearch.env.FailedToResolveConfigException: Failed to resolve config path [/home/evert/Downloads/elasticsearch/config/elasticsearch.yml], tried file path [/home/evert/Downloads/elasticsearch/config/elasticsearch.yml], path file [/disk1/mapred.local.dir/taskTracker/evert/jobcache/job_201208092125_0091/attempt_201208092125_0091_m_000000_3/work/config/home/evert/Downloads/elasticsearch/config/elasticsearch.yml], and classpath
at org.elasticsearch.env.Environment.resolveConfig(Environment.java:205)
at org.elasticsearch.node.internal.InternalSettingsPerparer.prepareSettings(InternalSettingsPerparer.java:62)
at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:112)
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
at org.elasticsearch.node.NodeBuilder.node(NodeBuilder.java:166)
at com.infochimps.elasticsearch.ElasticSearchOutputFormat$ElasticSearchRecordWriter.start_embedded_client(ElasticSearchOutputFormat.java:258)
at com.infochimps.elasticsearch.ElasticSearchO
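Reading the error message, the config lookup appears to try three places in order: the path as written, that same path appended to the task's working config directory, and the classpath. The Java sketch below models that order under that assumption; the class and method names are hypothetical and this is not Elasticsearch's actual Environment.resolveConfig. It shows why a file that the distributed cache never localized fails all three lookups on a task node.

```java
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the three lookups named in the error message.
// NOT Elasticsearch's Environment.resolveConfig; names are illustrative only.
class ConfigResolutionSketch {

    // Candidate locations, in the order the error message lists them.
    static List<String> candidates(String configPath, Path configDir) {
        List<String> out = new ArrayList<>();
        out.add(configPath);                                    // 1. "file path": the path as given
        out.add(configDir.toString() + configPath);             // 2. "path file": appended to the config dir
        out.add("classpath:" + stripLeadingSlash(configPath));  // 3. classpath resource
        return out;
    }

    // Returns the first location that exists, or throws if none does.
    static String resolve(String configPath, Path configDir) {
        Path direct = Paths.get(configPath);
        if (Files.exists(direct)) {
            return direct.toString();
        }
        // Plain string concatenation keeps the absolute path's leading '/',
        // producing ".../work/config/home/..." just like the log line.
        Path underConfigDir = Paths.get(configDir.toString() + configPath);
        if (Files.exists(underConfigDir)) {
            return underConfigDir.toString();
        }
        // Class loaders expect resource names without a leading '/'.
        URL fromClasspath = ConfigResolutionSketch.class.getClassLoader()
                .getResource(stripLeadingSlash(configPath));
        if (fromClasspath != null) {
            return fromClasspath.toString();
        }
        throw new IllegalStateException("Failed to resolve config path [" + configPath
                + "], tried " + candidates(configPath, configDir));
    }

    private static String stripLeadingSlash(String p) {
        return p.startsWith("/") ? p.substring(1) : p;
    }
}
```

On a task node where the file was never copied, none of the three candidates exists, so the lookup fails exactly as in the trace.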
Adding entries to the Configuration might be the wrong way to use StoreFuncInterface; I don't know Pig well enough to say for sure. I've filed a Pig JIRA issue: https://issues.apache.org/jira/browse/PIG-2872
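To make the copy problem concrete, here is a minimal Java model in which a plain HashMap stands in for Hadoop's Configuration. The class and method names are hypothetical; this is not Pig's JobControlCompiler code, only an illustration of why a setting written to a copy (such as mapred.cache.files) never reaches the submitted job.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: a plain Map stands in for Hadoop's Configuration.
class ConfigCopyDemo {

    // Stands in for a storage function's setStoreLocation: it adds a
    // setting to whatever config object it is handed.
    static void setStoreLocationHook(Map<String, String> conf, String cacheFile) {
        conf.put("mapred.cache.files", cacheFile);
    }

    // Models the reported behaviour: the hook receives a copy,
    // and only the original config is submitted.
    static Map<String, String> submitWithCopy(String cacheFile) {
        Map<String, String> jobConf = new HashMap<>();
        Map<String, String> copy = new HashMap<>(jobConf); // independent copy
        setStoreLocationHook(copy, cacheFile);
        return jobConf; // the setting written to 'copy' is not here
    }

    // Contrast: if the hook mutated the job's own config, the setting survives.
    static Map<String, String> submitWithoutCopy(String cacheFile) {
        Map<String, String> jobConf = new HashMap<>();
        setStoreLocationHook(jobConf, cacheFile);
        return jobConf;
    }
}
```

In this model the setting only survives when the storage function writes into the object the job actually submits; whether Pig's StoreFuncInterface is meant to allow that is the question raised in the JIRA issue above.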
I'm submitting to a Hadoop 0.20.205.0 cluster with Kerberos enabled. I've switched versions a lot while trying to figure this out, but my latest setup was Pig trunk (0.11), Elasticsearch trunk, and Wonderdog trunk.
ElasticSearchStorage works fine for me with Pig 0.9.2. I recently unpacked the Pig 0.10 tarball, and it seems to have broken the storage class: the configuration variables aren't being persisted in job.xml for some reason. I'm not sure yet whether this is a Pig 0.10 bug or just a compatibility-breaking change.