infochimps-labs / wonderdog

Bulk loading for elastic search
http://infochimps.com
Apache License 2.0
186 stars 56 forks

ElasticSearchStorage doesn't work with Pig 0.10 #5

Closed (xstevens closed 11 years ago)

xstevens commented 12 years ago

I use ElasticSearchStorage fine with Pig 0.9.2. I recently unpacked the Pig 0.10 tarball and it seems to have broken the storage class. The configuration variables aren't getting persisted in the job.xml for some reason. I'm not sure yet if this is a Pig 0.10 bug or if it's just a compatibility-breaking change.

rjurney commented 12 years ago

Help, I have this too.

rjurney commented 12 years ago

This seems to be a problem with using the Hadoop cache for configuration.

rjurney commented 12 years ago

Fixed by https://github.com/infochimps-labs/wonderdog/pull/8

ghost commented 12 years ago

This is still an issue in trunk, and across Pig versions (I've tried Pig 0.8, 0.9, 0.10, and trunk). The issue seems to be that the Configuration object passed to ElasticSearchStorage.setStoreLocation (in elasticSearchSetup) is a copy of the one in JobControlCompiler.getJob. As a result, nothing added to the Configuration object in setStoreLocation, including mapred.cache.files, makes it into the Job that Pig submits. So elasticsearch.yml is not localized, and the job fails with:

ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backed error: org.elasticsearch.env.FailedToResolveConfigException: Failed to resolve config path [/home/evert/Downloads/elasticsearch/config/elasticsearch.yml], tried file path [/home/evert/Downloads/elasticsearch/config/elasticsearch.yml], path file [/disk1/mapred.local.dir/taskTracker/evert/jobcache/job_201208092125_0091/attempt_201208092125_0091_m_000000_3/work/config/home/evert/Downloads/elasticsearch/config/elasticsearch.yml], and classpath
        at org.elasticsearch.env.Environment.resolveConfig(Environment.java:205)
        at org.elasticsearch.node.internal.InternalSettingsPerparer.prepareSettings(InternalSettingsPerparer.java:62)
        at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:112)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
        at org.elasticsearch.node.NodeBuilder.node(NodeBuilder.java:166)
        at com.infochimps.elasticsearch.ElasticSearchOutputFormat$ElasticSearchRecordWriter.start_embedded_client(ElasticSearchOutputFormat.java:258)
        at com.infochimps.elasticsearch.ElasticSearchO

Adding stuff to the Configuration might be the wrong way to use StoreFuncInterface; I don't know Pig well enough to know for sure. I've filed a Pig Jira: https://issues.apache.org/jira/browse/PIG-2872
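To illustrate the copy problem described above, here is a minimal sketch that does not use Pig or Hadoop at all. It assumes java.util.Properties as a stand-in for Hadoop's Configuration, and the class name, method names, and the "elasticsearch.yml" value are all hypothetical placeholders rather than anything from Pig or wonderdog. It only shows why a setting written to a copy never reaches the object that gets submitted.

```java
import java.util.Properties;

/**
 * Stand-in demo: java.util.Properties plays the role of Hadoop's Configuration.
 * Every name here is hypothetical; none of this is Pig or wonderdog code.
 */
public class ConfigCopyDemo {

    // Mimics a store function registering a cache file on the config it is handed.
    static void setStoreLocation(Properties conf) {
        conf.setProperty("mapred.cache.files", "elasticsearch.yml"); // placeholder value
    }

    // Mimics a caller that hands the store function a COPY of the job config,
    // then submits the original.
    static Properties submitWithCopy(Properties jobConf) {
        Properties copy = new Properties();
        copy.putAll(jobConf);
        setStoreLocation(copy); // the change lands on the copy only
        return jobConf;         // the "submitted" config never sees it
    }

    // Mimics a caller that hands over the job config itself.
    static Properties submitWithOriginal(Properties jobConf) {
        setStoreLocation(jobConf);
        return jobConf;
    }

    public static void main(String[] args) {
        Properties viaCopy = submitWithCopy(new Properties());
        Properties viaOriginal = submitWithOriginal(new Properties());
        System.out.println("via copy:     " + viaCopy.getProperty("mapred.cache.files"));
        System.out.println("via original: " + viaOriginal.getProperty("mapred.cache.files"));
    }
}
```

In the copy case the lookup returns null, which matches the symptom of mapred.cache.files missing from the submitted job. Whether Pig is meant to propagate such changes is the open question in the Jira.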

I'm submitting to a Hadoop 0.20.205.0 cluster with Kerberos enabled. I've changed versions a lot while trying to figure this out, but my latest setup was Pig trunk (0.11), Elasticsearch trunk, and wonderdog trunk.