Closed: nanounanue closed this issue 9 years ago.
Hi @nanounanue, this should be an easy fix. Kite used to expose a "repository" for datasets, which used URIs that started with "repo:". Then we added the dataset URI that incorporates that information, which is why your normal dataset URI contains pointers to how the dataset should be managed. In the shuffle, we added kite.dataset.uri to the Flume sink, but needed to keep kite.repo.uri and kite.dataset.name for backward-compatibility.
To fix your problem, you should switch to using kite.dataset.uri with your normal dataset URI. The error here, which I'll add a better message to, is that your repo URI starts with "dataset:" instead of "repo:". You can fix that as an alternative, but I suggest moving to setting the dataset URI and ignoring the repository stuff.
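As a sketch, using the agent/sink names from your config (adjust to match your setup), the switch looks like this:

```
# Before (deprecated repository-style properties):
UFOAgent.sinks.UFOKiteDS.kite.repo.uri = repo:hive
UFOAgent.sinks.UFOKiteDS.kite.dataset.name = ufos

# After (single dataset URI):
UFOAgent.sinks.UFOKiteDS.kite.dataset.uri = dataset:hive:ufos
```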
Thanks for using Kite!
I've fixed the bug (CDK-1003) in https://github.com/kite-sdk/kite/commit/07da28e2. Is it okay if I close this?
@rdblue thank you for the quick answer ...
One more question, in my case, which is my "normal dataset URI"?
Because I modified that line to:
UFOAgent.sinks.UFOKiteDS.kite.dataset.uri = dataset:hive
or
UFOAgent.sinks.UFOKiteDS.kite.dataset.uri = dataset:hive:ufos
and I am getting the following error:
15/05/20 21:06:58 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Error trying to open a new writer for dataset dataset:hive
at org.apache.flume.sink.kite.DatasetSink.createWriter(DatasetSink.java:442)
at org.apache.flume.sink.kite.DatasetSink.process(DatasetSink.java:282)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException: Dataset name cannot be null
at org.kitesdk.shaded.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:188)
at org.kitesdk.data.Datasets.load(Datasets.java:108)
at org.kitesdk.data.Datasets.load(Datasets.java:140)
at org.apache.flume.sink.kite.DatasetSink$1.run(DatasetSink.java:403)
at org.apache.flume.sink.kite.DatasetSink$1.run(DatasetSink.java:400)
at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:55)
at org.apache.flume.sink.kite.DatasetSink.createWriter(DatasetSink.java:399)
... 4 more
15/05/20 21:07:03 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Error trying to open a new writer for dataset dataset:hive
@nanounanue, you want something like the second: dataset:hive:ufos. See the URI pattern docs.
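Roughly, the Hive dataset URI patterns look like this (check the linked docs for the authoritative forms):

```
dataset:hive:<dataset-name>                            # default namespace, metastore from hive-site.xml
dataset:hive:<namespace>/<dataset-name>                # explicit namespace
dataset:hive://<metastore-host>:<port>/<dataset-name>  # metastore given in the URI
```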
@rdblue I did as you suggested and now the stack trace is slightly different:
15/05/20 21:57:53 WARN hive.MetaStoreUtil: Aborting use of local MetaStore. Allow local MetaStore by setting kite.hive.allow-local-metastore=true in HiveConf
15/05/20 21:57:53 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Error trying to open a new writer for dataset dataset:hive:ufos
at org.apache.flume.sink.kite.DatasetSink.createWriter(DatasetSink.java:442)
at org.apache.flume.sink.kite.DatasetSink.process(DatasetSink.java:282)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Missing Hive MetaStore connection URI
at org.kitesdk.data.spi.hive.MetaStoreUtil.<init>(MetaStoreUtil.java:78)
at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.getMetaStoreUtil(HiveAbstractMetadataProvider.java:63)
at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.resolveNamespace(HiveAbstractMetadataProvider.java:270)
at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.resolveNamespace(HiveAbstractMetadataProvider.java:255)
at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.load(HiveAbstractMetadataProvider.java:102)
at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:192)
at org.kitesdk.data.Datasets.load(Datasets.java:108)
at org.kitesdk.data.Datasets.load(Datasets.java:140)
at org.apache.flume.sink.kite.DatasetSink$1.run(DatasetSink.java:403)
at org.apache.flume.sink.kite.DatasetSink$1.run(DatasetSink.java:400)
at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:55)
at org.apache.flume.sink.kite.DatasetSink.createWriter(DatasetSink.java:399)
... 4 more
15/05/20 21:57:58 WARN hive.MetaStoreUtil: Aborting use of local MetaStore. Allow local MetaStore by setting kite.hive.allow-local-metastore=true in HiveConf
...
Where do I have to write that?
Kite looks for the metastore URI in two places:
1. The Hive configuration on the classpath (hive-site.xml, property hive.metastore.uris)
2. The dataset URI itself: dataset:hive://ms-host:port/dataset-name
The first option is preferred. We generally assume you're running with the environment configured to talk with your cluster.
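In Flume sink terms, the two options would look something like this (host and port are examples only, not taken from your setup):

```
# Option 1 (preferred): metastore URI comes from hive-site.xml on the classpath
UFOAgent.sinks.UFOKiteDS.kite.dataset.uri = dataset:hive:ufos

# Option 2: metastore host and port embedded in the dataset URI
UFOAgent.sinks.UFOKiteDS.kite.dataset.uri = dataset:hive://ms-host:9083/ufos
```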
I think I have it configured correctly (otherwise the kite-dataset examples wouldn't work). Here is the relevant fragment of my hive-site.xml:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore</value>
</property>
...
<property>
<name>hive.metastore.uris</name>
<value>thrift://0.0.0.0:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
I am running in pseudo-distributed mode, btw.
If I use the second option that you gave me:
UFOAgent.sinks.UFOKiteDS.kite.dataset.uri = dataset:hive://0.0.0.0:9083/ufos
everything works smoothly.
But the question is: why isn't the first one working?
The first one depends on how you're configuring the program where you're using the API. You need to have the configuration files on the classpath for them to be picked up automatically when calling new Configuration().
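The lookup is plain classpath resource resolution, so you can verify it independently of Kite. A minimal self-contained sketch of the same mechanism (no Hadoop dependency; hive-site.xml is the file name Hadoop's Configuration resolves as a classpath resource):

```java
// Demonstrates why hive-site.xml must be on the classpath:
// Hadoop's Configuration ultimately resolves it as a classpath resource.
public class ClasspathCheck {
    public static void main(String[] args) {
        // getResource returns null when hive-site.xml is not on the classpath --
        // the same situation that produces "Missing Hive MetaStore connection URI".
        java.net.URL res = ClasspathCheck.class.getClassLoader()
                .getResource("hive-site.xml");
        System.out.println(res == null
                ? "hive-site.xml NOT on classpath"
                : "found: " + res);
    }
}
```

If the check prints "NOT on classpath", the metastore URI has to be supplied another way, for example embedded in the dataset URI.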
@nanounanue, I don't know what I was thinking with my last response since you already said you're using Kite inside Flume. Oops. I don't think you should be required to set up the Flume classpath so it can see the Hive config. You should use the full dataset URI for now and hopefully the next version of CDH will fix this for you.
Thank you @rdblue !
No problem, let us know if you have any more issues. I'm going to close this, since I think you're able to move on.
Hi everyone,

I want to adapt the json example provided, but I got this error: ...

I ran the flume-agent with: ...

The spooldir_example.conf is: ...

I created the dataset as follows: ...

Finally, the morphline.conf is: ...

What am I doing wrong?