vesoft-inc / nebula-spark-utils

Spark related libraries and tools

import data inside kafka value to nebula #157

Open sworduo opened 2 years ago

sworduo commented 2 years ago

This resolves issue https://github.com/vesoft-inc/nebula-spark-utils/issues/130 (import data from Kafka to Nebula). With this update, nebula-exchange can parse data from the value field of Kafka messages and import it into Nebula. Note that the other fields of a Kafka record, such as key and offset, are discarded. Also, because Kafka is a streaming source, the data source cannot be switched once Kafka is chosen: every tag/edge defined in the configuration can only be parsed from Kafka. For this reason, the Kafka config is defined independently rather than inside each tag/edge config, and all tags/edges share the same Kafka config. More details can be found in the accompanying README-CN.md.
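A minimal plain-Scala sketch of the behaviour described above, not the code in this PR: only the message value is parsed, key and offset are ignored, and one Kafka config is shared by every tag. All names here (`KafkaMessage`, `TagSpec`, `SketchConfig`, `toVertices`) are hypothetical, and the value format (`field=value` pairs separated by commas) is an assumption for illustration; README-CN.md describes the format the PR actually expects.

```scala
object KafkaValueSketch {
  // Hypothetical stand-in for a consumed Kafka message; only `value` is used.
  final case class KafkaMessage(key: String, value: String, offset: Long)

  // A single Kafka config shared by every tag (hypothetical field names).
  final case class KafkaConfig(servers: String, topic: String)

  // Hypothetical tag definition: which value field is the vertex id, which are properties.
  final case class TagSpec(name: String, idField: String, fields: Seq[String])

  // One Kafka section plus many tags, instead of a source inside each tag.
  final case class SketchConfig(kafka: KafkaConfig, tags: Seq[TagSpec])

  final case class Vertex(tag: String, id: String, props: Map[String, String])

  // Assumed value format: "field=value" pairs separated by commas.
  def parseValue(value: String): Map[String, String] =
    value.split(",").iterator
      .map(_.split("=", 2))
      .collect { case Array(k, v) => k.trim -> v.trim }
      .toMap

  // key and offset are deliberately ignored; every tag reads from the same value.
  // A tag whose id field is missing from the value produces no vertex.
  def toVertices(msg: KafkaMessage, conf: SketchConfig): Seq[Vertex] = {
    val row = parseValue(msg.value)
    conf.tags.flatMap { tag =>
      row.get(tag.idField).map { id =>
        val props = tag.fields.flatMap(f => row.get(f).map(v => f -> v)).toMap
        Vertex(tag.name, id, props)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = SketchConfig(
      KafkaConfig("127.0.0.1:9092", "demo"),
      Seq(TagSpec("player", "id", Seq("name", "age")))
    )
    val msg = KafkaMessage(key = "ignored", value = "id=player100,name=Tim,age=42", offset = 7L)
    println(toVertices(msg, conf))
  }
}
```

Keeping the Kafka section outside the tag list makes the "one stream, many tags" constraint explicit in the config shape rather than repeating the same source in every tag.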

CLAassistant commented 2 years ago

CLA assistant check
All committers have signed the CLA.

wey-gu commented 2 years ago

Thank you so much @sworduo, this PR takes the usability of real-world Kafka streaming sources to the next level.

@Nicole00 🎉

Nicole00 commented 2 years ago

Thanks for your PR adding support for parsing Kafka's value. This PR changes the architecture of Exchange shown in the doc https://docs.nebula-graph.com.cn/2.5.1/nebula-exchange/about-exchange/ex-ug-what-is-exchange/. Can we just modify the StreamingReader to parse Kafka's value into a DataFrame?
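To illustrate the alternative suggested here, a rough plain-Scala sketch of parsing inside the reader layer: the reader turns each Kafka value into a row with named columns, so downstream tag/edge processing never sees Kafka-specific fields. This is not Exchange's actual `StreamingReader` or Spark's DataFrame API; `Reader`, `KafkaValueReader`, `Row` (a `Map` standing in for a DataFrame row) and `vertexIds` are hypothetical, and the comma-separated value layout is an assumption.

```scala
object ReaderSketch {
  // Hypothetical stand-in for a consumed Kafka message.
  final case class KafkaMessage(key: String, value: String, offset: Long)

  // Illustration only: a row with named columns, standing in for a DataFrame row.
  type Row = Map[String, String]

  // Hypothetical reader abstraction: every source yields rows with named columns.
  trait Reader { def read(): Seq[Row] }

  // Parsing happens inside the reader, so processors downstream see ordinary rows.
  // Assumes each value is comma-separated in the order given by `columns`;
  // values with the wrong number of fields are dropped.
  final class KafkaValueReader(messages: Seq[KafkaMessage], columns: Seq[String]) extends Reader {
    def read(): Seq[Row] =
      messages.flatMap { m =>
        val parts = m.value.split(",", -1).map(_.trim)
        if (parts.length == columns.length) Some(columns.zip(parts).toMap) else None
      }
  }

  // A downstream step that is unaware of Kafka: it only needs a column name.
  def vertexIds(reader: Reader, idColumn: String): Seq[String] =
    reader.read().flatMap(_.get(idColumn))

  def main(args: Array[String]): Unit = {
    val reader = new KafkaValueReader(
      Seq(KafkaMessage("k", "player100, Tim, 42", 0L)),
      Seq("id", "name", "age")
    )
    println(vertexIds(reader, "id"))
  }
}
```

The trade-off is that tag/edge configs could keep their existing source-per-tag shape, at the cost of the reader needing to know the value schema up front.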