rheem-ecosystem / rheem

Rheem - a cross-platform data processing system
https://rheem-ecosystem.github.io

JSONSource operator #49

Open berttty opened 7 years ago

berttty commented 7 years ago

A JSONSource operator is needed because several communication APIs, for example Twitter and Facebook, deliver their data in JSON format.

sekruse commented 7 years ago

Would a JSONSource read from a filesystem, a database, a network socket, ...? So far, sources are tied to physical data locations rather than logical formats. Would a parser UDF and a JSON datatype also be a solution to your problem?

berttty commented 7 years ago

The idea is to start by building a wrapper around TextFileSource. The wrapper reads the file and outputs records, so that the next operator can work with records directly. The JSON parser searches only for the keys needed to build the record, so it does not have to analyze the complete string.

sekruse commented 7 years ago

And wouldn't something like

javaPlanBuilder
    .readTextFile("hdfs://...")
    .map(line -> parseJson(line))

do the job? You could combine this with any type of source, then.
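
For illustration only, here is a minimal sketch of what such a per-line parser UDF could look like when it searches for a single key instead of parsing the whole string. Everything in it is an assumption made for this example: the class `JsonLineSketch`, the helpers `extractField` and `extractAll`, and the sample lines are hypothetical and not part of Rheem. A plain `java.util.stream` pipeline stands in for the plan above, and the regex lookup is a shortcut rather than a real JSON parser (it also matches nested keys and does not unescape values).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch, not Rheem code: a key-targeted line parser used as a map UDF.
final class JsonLineSketch {

    // Returns the raw (still escaped) string value of `key` in one JSON line, if present.
    // Assumption: the value is a JSON string; numbers, objects and arrays are not handled.
    static Optional<String> extractField(String line, String key) {
        Pattern pattern = Pattern.compile(
                "\"" + Pattern.quote(key) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
        Matcher matcher = pattern.matcher(line);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    // Stand-in for readTextFile(...).map(...): applies the UDF to every line
    // and keeps only the lines in which the key was found.
    static List<String> extractAll(List<String> lines, String key) {
        return lines.stream()
                .map(line -> extractField(line, key))
                .filter(Optional::isPresent)
                .map(Optional::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "{\"user\": \"alice\", \"text\": \"hi\"}",
                "{\"text\": \"no user here\"}",
                "{\"user\":\"bob\"}");
        System.out.println(extractAll(lines, "user"));
    }
}
```

Because the function only needs a line of text as input, the same UDF could in principle follow any source that produces strings, which is the point made above about keeping the format separate from the physical data location.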