rheem-ecosystem / rheem

Rheem - a cross-platform data processing system
https://rheem-ecosystem.github.io

Generate new Element for Data #54

Open berttty opened 7 years ago

berttty commented 7 years ago

Currently, the data types let us represent almost everything, but their structure is static, which does not allow us to discover the structure of the data. For this, it is necessary to create a new type of data element. An example use case for this type of data is #49 JSONOperator.

sekruse commented 7 years ago

Not sure if I understand the issue, so let me clarify it.

Let's say you have some Rheem plan, but you don't know the datatypes in the plan in advance. In other words, you will only know the datatypes when the plan is being executed. And you need some support in Rheem to deal with this circumstance. Did I understand this correctly?

If so, I would have the following suggestions:

  1. You can use a dynamic datatype. For instance, if you want to work with JSON data, you could create a JSON data quantum type, which can store a dynamic set of properties.
  2. In my experience, it is usually the case that you have to know at least a bit about your data at design time. That is because if you don't know your data at all, then you usually cannot do anything with it, because your code needs to query some fields etc. And even if you don't know your data completely, there is at least some knowledge of, e.g., some fields of the data quanta, for which you could create a static type. Otherwise, it would be interesting to hear about use cases where you know nothing about the data you are working with.
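
To make the first suggestion concrete, here is a minimal sketch of what a dynamic data quantum could look like. The class name `DynamicRecord` and its methods are hypothetical and are not part of Rheem's API; the sketch only assumes that a data quantum can be an arbitrary Java object. Fields are discovered at execution time via `fieldNames()`, while fields known at design time can still be read in a typed way via `get(name, type)`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical dynamic data quantum (not a Rheem class): it stores an
 * open-ended set of named properties, e.g. the top-level fields of a
 * JSON object, so the structure can be inspected at execution time.
 */
final class DynamicRecord {

    // Insertion-ordered so that fields are listed in the order they were set.
    private final Map<String, Object> fields = new LinkedHashMap<>();

    /** Adds or overwrites a field; returns this record to allow chaining. */
    DynamicRecord set(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    /** Lists the fields present in this record as a read-only view. */
    Set<String> fieldNames() {
        return Collections.unmodifiableSet(fields.keySet());
    }

    /**
     * Typed access for fields that are known at design time. Returns an
     * empty Optional if the field is missing, null, or of another type.
     */
    <T> Optional<T> get(String name, Class<T> type) {
        Object value = fields.get(name);
        return type.isInstance(value)
                ? Optional.of(type.cast(value))
                : Optional.empty();
    }
}
```

For example, `new DynamicRecord().set("name", "Ada").set("age", 36)` yields a record whose `fieldNames()` contains `name` and `age`, and `get("age", Integer.class)` returns the value wrapped in an `Optional`. Asking for a field with the wrong type, such as `get("age", String.class)`, gives an empty `Optional` instead of throwing, which reflects the second point above: code can still rely on the few fields it knows about without fixing the full structure in advance.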