telefonicaid / fiware-cygnus

A connector in charge of persisting context data sources into other third-party databases and storage systems, creating a historical view of the context
https://fiware-cygnus.rtfd.io/
GNU Affero General Public License v3.0

Cygnus: ability to set the per file consolidation level #15

Closed fgalan closed 10 years ago

fgalan commented 10 years ago

Migrated from https://github.com/telefonicaid/fiware-livedemoapp/issues/5:

Currently, each attribute always goes to a different file. However, a more flexible approach would be to use a selector (in the process configuration) to choose between different consolidation levels:

The naming of the file would be adjusted accordingly.
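To make the proposal concrete, here is a minimal sketch of such a selector, assuming only the two models discussed in this thread (per entity and per (entity, attribute)). The names `ConsolidationLevel` and `file_name`, the `.txt` extension and the naming pattern are all hypothetical; nothing here reflects the actual Cygnus configuration or file layout.

```python
from enum import Enum


class ConsolidationLevel(Enum):
    # Hypothetical level names; the thread only mentions these two models.
    PER_ENTITY = "per-entity"
    PER_ATTRIBUTE = "per-attribute"


def file_name(level, entity_id, entity_type, attr_name=None):
    """Build an illustrative file name for the given consolidation level."""
    base = f"{entity_id}-{entity_type}"
    if level is ConsolidationLevel.PER_ENTITY:
        # One file per entity: every attribute of the entity lands here.
        return f"{base}.txt"
    if level is ConsolidationLevel.PER_ATTRIBUTE:
        # One file per (entity, attribute) pair.
        if attr_name is None:
            raise ValueError("attr_name is required for per-attribute files")
        return f"{base}-{attr_name}.txt"
    raise ValueError(f"unknown consolidation level: {level}")
```

With this sketch, `file_name(ConsolidationLevel.PER_ENTITY, "Room1", "Room")` would give `Room1-Room.txt`, while the per-attribute level with `attr_name="temperature"` would give `Room1-Room-temperature.txt`.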

@frbattid additions: a brief explanation of the reasons that led to this:

Not all the attributes related to an event may be updated at the same time. Therefore, if we record one line per event, and that line must contain a value for every attribute, the result is a file with thousands of lines, each containing several null values. Thus, we decided to store only pairs in order to “save” disk space (using big data is no excuse for being inefficient). Then we decided to put each pair in a separate file for the sake of the “human being” reading it: a single file would show a mix of attributes, whereas a file per attribute clearly shows the evolution of that attribute. This last step was probably unnecessary and could be avoided, as proposed by Fermín.
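The trade-off described above can be illustrated with a small sketch. The sample events, attribute names and helper functions below are made up for illustration and do not come from Cygnus; the point is only to show how a line per event accumulates nulls, while pairs (optionally split per attribute) do not.

```python
# Hypothetical events: each update carries only the attributes that changed.
events = [
    {"ts": 1, "temperature": 21.5},
    {"ts": 2, "pressure": 720},
    {"ts": 3, "temperature": 22.0},
]
attributes = ["temperature", "pressure"]


def as_wide_rows(events, attributes):
    """One line per event with a column per attribute; missing ones are None."""
    return [[e["ts"]] + [e.get(a) for a in attributes] for e in events]


def as_pairs(events):
    """One (timestamp, attribute, value) record per updated attribute."""
    return [
        (e["ts"], name, value)
        for e in events
        for name, value in e.items()
        if name != "ts"
    ]


def split_per_attribute(pairs):
    """Group the pairs into one 'file' (here, a list) per attribute."""
    files = {}
    for ts, name, value in pairs:
        files.setdefault(name, []).append((ts, value))
    return files


wide = as_wide_rows(events, attributes)
null_count = sum(row.count(None) for row in wide)
pairs = as_pairs(events)
per_attribute = split_per_attribute(pairs)
```

In the wide layout every event that updates a single attribute leaves a null in each other column, so the number of nulls grows with both the number of events and the number of attributes. The pair layout stores only what was actually updated, and the per-attribute split is an optional step on top of it.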

frbattid commented 10 years ago

Does this issue still make sense? Since version 0.2 of Cygnus, all the data is serialized in a per-entity fashion by default, which is aligned with the last comment in the previous message, and it seems to be the best option.

In addition, I foresee problems if we give the user the "power" of switching: if such a user first decides to persist in a per-entity fashion, and after some time he/she decides to persist in a per-(entity, attribute) fashion, then two different models will coexist within the same dataset...

fgalan commented 10 years ago

The same question arises in #42. Thus, if we definitively choose a per-entity "container" (where the container is a file, a relational database table or a datastore at CKAN), should we keep the same decision in all our sinks?

frbattid commented 10 years ago

I would say yes... What I would like to clarify/understand is the relational database table part, but I think it is better to do that in https://github.com/telefonicaid/fiware-connectors/issues/42.