telefonicaid / fiware-cygnus

A connector in charge of persisting context data sources into other third-party databases and storage systems, creating a historical view of the context
https://fiware-cygnus.rtfd.io/
GNU Affero General Public License v3.0
65 stars 105 forks source link

Cygnus: MySQL database connector #42

Closed fgalan closed 10 years ago

fgalan commented 10 years ago

The objetive is to create a new sink to persit context elements (sent to Cygnus by Orion using notifyContextRequest) in a relational database.

Regarding the data model to use in the relational database, three alternatives are on the table:

  1. One "big table". All the entities and attributes are in the same table. The columns would be: timestamp, human readable timestamp, entity id, entity type, attr name, atrr type, attr value, attr md (serialized in JSON).
    • Pros: simplifies table creation (static), simplifies aggregated queries.
    • Cons: big table scalability?
  2. Per-entity table. The name of the table is the name of the entity (actually, the concatenation of id and type, as we are doing now in other parts of Cygnus). The columns of the table are as in solution 1 except from entity id and entity type.
    • Pros: better scalability than in alternative 1
    • Cons: dynamic table creation (Cygnus has to verify that the table exist before inserting a new row).
  3. Per context element (i.e. entity + attribute) table. The name of the table is the name of the context element (actually, the concatenation of entity id, entity type, attribute name and attribute type, as we are doing now in other parts of Cygnus). The columns of the table are as in solution 2 except from attr name and attr type.
    • Pros: better scalability than in alternative 2 (really?)
    • Cons: dynamic table creation (Cygnus has to verify that the table exist before inserting a new row), too "scattered model"?

In addition, as some SQL stamement may vary depending on the particular database technology, we should decide ASAP in which one we focus for the first version of the sink (e.g. MySQL, etc.).

frbattid commented 10 years ago

My intuition says one big table may show scalability problems, while a per-context element table may lead us to hundreds (or more) of tables. Thus, due to "virtue is in the mean" :), I would say the better approach is a per-entity table. In addition, this matches what we are discussing regarding the same problem in HDFS (https://github.com/telefonicaid/fiware-connectors/issues/15), where a per-entity file approach seems to be the better (with no option to be changed by the user).

fgalan commented 10 years ago

At the end, the DB technology will be MySQL (issue name edited to reflect that).

frbattid commented 10 years ago

Apart from creating a table for each entity, I'm wondering if a database must be created for each user. This database would provide him a private space.

EDIT: in addition to the database for the user, which provides him a private space, we can consider the possibility to "tag" the tables with certain string in order to create datasets. This would mimic how the context data is being persisted in HDFS. Example:

HDFS (the folder "/user/myuser" specifies the private space, and the folder "dataset_*" specifies the dataset): hdfs://host/user/myuser/mydataset_1/entity_1.txt hdfs://host/user/myuser/mydataset_1/entity_2.txt ... hdfs://host/user/myuser/mydataset_2/entity_3.txt hdfs://host/user/myuser/mydataset_2/entity_4.txt

MySQL (the database "mysuer" specifies the private space, and the tag "dataset_*" specifies the dataset): mysql://host/myuser/mydataset_1_entity_1.sql mysql://host/myuser/mydataset_1_entity_2.sql ... mysql://host/myuser/mydataset_2_entity_3.sql mysql://host/myuser/mydataset_2_entity_4.sql

fgalan commented 10 years ago

Database per use could make sense. Orion multiservice/multenancy behaves that way (see https://forge.fi-ware.org/plugins/mediawiki/wiki/fiware/index.php/Publish/Subscribe_Broker_-_Orion_Context_Broker_-_Installation_and_Administration_Guide#Multiservice.2Fmultitenant_database_separation). What is the limit of databases allowed in a single instance of MySQL?

Regarding the tags... not sure of understanding. What do you mean by host/myuser/mydataset_2_entity_4.sql? Is that an actual file used to create the table? Or the table is created by the Orion2MySQL connector at first-measure time?

frbattid commented 10 years ago

Well, regarding the number of databases that can be created, it seems to be limited by the file system, not by MySQL. The problem is the creation of such amount of databases is discouraged at all, see: http://stackoverflow.com/questions/4864519/max-limit-of-databases-that-can-be-created-in-mysql

Even, the same criteria avoids the creation of a table for each entity.

Thus, it seems using a relational storage impilies to use a relational design like the following one:

fgalan commented 10 years ago

Makes sense... or at least I cannot offer arguments that support other alternatives (however, I'm not an expert in relational DB modeling, so probably other people have sounder opinions on this :)

frbattid commented 10 years ago

As finally discussed with Jose Miguel, there will be a database per user, in order to grant a "private" environment where no other user can take a look to the data. Within per-user databases, there will be a table per entity, persisting the data in columns as usual: ts, iso8601, entityId, entityType, attributeName, attributeType and attributeValue.

Of course, current version of Cygnus is mono-tenant; multi-tenancy capabilities have been moved to release/0.3

frbattid commented 10 years ago

Implemented in PR https://github.com/telefonicaid/fiware-connectors/pull/47

fgalan commented 10 years ago

Work done in PR #47, except some part deffered in issue #50