telefonicaid / fiware-cygnus

A connector in charge of persisting context data sources into other third-party databases and storage systems, creating a historical view of the context
https://fiware-cygnus.rtfd.io/
GNU Affero General Public License v3.0
65 stars 105 forks

Cygnus: selective escaping of delimiter character to avoid "Cosmos injection" #14

Closed fgalan closed 10 years ago

fgalan commented 10 years ago

Moved from https://github.com/telefonicaid/fiware-livedemoapp/issues/7:

ngsi2cosmos.py should parse the contextValue before writing it to Cosmos, escaping the delimiter (usually "|"). Otherwise, the user could "inject several columns in a single field" potentially breaking the schema defined by tools such as Hive.

This escaping should be an optional feature (typically, a flag in the CLI or configuration file), given that in some cases it could be useful to allow this injection to simplify the NGSI model definition.
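As a rough illustration of the optional escaping described above, here is a minimal sketch. It is not the actual `ngsi2cosmos.py` code: the function name `serialize_row`, the `escape_delimiter` flag and the choice of a backslash as escape character are all assumptions made for the example.

```python
DELIMITER = "|"  # field separator mentioned in the issue (usually "|")


def serialize_row(fields, escape_delimiter=True, delimiter=DELIMITER):
    """Join fields into one line (hypothetical helper, not from ngsi2cosmos.py).

    When escape_delimiter is True, any delimiter found inside a field is
    prefixed with a backslash so that it cannot split the field in two.
    """
    if escape_delimiter:
        # Escape backslashes first so the escaping stays reversible.
        fields = [
            str(f).replace("\\", "\\\\").replace(delimiter, "\\" + delimiter)
            for f in fields
        ]
    else:
        # Injection allowed: a value containing "|" yields extra columns.
        fields = [str(f) for f in fields]
    return delimiter.join(fields)


escaped = serialize_row(["temperature", "21.5|celsius"])
injected = serialize_row(["temperature", "21.5|celsius"], escape_delimiter=False)
```

With the flag enabled the value stays in a single column, provided whatever reads the file is configured to honour the same escape character. With the flag disabled, the value is written verbatim and expands into additional columns.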

@frbattid is also having a look at this.

fgalan commented 10 years ago

Comment made on behalf of @frbattid, created during issue migration:

In more detail: currently, each stored line consists of 6 fields separated by '|', the 6th field being the "value" of a measure. If this value is created by concatenating M strings with the '|' separator, then each stored line will have 6 - 1 + M fields.

If the Hive external table is created for these 6 - 1 + M fields, there is no problem.

The problem arises when a Hive external table is defined for 6 fields and the value field is injected with those M extra fields. Queries on that 6-column table will crash when they reach the first line containing 6 - 1 + M fields.
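The field arithmetic above can be checked with a small sketch. The five leading field values below are placeholders invented for the example (the issue does not list the actual column names), and `field_count` is a hypothetical helper that splits on the delimiter the way a naive delimited reader would.

```python
def field_count(line, delimiter="|"):
    """Number of columns a plain delimiter-based reader would see."""
    return len(line.split(delimiter))


# Five placeholder leading fields (assumed values, for illustration only).
leading = ["ts", "ts_ms", "entity_id", "entity_type", "attr_name"]

# Normal case: the 6th field is a single value.
plain_line = "|".join(leading + ["21.5"])

# Injected case: the value is itself M = 3 strings joined with "|".
m_parts = ["21.5", "celsius", "ok"]
injected_line = "|".join(leading + ["|".join(m_parts)])

plain_fields = field_count(plain_line)        # 6
injected_fields = field_count(injected_line)  # 6 - 1 + M = 8
```

A table declared with 6 columns matches `plain_line` but not `injected_line`, which is the mismatch described above.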

frbattid commented 10 years ago

I think this issue no longer makes sense, since the CSV-like serialization has been discarded altogether.