Forbidden Characters in Entity

telefonicaid / fiware-orion

Context Broker and CEF building block for context data management, providing NGSI interfaces.

https://github.com/telefonicaid/fiware-orion/blob/master/doc/manuals/orion-api.md

GNU Affero General Public License v3.0

211 stars 265 forks source link

Forbidden Characters in Entity #3559

Open ptrdtznr opened 5 years ago

ptrdtznr commented 5 years ago

Hello,

according to the ForbiddenChar, the forbiddne character are:

<
>
"
'
=
;
(
)

But why is the "/" character than also forbidden?!

If I want to create an entity like that

{
    "id": "hello/world",
    "type": "slashedEntity",
    "message": {
        "type": "Text",
        "value": "It is not working"
    }
}

A response generates {"error":"BadRequest","description":"Invalid characters in entity id"}, which I do understand but it is not coherent to the documentation.

However, I would like, due to ROS standards, allow the "/" within the Entity-ID name. Can you make it happen?! ;-)

kzangeli commented 5 years ago

Looking at the source code, the Entity ID field for NGSIv2, has four additional forbidden characters:

Also, all characters with ASCII code 0-32 and 127-255 are forbidden in Entity ID. Perhaps this is not well documented (or even not documented at all).

That said, I'm not sure why we don't allow slashes in entity id.

fgalan commented 5 years ago

From NGSIv2 specification http://fiware.github.io/specifications/ngsiv2/stable, in the "Field syntax restrictions" section:

Allowed characters [in id fiels, such as entity id] are the ones in the plain ASCII set, except the following ones: control characters, whitespace, &, ?, / and #.

This is to avoid ambiguities in URL processing. If we would allow / for instance we could have entities with id foo and entities with id foo/bar. Thus, if we have the following request:

GET /v2/entities/foo/attrs

It may refer to:

A get entity operation (i.e. GET /v1/entities/{entityid}) on foo/attrs
A get entity attributes operation (i.e. GET /v1/entities/{entityid}/attrs) onfoo`

so there is an unsolvable ambiguity for the NGSIv2 server.

fgalan commented 5 years ago

However, I would like, due to ROS standards, allow the "/" within the Entity-ID name. Can you make it happen?! ;-)

The usual solution is to encode-decode at client side. As the reference you cite suggests:

If your application needs to use these characters, you should encode it using a scheme not including forbidden characters before sending the request to Orion.

Luxxii commented 5 years ago

Would it be possible that the ContextBroker internally encodes unwanted/forbidden characters?

It would be nice to send:

{
    "id": "hello/world",
    "type": "slashedEntity",
    "someTextualCondition": {
        "type": "ParsableCondition",
        "value": "If (x < y): then -> text = \"x is smaller!\" "
    }
}

instead of sending the encoded version:

{
    "id": "hello%2Fworld",
    "type": "slashedEntity",
    "someTextualCondition": {
        "type": "ParsableCondition",
        "value": "If %28x %3C y%29: then -%3E text %3D  %5C%22x is smaller!%5C%22 "
    }
}

which is barely readable and also displayed like this at v2/entities.

The above is still valid JSON. The URL for this entity could then be accessed by v2/entities/hello%2Fworld/attrs.

Instead of letting the client escape unwanted characters and keeping the Escaping in sync with the ContextBroker, the server could do it. In addition to that, the server would have full control, which characters it wants to escape. Only the Client then needs to know which one are escaped in case he wants to access it.

Greetings! :)

fgalan commented 5 years ago

@luxxi to be sure I fully understand your proposal, your idea is to automatically encode in the Orion request but not to decode in the responses? I mean, the client will create an entity as follows:

POST /v2/entities
{
    "id": "hello/world",
    "type": "slashedEntity",
    "someTextualCondition": {
        "type": "ParsableCondition",
        "value": "If (x < y): then -> text = \"x is smaller!\" "
    }
}

So latter he will retrieve its data using:

GET v2/entities/hello%2Fworld/attrs

getting the following response:

{
    "someTextualCondition": {
        "type": "ParsableCondition",
        "value": "If %28x %3C y%29: then -%3E text %3D  %5C%22x is smaller!%5C%22 "
    }
}

Is my understanding correct?

Luxxii commented 5 years ago

I am talking about an INTERNAL encoding. This also includes the decoding part. So for clearance:

A client wants to create an entity and does the following:

# POST /v2/entities
{
    "id": "hello/world",
    "type": "slashedEntity",
    "someTextualCondition": {
        "type": "ParsableCondition",
        "value": "If (x < y): then -> text = \"x is smaller!\" "
    }
}

The Orion Context-Broker (OCB) receives this query and encodes for specific characters as long as the sent JSON is valid (and a FIWARE-Entity).

If a user accesses via a Web-Browser: v2/enitites, the OCB needs to decode the encoded entity back and would display the following:

[
  {
    "id": "hello/world",
    "type": "slashedEntity",
    "someTextualCondition": {
        "type": "ParsableCondition",
        "value": "If (x < y): then -> text = \"x is smaller!\" "
    }
  }
]

The user would not notice that the entity was encoded inside the OCB.

Iff a client needs the attributes of the entity hello/world, it may try to receive it via: v2/entities/hello/world/attrs (which obviously looks strange due to the / in the id-attribute).

An error would be returned by the OCB (BadRequest or similar.).

For this the documentation needs to contain Information about special characters that are internally encoded. This should also mention the retrieval of such entities, which use those characters.

So in all: The client only needs to encode hello/world to hello%2Fworld and can retrieve the data with v2/entities/hello%2Fworld/attrs: which returns:

{
    "someTextualCondition": {
        "type": "ParsableCondition",
        "value": "If (x < y): then -> text = \"x is smaller!\" "
    }
}

Such an internal encoding/decoding would allow Entities which use the current forbidden characters. They do not need to be present inside the id-attribute. Also i found this Issue: #3550 , which also would be solved with this approach. The OCB would also be possible to accept the following:

  {
    "id": "SomeID",
    "type": "slashedAttribute",
    "hereis/thebad/character": {
        "type": "string",
        "value": "some text "
    }
  }

I hope my proposal is now more clear!

fgalan commented 5 years ago

Thanks for the clarification!

The bottom line is that rendering some chars could lead to injection attacks in clients that consume results from Orion directly and are not well designed or well protected. For instance, a web application acting as Orion client.

For instance something like this:

{
    "attributeForAttch": {
        "type": "attack",
        "value": "javascript:alert('Executed!');"
    }
}

If the client doesn't have care processing the value it can end executing arbitrary JavaScript code (in the example, a harmless javascript:alert('Executed!'); pop up, but it could be something more dangerous).

Given that Orion doesn't allow to render ', ;, ( and ) the risk is highly minimized.

I hope this explanation helps to understand the rationalize behind avoid the rending of some chars in Orion. Of course, you can always say "that's a problem of the client" :) which is somehow right but not the principle we are currently following in Orion.

ptrdtznr commented 5 years ago

I do understand this thought @fgalan , but there are many ideas, exploits and hacks to do some Code-Injeqtion (SQL Injection, Javascript, etc. pp). You won't be able to minimize the possibilities because you are not allowing few characters. I would prefer either allow as much as possible or basically nothing except [A..Za..z]. I am stressing this example a bit, but I am not happy with the set of characters and the decision/reason behind of it.

fgalan commented 5 years ago

Orion forbids (, ', " and ;, thus killing the syntaxis of most of the languages in which the injected exploit could be encoded.

But maybe we have left something without covering... it would be great to know in that case an example of exploit that without using none of the current forbidden chars in Orion could cause code execution in an injection attack scenario.

Luxxii commented 5 years ago

By forbidding specific characters, the possibility by not being attacked is not reduced in my opinion.

E.G.: Here is another URL-Encoding-Reference. There, it specifically shows that you can encode control characters. The OCB would allow inputs like:

# POST /v2/entities
{
    "id": "attack",
    "type": "encodedAttack",
    "someAttr": {
        "type": "SomeWhatOfNullSlopeWithOtherControlChars",
        "value": "%00%00%00%00%00%00%00%0D%0A%1B"
    }
}

So what if an attacker abuses this? E.G. he could know that some libraries online are vulnerable by simply using control characters. URL-Encoding would not stop the attacker to compromise the client.

In your example:

{
    "attributeForAttch": {
        "type": "attack",
        "value": "javascript:alert('Executed!');"
    }
}

The client somewhat executes here the value-Attribute. What would prevent the client to execute: javascript%3Aalert%28%27Executed%21%27%29%3B? The only extra step by the client needed is the decoding of the value-Attribute.

For me this problem only gets shifted and does not provide any extra security-measures. A Data-Only-Attack would look different if URL-Encoding is used, but would not be impossible. Another link to SOF, where this topic was also discussed..

@fgalan This strives away from the actual topic. @ptrdtznr asked if we could use those forbidden characters in entities. It would be nice if you could give us a clear answer whether this is doable (in near future).

ptrdtznr commented 5 years ago

I agree, we are loosing the focus and drifting away

fgalan commented 5 years ago

I think the discussion is useful, as the rationale behind the forbidden chars is related to what we are talking about. Not too much drifted away, from my point of view ;)

I know this is a controversial topic. Some people thinks one way, some other people thinks other way. In fact, the link you cite https://stackoverflow.com/questions/4999884/is-urlencode-good-enough-to-stop-all-sql-injection-attacks-in-the-year-2011 also shows this controversy. And in some way sentences like "it does turn your apostrophes ' into %27, rendering them harmless" supports the current Orion design point.

But, back to the topic... ok

@ptrdtznr asked if we could use those forbidden characters in entities

Short answer: no, you cannot with the current Orion implementation.

However, Orion could be modified to make this more flexible, so let the user to decide how he wants to use Orion and which risks he want to assume. Currently we have two kind of syntax restrictions in Orion:

The general one which applies to any field and any value in the API (https://fiware-orion.readthedocs.io/en/master/user/forbidden_characters/index.html)
The one which applies only to NGSIv2 id-kind fields (http://fiware.github.io/specifications/ngsiv2/stable, in the "Field syntax restrictions" )

Rule 2 should not be changed, due to the following reasons (the first one is actually the really important one):

It ensures the "sanity" of ids. Note that NGSIv2 ids are not used only by Orion, other components in the FIWARE architecture are using them in places where weird characters could be problematic. For instance, the MySQL sink in Cygnus builds DB table names based in entities id and MySQL doesn't accept weird characters in table names.
Changing it would imply modify the NGSIv2 specification

Rule 1 (which actually is the more restricting one) could be relaxed if Orion utilization case doesn't need to enforce protection to injection attacks. This could be configurable, so Orion would include as configuration parameter the list of forbidden chars. If you set that list to the empty set, then you would be removing completely rule 1.

This would, of course, involve modifications to Orion code. I'm not interested in such use case (at least at the present moment), so I'm not going to do that implementation. However, if somebody wants to go for it, I'll be more than happy to review the PR with the contribution and provide feedback about it (have a look to the contribution guidelines in that case)

Thanks!

jmcanterafonseca commented 5 years ago

sorry for jumping in here. the #, / and ? should be valid chars for Entity Ids. We already had this discussion with NGSI-LD.

There is no risk of ambiguity as the API client should encode the corresponding URI component when calling the REST API. For instance to retrieve an Entity with id 'a/b' a URI of the form /v2/entities/a%2Fb should be used.

ptrdtznr commented 5 years ago

Thanks @jmcanterafonseca , I agree here totally with you.

jaimeventura commented 4 years ago

My 2 cents on this discussion.

I think Code-Injection (SQL Injection, Javascript, etc. pp) shouldn't something for the CB to worry about. The client/consumer application should worry about that. In fact, if working with subscriptions, a "data consumer" application should have some sanitisation mechanism for the received data (imagine for instance, some attacker pretending to the CB and injecting a notification on the applications' endpoint).

At most, as it was mentioned by @fgalan "Orion could be modified to make this more flexible, so let the user to decide how he wants to use Orion and which risks he want to assume." On the other hand, wouldn't this somehow break standards? A "data producer app" would have to "know" if it could write the forbidden chars or not to a CB instance and it would have to act differently when working with different CBs.

Also, encoding data by the "data producer application" would force the "data consumer application" to know about it or be prepared for it. Again, wouldn't this somehow break standards?

I haven't tested this, but i believe encoding data by the "data producer application" could break who queries are now handled (searches not getting expected results).

I agree on having forbidden characters in Entity ID, for clarity. But, at least for NGSI-LD, it is proposed(?) that Entity IDs should be an URN. And, according to my understanding of rfc 8141, chars like question marks are allowed in URNs.

fgalan commented 4 years ago

@ptrdtznr thanks for the feedback! Please find some inline comments

At most, as it was mentioned by @fgalan "Orion could be modified to make this more flexible, so let the user to decide how he wants to use Orion and which risks he want to assume." On the other hand, wouldn't this somehow break standards? A "data producer app" would have to "know" if it could write the forbidden chars or not to a CB instance and it would have to act differently when working with different CBs.

Yes, this can be a problem. However, under my understanding, that configuration parameter would be used as a "last resort measure" in cases in which the breaking of standard is less pain than not enabling it. Anyway, being a configuration parameter, it would be completely tp the user (i.e. the people running Orion) either enabling ir ot not.

Also, encoding data by the "data producer application" would force the "data consumer application" to know about it or be prepared for it. Again, wouldn't this somehow break standards?

In this case I don't see the breaking of standards.

It is assumed some degree of awareness than the consumer application has on the producer application or the other way around. For instance, in the domain of streetlights management, the consumer application needs to know the kind of entities to use (StreetLight and StreetLightCabinet) and the attributes that they have (location, voltage, intensity, powerFactor, etc.). This is part of the datamodel which as as a "contract" between both parties (in this case, for instance: https://fiware-datamodels.readthedocs.io/en/latest/StreetLighting/Streetlight/doc/spec/index.html) and, as part of this datamodel, the encoding used for fields could be specified. In other words, the encoding would be part of the standard.

I agree on having forbidden characters in Entity ID, for clarity. But, at least for NGSI-LD, it is proposed(?) that Entity IDs should be an URN. And, according to my understanding of rfc 8141, chars like question marks are allowed in URNs.

Note it is not a matter of clarity from the point of view of the human :), but also from the point of view of the consumer of this information. For instance, Cygnus persistence will most probably suffer with chars like "/" or "?" in entity id or types, as entity id/type can be used to build table names in some sinks which don't support special characters in table names (like MySQL or Postgresql). Of course, you can always escape at table name level, but a table name full of % (or any other funny escaping character) is a pain in the ass for any DBA or in general any people using these tables :)

Has the complete "context e2e" (I mean, not only the Context Broker isolately, but the FIWARE GEs consuming the context information notified by Orion, which includes Cygnus, Perseo, STH, QuantumLeap, etc.) been taken into account in the design of NGSI-LD? For instance, has the impact of "/" and "?" in entity id and entity types could have in Cygnus in the SQL-based and CKAN sinks been taken into account, in the line I mention in the paragraph below? I'd really want to know...