simplesamlphp / SAML-tracer

Browser extension for examining SAML messages
https://addons.mozilla.org/nl/firefox/addon/saml-tracer/
BSD 2-Clause "Simplified" License
141 stars 39 forks source link

Only the first ampersand displays encoded #52

Closed dnmvisser closed 4 years ago

dnmvisser commented 5 years ago

I have somewhere in my SAML response a few AttributeValues that contain ampersand characters. From the SimpleSAMLphp logs on the IdP that generates the response I can see that these are all correctly encoded as & in the XML. SAML-tracer displays one entry correctly. But there is also an entry that contains two ampersands, and in this case only the first one displays as encoded, while the second shows as just the bare ampersand character. Github's syntax highlighting also shows it:

<saml:AttributeValue xsi:type="xs:string">GN4Phase2:Governance:GN4-2 Board</saml:AttributeValue>
<saml:AttributeValue xsi:type="xs:string">GN4Phase2:GN4-2 All Governance</saml:AttributeValue>
<saml:AttributeValue xsi:type="xs:string">GEANT Staff:CSO:All CSO</saml:AttributeValue>
<saml:AttributeValue xsi:type="xs:string">GEANT_CO:GEANT Staff:CSO:T&amp;I Ops:members_T&I Ops</saml:AttributeValue>
<saml:AttributeValue xsi:type="xs:string">GEANT Staff:CSO:T&amp;I Ops</saml:AttributeValue>
<saml:AttributeValue xsi:type="xs:string">GEANT Staff:All</saml:AttributeValue>

At this point I wasn't sure what exactly happens, so I went to create an attributevalue that contains 4 ampersands on one of our IdPs, and here it clearly shows that only the first one is encoded:

<saml:Attribute Name="urn:mace:dir:attribute-def:sn" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:basic">
<saml:AttributeValue xsi:type="xs:string">&amp;something & other & something else & also Newton</saml:AttributeValue>
</saml:Attribute>

I stumbled upon this when I copy/pasted the content and ran it through xmllint which started barking...

tvdijen commented 5 years ago

I think the issue is here.

replace() has no third parameter... It is simply ignored. The global-directive, that the current 'g' is representing should be a part of the first parameter... string.replace(/&/g, '&amp;') would fix this

tvdijen commented 5 years ago

This fix was actually already discussed here

khlr commented 5 years ago

This fix was actually already discussed here

Yep, seems to be this issue. But anyways, I'm still not sure if it's basically a good thing, that SAML-tracers encodes the content of a claim:

But I'm wondering if that makes sense... As you can see from the screenshot some posts above, the actual content of the claim is another un-encoded claim. And that's exactly what travels over the wire. So from my understanding the claim value should be encoded (since it's xsi:type="xs:string"), but that shouldn't be done by SAML Tracer - I think that should be done by the IdP before issuing the token. Or am I getting this wrong?

So regardless of whether the entire content is encoded or just the first character, I'm wondering if any encoding should happen in this place. Do you know what I mean? If an unencoded content was sent, why should SAML-tracer pretend that it was sent encoded although it wasn't…?

dnmvisser commented 5 years ago

Good point. I’d say tracer should display all content unencoded, i.e. as it appears on the wire. The deciding should be left to “real” user agents.

Dick

On Sat, 10 Nov 2018 at 19:42, Jan Köhler notifications@github.com wrote:

This fix was actually already discussed here https://github.com/Uninett/SAML-tracer/issues/26#issuecomment-350388158

Yep, seems to be this issue. But anyways, I'm still not sure if it's basically a good thing, that SAML-tracers encodes the content of a claim:

But I'm wondering if that makes sense... As you can see from the screenshot some posts above, the actual content of the claim is another un-encoded claim. And that's exactly what travels over the wire. So from my understanding the claim value should be encoded (since it's xsi:type="xs:string"), but that shouldn't be done by SAML Tracer - I think that should be done by the IdP before issuing the token. Or am I getting this wrong?

So regardless of whether the entire content is encoded or just the first character, I'm wondering if any encoding should happen in this place. Do you know what I mean? If an unencoded content was sent, why should SAML-tracer pretend that it was sent encoded although it wasn't…?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Uninett/SAML-tracer/issues/52#issuecomment-437608221, or mute the thread https://github.com/notifications/unsubscribe-auth/AD4tXEuVH-hWvgaZztIz0xRB6a0caHgGks5utx33gaJpZM4YUkdo .

-- Sent from a mobile device - please excuse the brevity, spelling and punctuation.

khlr commented 5 years ago

Hello there! I used the last days' foul weather to have another look on this issue.

A really nice way to reproduce this issue is using @AndersAbel's Sustainsys.Saml2.StubIdp. Anders provides StubIdp online at https://stubidp.sustainsys.com/. There, one can easily add e.g. Dick's sample string & something & other & something else & also Newton as an attribute value and POST a SAML request.

Ok, now the interesting part: The reason for me, not to edit the xmlEntities function was the following:

I'm wondering if any encoding should happen in this place. [...] If an unencoded content was sent, why should SAML-tracer pretend that it was sent encoded although it wasn't…?

But well, of course the IdP encodes the attribute values! One can take a look Firefox dev tools to confirm this:

image

It's just again the webRequest-API which decodes the values! So to show the values like they were originally sent, SAML-tracer in fact sould encode the values.

In the end this may be a bit inefficient (IdP encodes it, webRequest-API decodes it, SAML-tracer then encodes it again...), but... hey ;-)

AndersAbel commented 5 years ago

Thanks for the nice words about the stub idp :)

A note on encoding/decoding. When debugging SAML messages, I've found that incorrect encoding is a pain to find and handle. For signed messages, differences in encoding might invalidate the signature. So encoding back to what the string should have been is not the same as displaying the string as it was in case the Idp did an incorrect encoding.

khlr commented 5 years ago

Yep, that's definitely a point! That's the reason why I think that SAML-tracer should change as less as possible of the original HTTP-traffic. But since the webRequest-API delivers certain parts of the payload in a decoded fashion you won't come around encoding it again, if you want to display the data in a way it might have been sent... ;-)

jaimeperez commented 5 years ago

Thanks a lot for investigating into this @khlr!

While reading the discussion, I was wondering... why does the webRequest API decode HTML entities in some parts of the response? Or is it that the response will, coincidentally, only have those entities in the attribute values?

The thing is... I can totally see why certain characters would get HTML-encoded inside an attribute's value as sent by the IdP. But at the same time, the example of the NameID you were discussing in #26 is legit, meaning a NameID will be transferred in its raw form (XML, with no entity-escaping at all) as the content of an attribute. So both having an escaped entity in the value and raw XML are possible values.

I understand the problem here is that the API we are using is decoding all HTML entities it finds in the contents, and then we of course need to display that in an HTML context, meaning we need to escape all entities back. If that's the case, then I think #53 is the right fix and should be merged.

khlr commented 5 years ago

why does the webRequest API decode HTML entities in some parts of the response?

Good question. I assume it's to make the users' lifes easier. A lot of people might want to use the unencoded values. ¯\(ツ)

I can totally see why certain characters would get HTML-encoded inside an attribute's value as sent by the IdP. But at the same time, the example of the NameID you were discussing in #26 is legit, meaning a NameID will be transferred in its raw form (XML, with no entity-escaping at all) as the content of an attribute. So both having an escaped entity in the value and raw XML are possible values.

Sorry this was a misunderstanding of mine in #26. I had been fooled by the webRequest-API again... ;-) By now I'd say it's not ok to have anything other than a plain string in an attribute value. So everything inside of the value should be (and probably almost always is) encoded. If it wasn't encoded (if the value contained e.g. XML) that would break the XML schema of a SAML token. Having this in mind I think it's compelling to have only encoded string in the value.

I understand the problem here is that the API we are using is decoding all HTML entities it finds in the contents, and then we of course need to display that in an HTML context, meaning we need to escape all entities back.

Yep, the API causes the confusion ;-) And yes, we'll have to encode everything again. But the main reason for this is not displaying the payload in a HTML context. The real reason is because as a user of SAML-tracer you'd want to see the attribute values the way they have been sent. And by encoding them again they'll be displayed as they might have been sent. That's the sad thing about this: As Anders mentioned a missing or wrong encoding done by the IdP will be obfuscated by doing this. But we lose the information anyway because of the webRequest-API's decoding in the first place.

khlr commented 5 years ago

Oh boy, now I'm sort of confused…

Until just now I've been of the mind that there are only plain types like strings or integers and the like allowed as values of an AttributeValue. Although I think that these types are what one'll usually see in tokens there may be other more complex values out in the wild 😉

The spec clearly states, that AttributeValue can be of anyType which means (at least I think it does) that there are actually values allowed, that contain XML themselves:

<element name="AttributeValue" type="anyType" nillable="true"/>

See chapter 2.7.3.1.1 Element at line 1253.


So just to recap: Whereas I thought that this is invalid

<saml2:Attribute Name="SomeStrangeClaim">
    <saml2:AttributeValue><foo>bar</foo></saml2:AttributeValue>
</saml2:Attribute>

and has to be encoded like this

<saml2:Attribute Name="SomeStrangeClaim">
    <saml2:AttributeValue>&lt;foo&gt;bar&lt;/foo&gt;</saml2:AttributeValue>
</saml2:Attribute>

the former seems to be valid!

But if that's true… Why has SAML-tracer always had the function xmlEntities which encodes XML-characters? Just by mistake? 🤔

To sum it up: With the new knowledge I agree that SAML-tracer shouldn't encode anything. This would mean that the function xmlEntities shouldn't be used anymore. But on the other hand I'm afraid that the webRequest-API will put its hands on the values anyways…


Could anyone please confirm or refute my thoughts on this?

jaimeperez commented 5 years ago

Hi @khlr!

Yes, that was exactly what I was trying to convey in my previous message, though I think I failed miserably 😅

In any case, you are right, XML is perfectly valid inside an attribute's value. In fact, it's not uncommon at all. The eduPersonTargetedID attribute is defined as a SAML NameID, and as such, it's essentially XML. Even though this attribute is deprecated and not recommended to exchange nowadays, it's still very common in deployments around the world.

But if that's true… Why has SAML-tracer always had the function xmlEntities which encodes XML-characters? Just by mistake?

I can only speculate here, but I'd say probably this was the same confusion that lead SimpleSAMLphp to generate ePTID attributes as simple strings, instead of proper XML. Doesn't really matter anyway. The fact is that XML is valid there and we shouldn't be escaping it.

khlr commented 5 years ago

Thank you for the clarification, @jaimeperez!

Having this in mind the solution seems to simply drop the function xmlEntities since there's no need to encode anything. What do you think?