jerrcs / simplesamlphp

Automatically exported from code.google.com/p/simplesamlphp

Metadata converter Error 500 on w3.org #597

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
When converting metadata with a reference to w3.org, the SSP server tries to 
download the schema but fails with an error 500 (Server Error).

DOMDocument::loadXML seems not to send a User-Agent header, which causes w3.org 
to return an error 500. The attached fix sets the User-Agent header.

Original issue reported on code.google.com by Patrick....@gmail.com on 25 Nov 2013 at 10:08

GoogleCodeExporter commented 8 years ago
Hi,

I'm assuming that you have the 'debug.validatexml' option enabled?

    /**
     * This option allows you to enable validation of XML data against its
     * schemas. A warning will be written to the log if validation fails.
     */
    'debug.validatexml' => FALSE,

We cannot control the schema URL that the DOM library uses for schema 
validation, so this is not something we can actually fix. I guess we should 
most likely just remove the option, as there are other ways to check XML 
messages when necessary.

Original comment by olavmrk@gmail.com on 26 Nov 2013 at 11:54

GoogleCodeExporter commented 8 years ago
It will not work in all cases, but why not fix the quite common w3.org case 
with the suggested patch? If it were actually attached, of course ;)

Original comment by thijs@kinkhorst.com on 26 Nov 2013 at 2:48

GoogleCodeExporter commented 8 years ago
I agree with Thijs: w3.org is quite common and the patch will fix the issue. It 
will make SSP a better HTTP client. As long as the debug.validatexml option 
actually exists, why not make it at least a little better...

The issue came up the moment a future SP asked me to import their metadata 
into my IdP to validate it (it didn't validate at another IdP)... So the 
validatexml option was quite useful in this case. The XML was not valid, yet 
the metadata converter still created a valid config for SSP (with 
debug.validatexml set to FALSE).
If debug.validatexml did not exist, the implication would be that the XML is 
correct, which it isn't.

Before I solved the problem in SSP, I looked for another way to validate the 
metadata. However, I could not find a ready-made solution.

I attached the wrong file... The correct patch is attached now.

Original comment by Patrick....@gmail.com on 27 Nov 2013 at 8:58

GoogleCodeExporter commented 8 years ago
The reason it fails with the default user agent is that w3.org is getting 
hammered by schema validators fetching schemas from them without caching them 
locally. By doing this, we would be working around the protection they have 
added to their infrastructure.

(Another measure they have taken is to serve the schemas so slowly that people 
can barely download them, thus giving people an incentive not to load them 
from w3.org...)

Btw.: I have a directory with a lot of schemas common in metadata, rewritten to 
not fetch them from remote sources. I guess I should clean it up a bit and 
publish it somewhere :)

Original comment by olavmrk@gmail.com on 27 Nov 2013 at 10:07

GoogleCodeExporter commented 8 years ago
The problem isn't w3.org filtering the user-agent... DOMDocument::loadXML 
doesn't send any user-agent:

GET /TR/2002/REC-xmldsig-core-20020212/xmldsig-core-schema.xsd HTTP/1.0
Host: www.w3.org

HTTP/1.0 500 Server Error
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html><body><h1>500 Server Error</h1>
An internal server error occured.
</body></html>

And yes, after applying the patch, the validation is very slow...

Isn't it possible (and relatively easy) to first pre-process the metadata XML 
and find-and-replace all www.w3.org references with references to the files in 
SSP's schema directory? All common XSDs are already shipped with SSP.
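A variant of this idea that avoids string-rewriting the XML is PHP's libxml_set_external_entity_loader() (available since PHP 5.4), which intercepts libxml's external resource loads. The schema map and the /path/to/... directory below are hypothetical, not SSP's actual layout:

```php
<?php
// Sketch of the suggestion above: instead of rewriting the metadata XML,
// resolve well-known w3.org schema URLs to locally shipped copies. The
// local directory path is hypothetical; point it at SSP's schema dir.
$localSchemas = [
    'http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/xmldsig-core-schema.xsd'
        => '/path/to/simplesamlphp/schemas/xmldsig-core-schema.xsd',
];

$loader = function ($public, $system, $context) use ($localSchemas) {
    // Serve a local copy when we have one mapped; otherwise hand the
    // original system identifier back to libxml unchanged.
    if ($system !== null && isset($localSchemas[$system])) {
        return $localSchemas[$system];
    }
    return $system;
};

// From here on, DOMDocument::schemaValidate() and friends resolve
// external schemas through $loader instead of fetching from w3.org.
libxml_set_external_entity_loader($loader);
```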

Original comment by Patrick....@gmail.com on 27 Nov 2013 at 1:04

GoogleCodeExporter commented 8 years ago
Yes, I guess I could have worded it differently :) They are filtering requests 
lacking a user-agent, as well as some specific user-agents.

For more information about w3.org's problems:

    http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/

(That post is about DTDs, but the same applies to XSDs.)

Thus, I do not want to take any action in order to work around protection they 
have added.

Now, regarding fixing the schemas in the schema directory: that is certainly 
doable. It would mean adding more schemas to the directory if we want to 
validate all parts of the metadata and messages, e.g. the XML Signature 
specification, XML Encryption, various parts of the XML Schema schemas, SOAP, ...

I do not really want to turn SimpleSAMLphp into an XML schema collection, so 
I'm not certain that it is worth it? Is it really that useful?

Original comment by olavmrk@gmail.com on 27 Nov 2013 at 1:38

GoogleCodeExporter commented 8 years ago
Closing this issue as WontFix. We could provide schema validation as a separate 
package, if desired.

Original comment by olavmrk@gmail.com on 27 Feb 2014 at 10:43