libxmljs / libxmljs

NodeJS bindings for libxml2 written in Typescript
https://libxmljs.github.io/libxmljs/
MIT License

don't validate schema on each validate call #276

Open rkoberg opened 9 years ago

rkoberg commented 9 years ago

I am trying to validate over 56,000 MathML snippets. From the code, each call to validate also validates the very large MathML2 XML schema.

Could there be a new function like parseXsdString that validates the schema once, so that later calls to validate rely on that having been done? Or, for backwards compatibility:

xmlDoc.validate(xsdDoc, true);

where the second argument tells validate that the XSD has already been validated.
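As a rough illustration of the "validate the schema once" idea, here is a minimal, self-contained sketch. It does not use libxmljs at all: `compileSchema`, the toy schema format (a whitespace-separated list of allowed root element names), and the `validate` method on the returned object are all hypothetical stand-ins for the expensive schema-parsing step and the cheap per-document check.

```javascript
// Hypothetical sketch: none of these names are libxmljs APIs.
// compileSchema stands in for the expensive "parse and check the XSD" step.
let compileCount = 0;

function compileSchema(xsdSource) {
  compileCount += 1; // count how often the expensive step runs
  // Toy "schema": a whitespace-separated list of allowed root element names.
  const allowedRoots = new Set(xsdSource.split(/\s+/).filter(Boolean));
  return {
    // Check one document against the already-compiled schema.
    validate(xmlSource) {
      const match = /^\s*<([A-Za-z_][\w.-]*)/.exec(xmlSource);
      return match !== null && allowedRoots.has(match[1]);
    },
  };
}

// Compile once, outside the loop, then reuse the result for every snippet.
const schema = compileSchema("math mrow");
const snippets = ["<math><mi>x</mi></math>", "<mrow/>", "<svg/>"];
const results = snippets.map((snippet) => schema.validate(snippet));
```

With this shape, the schema work happens a single time no matter how many snippets are checked, which is the behaviour being requested here.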

I am asking because, for some reason, after my script made it through several thousand validations, it stopped with the following error, and I am hoping the change above could fix it:

/my/project/conversion/psoc/node_modules/libxmljs/lib/document.js:73
    return this._validate(xsd);
                ^
Error: Invalid XSD schema
    at Document.validate (/my/project/conversion/psoc/node_modules/libxmljs/lib/document.js:73:17)
    at /my/project/conversion/psoc/mathml-validate-report.js:56:20
    at evalmachine.<anonymous>:271:14
    at /my/project/conversion/psoc/node_modules/graceful-fs/graceful-fs.js:102:5
    at Object.oncomplete (evalmachine.<anonymous>:107:15)

Not sure why it would fail after so many other validations succeeded. In any case, there should be no need to validate the schema on each XML validation.

polotek commented 9 years ago

We would accept a patch for this. I'm not sure when one of us will get to it. I would suggest the following things.

I'm not sure if the boolean arg should default to false so that schema documents aren't validated by default. This would be a significant compatibility change and I'm not sure how it will affect existing programs.

It would be nice to do this same work for the RelaxNG versions of these functions. But maybe that'll be a separate pull request.
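The compatibility question about the boolean argument can be sketched roughly as follows. This is not libxmljs code: `checkSchema`, `validate`, and the plain objects standing in for documents and schemas are hypothetical. The sketch assumes the flag means "the schema has already been checked" and defaults to false, so existing two-argument calls keep their current behaviour.

```javascript
// Hypothetical stand-ins; this is not libxmljs code.
let schemaChecks = 0;

function checkSchema(xsd) {
  schemaChecks += 1; // the expensive step the issue wants to skip
  if (!xsd.wellFormed) throw new Error("Invalid XSD schema");
}

// schemaAlreadyChecked defaults to false, so validate(doc, xsd) behaves as before.
function validate(doc, xsd, schemaAlreadyChecked = false) {
  if (!schemaAlreadyChecked) checkSchema(xsd);
  return xsd.allowedRoots.includes(doc.root);
}

const xsd = { wellFormed: true, allowedRoots: ["math"] };
const legacyResult = validate({ root: "math" }, xsd); // checks the schema
const fastResult = validate({ root: "math" }, xsd, true); // skips the check
```

Flipping the default to true would skip the check for every existing caller, which is the kind of behaviour change the compatibility concern is about.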

rkoberg commented 9 years ago

Yea, I could do it in Java, but I don't have knowledge of C. I can confirm that the current state of the code dies when doing tens of thousands of validations.

polotek commented 9 years ago

Learning new things is important. :)

You should at least be able to add the boolean argument that will get you past your current problem. I'm just saying we may not get to a complete fix for this soon.

On Sun Jan 18 2015 at 7:15:59 PM Rob Koberg notifications@github.com wrote:

Yea, I could do it in Java, but I don't have knowledge of C. I can confirm that the current state of the code dies when doing tens of thousands of validations.
