tngan / samlify

Node.js library for SAML SSO
https://samlify.js.org
MIT License
609 stars 217 forks source link

fix: update validator to use samlify-validator-js #510

Closed ivarconr closed 3 months ago

ivarconr commented 1 year ago

@authenio/samlify-node-xmllint has a built in memory leak every time you run the SAML schema validator. It was related to a library it depends on, node-xmllint, where the source code does not exist anymore.

I have therefore create a new library called "samlify-validator-js", building on the existing work, and removed the memory leak.

tngan commented 1 year ago

@ivarconr would you like to further explain what you have fixed for the memory leak? the file you have modified is the compiled file, so it's also difficult for others to cross-check the code change.

thanks for your work!

ivarconr commented 1 year ago

Yes, I carefully copied of the node-xmlint npm package and included that as a package to the new package (github source code). This was needed because the source code for node-xmlint has been removed.

To identify the the memoery leak I took heap Snapshots from our application and identified that the leak was associated with event listeners being attached to the node process. It does not make any sense for that module to register itself as an event listener on the node process itself. It does this for every execution of the SAML validation.

To highlight the fixes performed I isolated that in its own commit. The change is not possible inspect in the GitHub UI, because of the nature of the compiled JavaScript file being to large.

I simply removed two lines:

  1. process["on"]("uncaughtException",(function(ex){if(!(ex instanceof ExitStatus)){throw ex}}))}
  2. process["stdout"]["once"]("drain",(function(){process["exit"](status)}));console.log(" ");

Also highlighted in the IDE diff: 1: image

2: image

This was the easiest way to fix the problem, and allowed us to not have a memory leak in our application. It is not an option for us to rely on any third party tools in Java or C++, as it increases the complexity of self-hosting Unleash.

Future improvements: In the future we will look in to generating our own port of xmllint to WASM, and build on that to do XML schema validation in pure javascript. This would allows us to heavily reduce the size of this module (down to KBs instead of MBs). I could not find a suitable project already solving this in a a perfect way, the closest module is https://github.com/noppa/xmllint-wasm. I got xmllint-wasm working as a validator, but the fact that it uses workers behind the scene scares me and my simple tests ended up consuming a lot of CPU.

fyi; We are already using this in Unleash Cloud (https://www.getunleash.io/).

tngan commented 1 year ago

@ivarconr Thank you for your work, I have created @authenio/samlify-xmllint-wasm as the validator using xmllint-wasm. It might be worth to take a look on the CPU usage. I think it's helpful for someone who doesn't want to have java dependencies.

Since the source code of xmllint was removed, I will stop the maintenance of the validator, xmllint.js is a compiled script which I was not aware of before.

ivarconr commented 1 year ago

I have created @authenio/samlify-xmllint-wasm as the validator using xmllint-wasm.

This is definitely the way to go. Removing hard dependencies on Java is a good thing for this project. CPU is usually less of a concern (given its not crazy and you have a decent amount of sign-in requests per second).

ivarconr commented 1 year ago

I looked in to you code. My biggest concern is the dependency on xmllint-wasm which heavily utilized worker threads on Node.js. I would rather not have that introduced in to our application, coming from a library.

from: https://github.com/noppa/xmllint-wasm#installation

The library uses Node.js Worker threads to isolate the Emscripten wrapper from your main process (so that when it calls process.exit your whole server won't go down), which is why Node >= 12 is required.

It might be ok, but requires better understanding of potential side-effects. Any thoughts @tngan?