Closed ronilan closed 2 years ago
Thanks for the PR writeup @ronilan. I haven't reviewed but these two points are concerning:
All six probes originally used the OBOE_SQLSANITIZE_KEEPDOUBLE option. There is no such "option" anymore (again following ruby agent lead). The SQL Sanitizer implementation always keeps double-quoted data. There is no user customization of the sanitization function (departing from ruby agent lead).
Our spec for sanitization https://github.com/librato/trace/blob/master/docs/specs/sql-sanitization.md has the rationale for why there is an option to treat double-quoted rather than single-quoted values as sanitization targets. From a quick look, OTel Java dealt with the same concern, and this PR has some good examples: https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/4593
With our Ruby agent, the ability to customize the regex could provide a workaround (although ideally the agent should offer a configurable alternative), but if we remove this for the Node.js agent there would be no recourse.
Side note: our spec has links to some of the original discussion on why FSM vs. regex. Afaict the reason was to avoid requiring multiple regex passes, but it was also unclear whether a custom FSM would be more performant. Given the maintenance/readability trade-off, I think we can consider multiple regex passes if needed.
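To make the quoting concern concrete, here is a minimal sketch of a sanitizer where the set of "data" quote characters is configurable. The function name `sanitizeQuoted`, the `targets` parameter, and the regex are all hypothetical and are not taken from either agent. It only illustrates why a fixed "keep double quotes" rule can leak values in dialects where double quotes delimit string literals (MySQL without ANSI_QUOTES, for example).

```javascript
// Hypothetical sketch: names and pattern are assumptions, not agent code.
// Matches a double-quoted or single-quoted span. A doubled quote ('' or "")
// or a backslash escape is treated as part of the span (an assumption that
// fits MySQL-style escaping; standard SQL does not use backslash escapes).
const QUOTED = /"(?:[^"\\]|\\.|"")*"|'(?:[^'\\]|\\.|'')*'/g;

// `targets` lists the quote characters whose contents count as user data.
function sanitizeQuoted(sql, targets = ["'"]) {
  return sql.replace(QUOTED, (span) => (targets.includes(span[0]) ? '?' : span));
}

// Default behaviour keeps double-quoted spans, so this value is not removed:
console.log(sanitizeQuoted('SELECT * FROM users WHERE name = "Jake"'));
// SELECT * FROM users WHERE name = "Jake"

// Treating both quote styles as data removes it:
console.log(sanitizeQuoted('SELECT * FROM users WHERE name = "Jake"', ["'", '"']));
// SELECT * FROM users WHERE name = ?
```

With a fixed rule and no way to change `targets`, the first call is the only behaviour available, which is the "no recourse" case described above.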
I haven't reviewed but these two points are concerning:
It's easy.
From a user perspective - Since the day SQL instrumentation was created, 8 years ago, and up to today, the node agent: 1) kept double quoted values; 2) provided no user configuration of the sanitizer; 3) could not sanitize values where an apostrophe was used. This pull request changed only number 3 above. This PR could be considered a minor fix.
From an implementation perspective - it changes something unmaintainable into something that is maintainable (so if there ever is any future user concern it could be addressed). This PR could be considered a minor maintainability improvement.
From a performance perspective - I very much doubt that a call to the bindings + the FSM is faster than a simple multi-regex pass in JS, and I don't think it is wise to spend time proving it either way. This PR could be considered as having no performance effect.
I don't really see any reason for concern (other than in archeological terms), but if there is one, let's "Just NOT Do It!" Either this PR and https://github.com/appoptics/appoptics-bindings-node/pull/117 are merged, or they are closed. Any further discussion of this feature is moot and a time well wasted.
Overview
This pull request adds SQL Sanitization functionality to the agent.
Status
SQL sanitization (in this context) is the removal of user data from query strings (e.g. removing `Jake` and `13` from `SELECT * FROM users WHERE name = 'Jake' AND age > 13`).
Sanitization was added to the agent via the bindings. It is an FSM written in C++, "copied" from the PHP agent.
The Ruby Agent uses a simple regex to achieve similar (but not identical) results.
Change
An SQL Sanitizer module that uses regular expressions to "sanitize" data out of strings (assumed to be SQL queries) was added. The implementation is similar to the one used in the ruby agent, but handles removal of data in more edge cases.
Sanitization via regex (following ruby agent lead) differs from the original implementation. Removing `Jake` and `13` from `SELECT * FROM users WHERE name = 'Jake' AND age > 13` will now result in `SELECT * FROM users WHERE name = ? AND age > ?`, when previously it resulted in `SELECT * FROM users WHERE name = '?' AND age > 0`.
The six probes that use sanitization (`cassandra-driver`, `node-cassandra-cql`, `mysql`, `oracledb`, `pg`, `tedious`) were modified to use the internal SQL Sanitizer module.
All six probes originally used the `OBOE_SQLSANITIZE_KEEPDOUBLE` option. There is no such "option" anymore (again following ruby agent lead). The SQL Sanitizer implementation always keeps double-quoted data.
There is no user customization of the sanitization function (departing from ruby agent lead).
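As a rough illustration of the behaviour described in this section, the sketch below removes single-quoted values and numeric literals in one regex pass while leaving double-quoted spans untouched. It is not the module added by this pull request: the function name `sanitizeSql` and the pattern are assumptions made for the example, and the real implementation may cover more edge cases.

```javascript
// Hypothetical sketch: not the PR's actual module; name and regex are assumed.
// One pass with three alternatives, tried in order at each position:
//   1. a double-quoted span (kept as-is)
//   2. a single-quoted span, where '' or a backslash escape stays inside it
//   3. a standalone integer or decimal literal (not part of an identifier)
const TOKEN = /"(?:[^"\\]|\\.|"")*"|'(?:[^'\\]|\\.|'')*'|\b\d+(?:\.\d+)?\b/g;

function sanitizeSql(sql) {
  // Keep double-quoted spans; replace everything else that matched with "?".
  return sql.replace(TOKEN, (match) => (match[0] === '"' ? match : '?'));
}

console.log(sanitizeSql("SELECT * FROM users WHERE name = 'Jake' AND age > 13"));
// SELECT * FROM users WHERE name = ? AND age > ?

// A value containing an escaped apostrophe is removed as a whole:
console.log(sanitizeSql("SELECT * FROM users WHERE name = 'O''Brien' AND age > 13"));
// SELECT * FROM users WHERE name = ? AND age > ?

// Double-quoted data is always kept, digits inside it included:
console.log(sanitizeSql('SELECT "user id 7" FROM t WHERE id = 7'));
// SELECT "user id 7" FROM t WHERE id = ?
```

Matching the double-quoted alternative first is what keeps the other two from firing inside it. Separate sequential passes would need extra care to get the same result.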
Notes