dominictarr opened this issue 3 years ago
Based on the benchmarks in #146, the suggested change here doesn't appear to improve `slowEqual` performance. This may differ between Node.js versions, though.
I don't trust `createSeekPath` because it uses the `Function` constructor with strings, which has security issues similar to `eval()`. Given that our current `slowEqual` is for prototyping (maybe its name should have been `slowInsecureEqual`, but it's too late now to issue a breaking change) and `equal` is for production, and that it's not hard to hand-write args for `equal`, I don't see a reason why we should speed up `slowEqual`.
If someone's passing user input as `slowEqual`'s first arg, then they probably have a lot of other problems in their application. I see what you mean, though: it escalates the risk beyond accessing unauthorized paths to potentially crashing the application or exploiting any `JSON.stringify` bugs for RCE.
I disagree. I think this code is safe because `Buffer.from` throws if the input is not a string, ArrayBuffer, or Array of numbers. An ArrayBuffer is never returned by `JSON.stringify`, and `null`, `true`, etc. will produce a `seekPath` that throws. We can trust `JSON.stringify`'s escaping. Sure, this would be vulnerable if `JSON.stringify` had a bug, but `JSON.stringify` is trustworthy: 100% specified and solid. An attacker can't insert a comment and then real code after it, in the style of SQL injection.
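For illustration (this is not library code), the escaping guarantee can be checked directly: a key crafted to break out of a generated string literal comes back with its quotes escaped, so the generated source still contains exactly one inert string literal.

```javascript
// Illustrative check (not part of the library): JSON.stringify escapes
// quotes and backslashes, so a key crafted to terminate a generated
// string literal early stays inert inside its quotes.
const maliciousKey = '"]; doEvil(); //'
const literal = JSON.stringify(maliciousKey)

// The embedded quote is escaped (\"), and parsing round-trips losslessly.
console.log(literal)
console.log(JSON.parse(literal) === maliciousKey) // → true
```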
We should add a check that every item in `target` is a string, or, if you want to be truly paranoid, enforce that keys match `/^[a-zA-Z_][a-zA-Z_0-9]+$/` and then surround them with quotes. Then it would be impossible to write any JS other than a variable name.
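The paranoid check above could be sketched like this. The function name `assertSafeTarget` is hypothetical, and the regex mirrors the one suggested (note that, as written, it requires keys of at least two characters):

```javascript
// Hypothetical validation sketch: only allow identifier-like keys
// before handing the target path to code generation.
const SAFE_KEY = /^[a-zA-Z_][a-zA-Z_0-9]+$/

function assertSafeTarget(target) {
  if (!Array.isArray(target)) throw new TypeError('target must be an array')
  for (const key of target) {
    if (typeof key !== 'string') throw new TypeError('keys must be strings')
    if (!SAFE_KEY.test(key)) throw new Error('unsafe key: ' + key)
  }
  return target
}
```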
@dominictarr It would only take a `JSON.stringify` exploit that allows a self-invoking function to circumvent the `Buffer.from` call, and `JSON.stringify` has had exploits like this before. I would have considered the check overprotective, since the dev using this library would already have to trust bad inputs for this exploit to matter, but the point seems moot since the refactor doesn't appear to deliver any performance improvement in Node.js 12 (and I bet it will fare worse in later versions).
Also, if you follow the CVE link, you may find that one of the mentioned pages specifically ties the `JSON.stringify` exploit to RCE.
To make that bug actually work, the stringify needs to be running with a replacer function, which this one isn't. Or, I think, you could pass it an object with a custom `toJSON` method, but that can't be done remotely (only in the local JS VM).
To attack this, you'd need to get `JSON.stringify` to actually output the wrong thing. `slowEqual` takes a single string and splits on `'.'`, then passes the result to `createSeekPath`, which has to receive an array of strings. That means you'd need an attack string that causes `JSON.stringify` to output a string that is incorrectly escaped.
But yes, none of this matters unless it's actually faster. I was confused about the PR; it seemed to say that the benchmarks didn't use `createSeekPath`?
The docs say that `slowEqual` is slow, but is there a benchmark that shows that? Maybe `slowEqual` isn't slow? Or maybe there are other things that make its impact insignificant. When I was developing this, I recall it having a significant impact.
@dominictarr I set up two pull requests in my fork to trigger the benchmark, and `slowEqual` performed more slowly than `equal` both with and without this change.
Is one of those results the hand-written query? Is it "Query 1 big index"? How big is the log? Are there records that match and also records that do not match?
By the way, why are you putting it inside the `if` that might return `undefined`? It returns `-1` to mean "not found" because that can be checked with a single operator and doesn't change the return type. A polymorphic function isn't as optimizable, so it's better to avoid this. Also, `createSeekPath` tries to avoid using closures, and then you added one back ;)
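The monomorphic-return convention described above can be sketched as follows (an illustrative toy, not the actual seek code):

```javascript
// Sketch of the convention: seek-style functions return -1 for
// "not found" instead of undefined, so the result is always a number
// and callers check it with a single comparison.
function findOffset(offsets, wanted) {
  for (let i = 0; i < offsets.length; i++) {
    if (offsets[i] === wanted) return i
  }
  return -1 // same type as the success case; keeps the return monomorphic
}

const r = findOffset([10, 20, 30], 20)
if (r !== -1) {
  // found at index r (here: 1)
}
```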
Now, JS perf can be very finicky, and the engine optimizations might have changed since I wrote that; you have to actually measure this, so I could be wrong...
But in `bipf/test/perf` the difference between `seekPath` and `createSeekPath` does show up (`createSeekPath` is faster): https://github.com/ssbc/bipf/blob/master/test/perf.js#L136-L154. So if it's slower in this benchmark, there must be something else going on...
@dominictarr Looks like the difference is that the JITDB benchmark factors the compilation time into the measurement, and JITDB wasn't caching the compiled code. If we update JITDB to cache/memoize `bipf.createSeekPath`, then the second and third runs may show better performance.
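The caching idea could look roughly like this. `cachedSeekPath` and the stub compiler are hypothetical stand-ins (the real compiler would be `bipf.createSeekPath`); the point is only that compilation cost is paid once per path string:

```javascript
// Sketch: memoize the compiled seek function per path string so the
// expensive compile step runs only on the first query for that path.
const seekCache = new Map()

function cachedSeekPath(pathString, compileSeekPath) {
  let fn = seekCache.get(pathString)
  if (!fn) {
    fn = compileSeekPath(pathString.split('.'))
    seekCache.set(pathString, fn)
  }
  return fn
}

// Demo with a stub compiler that counts compilations:
let compiles = 0
const stubCompile = (parts) => {
  compiles++
  return () => parts.length
}
const first = cachedSeekPath('content.type', stubCompile)
const second = cachedSeekPath('content.type', stubCompile)
// first === second, and compiles === 1
```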
@dominictarr I introduced a static module-level LRU cache optimization to `slowEquals` so we can better gauge the impact of `bipf.createSeekPath` on performance. To make sure it's a fair benchmark, I also included the LRU optimization in the variant without `bipf.createSeekPath`.
Part | Speed | Heap Change | Samples |
---|---|---|---|
Query 1 big index with slowEqual (1st run) | 1422.81ms ± 54.09ms | -24.07 kB ± 0 kB | 38 |
Query 1 big index with slowEqual (2nd run) | 289.3ms ± 2.74ms | 4.19 kB ± 0 kB | 42 |
Query 1 big index with slowEqual (3rd run) | 0.31ms ± 0.05ms | 3.07 kB ± 18.14 kB | 48 |
`bipf.createSeekPath`
Part | Speed | Heap Change | Samples |
---|---|---|---|
Query 1 big index with slowEqual (1st run) | 915.94ms ± 6.73ms | 8.63 kB ± 0 kB | 59 |
Query 1 big index with slowEqual (2nd run) | 319.64ms ± 3.36ms | 68.74 kB ± 38.55 kB | 42 |
Query 1 big index with slowEqual (3rd run) | 0.5ms ± 0.18ms | 10.55 kB ± 24.38 kB | 44 |
It looks like `bipf.createSeekPath` makes the first query faster, but not later queries. Note, though, that these numbers completely exclude the compilation time, which would be a serious consideration for very dynamic paths in queries.
I just noticed that the original and `createSeekPath` share a problem: they both repeatedly generate the comparison buffers. I created another branch to test a safe solution that beats both in performance.
Part | Speed | Heap Change | Samples |
---|---|---|---|
Query 1 big index with slowEqual (1st run) | 864.28ms ± 18.35ms | 13.64 kB ± 0 kB | 62 |
Query 1 big index with slowEqual (2nd run) | 321.48ms ± 2.07ms | 16.15 kB ± 34.04 kB | 41 |
Query 1 big index with slowEqual (3rd run) | 0.36ms ± 0.09ms | 5.09 kB ± 0 kB | 49 |
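The pregenerated-buffer idea can be sketched like this. `comparisonBuffer` is an illustrative name, not the branch's actual API; the point is that each target value is encoded to a `Buffer` once and reused across all records:

```javascript
// Sketch: build the comparison Buffer once per target value instead of
// on every record, caching it by value. (Illustrative only.)
const bufferCache = new Map()

function comparisonBuffer(value) {
  let buf = bufferCache.get(value)
  if (!buf) {
    buf = Buffer.from(value)
    bufferCache.set(value, buf)
  }
  return buf
}
```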
@arj03 @staltz What are your thoughts on this solution?
@Barbarrosa great job!
Which benchmark is `equals`? Is `slowEquals` as fast as that now?
@dominictarr `slowEquals` seems much more comparable in speed using the LRU and the pregenerated buffer. Here are the benchmarks that use `equals`:
Part | Speed | Heap Change | Samples |
---|---|---|---|
Query 1 big index (1st run) | 912.47ms ± 24.95ms | 21.11 kB ± 0 kB | 59 |
Query 1 big index (2nd run) | 303.86ms ± 2.07ms | 10.73 kB ± 0 kB | 44 |
Count 1 big index (3rd run) | 0.28ms ± 0.02ms | 23.67 kB ± 0 kB | 45 |
Thinking back, `Query 1 big index with slowEqual (3rd run)` should really be `Count 1 big index with slowEqual (3rd run)`.
I think two more benchmarks may be worth looking at here.
`slowEquals`
Part | Speed | Heap Change | Samples |
---|---|---|---|
Query 1 big index with slowEqual (1st run) | 1227.72ms ± 30.83ms | -43.01 kB ± 18.86 kB | 45 |
Query 1 big index with slowEqual (2nd run) | 302.25ms ± 1.66ms | 77.83 kB ± 0 kB | 40 |
Query 1 big index with slowEqual (3rd run) | 0.29ms ± 0.02ms | 21.15 kB ± 0 kB | 44 |
Part | Speed | Heap Change | Samples |
---|---|---|---|
Query 1 big index with slowEqual (1st run) | 1253.71ms ± 24.5ms | 32.54 kB ± 45.04 kB | 43 |
Query 1 big index with slowEqual (2nd run) | 307.64ms ± 2.64ms | -15.19 kB ± 12.49 kB | 43 |
Query 1 big index with slowEqual (3rd run) | 0.62ms ± 0.32ms | 7.5 kB ± 43.83 kB | 42 |
I was curious about the impact if the LRU cache were set per JITDB instance. Here's that result.
Part | Speed | Heap Change | Samples |
---|---|---|---|
Query 1 big index with slowEqual (1st run) | 1443.42ms ± 7.46ms | -18.06 kB ± 30.34 kB | 37 |
Query 1 big index with slowEqual (2nd run) | 286.96ms ± 2.86ms | 74.54 kB ± 13.3 kB | 45 |
Query 1 big index with slowEqual (3rd run) | 0.48ms ± 0.21ms | 14.76 kB ± 32.07 kB | 43 |
> how big is the log? are there records that match and also do not match?
@dominictarr It's querying for msgs of type `post`; there are 23310 of those, and the log in total has 100000 msgs and 2000 feeds. This log is deterministically generated with ssb-fixtures, which uses statistical sampling based on real-world SSB statistics and the Pareto distribution.
> This log is deterministically generated with ssb-fixtures which uses statistical sampling based on real-world SSB statistics and the Pareto distribution.
perfect!
@dominictarr https://github.com/ssb-ngi-pointer/jitdb/pull/154 (now merged) implements something similar to the change I suggested, i.e. it preprocesses the queried path into an array of `Buffer`s and caches the resulting function with a `Map`. This means the related performance issues with `slowEqual` will only affect the first query using a particular path. Any following queries will see performance much more like that of `equal`.
There is a method in bipf, `createSeekPath`, that generates the seek function: https://github.com/ssbc/bipf/blob/master/index.js#L299-L317. It generates source code and creates a new function, because that way it can produce flatter, faster code without closure scoping. Then you won't need the `indexType` option, because you can just use the path.
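The code-generation approach described above can be sketched like this. This is a simplified illustration, not bipf's actual generated source: real `createSeekPath` walks BIPF-encoded buffers, while here the "decode" step is plain object access to keep the example self-contained.

```javascript
// Simplified sketch of generating a flat accessor with the Function
// constructor. Each key is embedded as a JSON string literal, producing
// flat code like: return obj["content"]["type"]
// (no closures and no loop at call time).
function createGetter(path) {
  const body =
    'return obj' + path.map((k) => '[' + JSON.stringify(k) + ']').join('')
  return new Function('obj', body)
}

const getType = createGetter(['content', 'type'])
console.log(getType({ content: { type: 'post' } })) // → 'post'
```

This is also why the string-escaping discussion earlier in the thread matters: `JSON.stringify` is what keeps the embedded keys from being interpreted as code.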