microsoft / CCF

Confidential Consortium Framework
https://microsoft.github.io/CCF/
Apache License 2.0

JWT issuer validation #6175

Closed maxtropets closed 5 days ago

maxtropets commented 1 month ago

Eventually closes

Decided to merge these together to avoid having them split across different releases
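For context, "issuer validation" means checking a token's `iss` claim against the expected issuer before accepting it. A minimal sketch in Python (illustrative only; CCF's actual implementation is in C++ and also verifies signatures, key IDs, expiry, etc.):

```python
import base64
import json

def decode_jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    payload_b64 = token.split(".")[1]
    # Restore the base64url padding that JWTs strip.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def issuer_is_valid(token: str, expected_issuer: str) -> bool:
    """Accept the token only if its 'iss' claim matches the expected issuer."""
    claims = decode_jwt_payload(token)
    return claims.get("iss") == expected_issuer

def b64url(obj) -> str:
    """base64url-encode a JSON object, without padding (JWT convention)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).decode().rstrip("=")

# Build a toy unsigned token (header.payload.signature) for demonstration.
token = ".".join([b64url({"alg": "none"}),
                  b64url({"iss": "https://issuer.example.com", "sub": "user"}),
                  ""])
print(issuer_is_valid(token, "https://issuer.example.com"))  # True
print(issuer_is_valid(token, "https://other.example.com"))   # False
```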

maxtropets commented 2 weeks ago

As noted above, kept the old tables but moved them under the "Legacy" namespace. We can work with them if needed, but they are marked explicitly obsolete.

maxtropets commented 2 weeks ago

Converted back to "Draft" to include #5177 and #6204

github-actions[bot] commented 2 weeks ago

🐰 Bencher

Report: Sun, June 2, 2024 at 22:24:52 UTC
Project: CCF
Branch: 6175/merge
Testbed: gha-virtual-ccf-sub

⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!

  • Latency (latency)

For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds CLI flag.

Benchmark results, Latency in nanoseconds (ns):

| Benchmark            | Latency (ns) |
|----------------------|-------------:|
| commit_latency_ratio |         0.76 |
| historical_queries   |    60,805.62 |
| pi_basic_js_virtual  |     4,415.90 |
| pi_basic_virtual     |    57,689.70 |

maxtropets commented 1 week ago

I've run the JWT perf test on main vs. the branch:

./tests.sh -VV -R pi_ls_jwt -L "perf"
python3 tests/infra/piccolo/throughput_analysis.py # patched with proper paths

Results (main):

(env) mtropets@mtropets:~/workspace/CCF$ python3 tests/infra/piccolo/throughput_analysis.py
+------+-------+------+------------+
| Reqs |  Time | Pass | Throughput |
+------+-------+------+------------+
| 1000 | 0.046 | 0.0  |  21914.4   |
+------+-------+------+------------+

Results (branch):

python3 tests/infra/piccolo/throughput_analysis.py 
+------+-------+------+------------+
| Reqs |  Time | Pass | Throughput |
+------+-------+------+------------+
| 1000 | 0.064 | 0.0  |  15716.4   |
+------+-------+------+------------+

Seems like a pretty significant performance hit, will dig into it.
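The size of the hit implied by the two tables can be computed directly (throughputs copied from the tables above; by this measure the drop is about 28%, which the thread rounds to roughly 25%):

```python
main_tput = 21914.4    # req/s on main (from the table above)
branch_tput = 15716.4  # req/s on the branch (from the table above)

# Relative throughput drop of the branch vs. main.
slowdown = (main_tput - branch_tput) / main_tput
print(f"throughput drop: {slowdown:.1%}")  # throughput drop: 28.3%
```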

However, I don't know how heavy the typical TX is. In this test we do one append per TX, which might differ from real-world scenarios, so I have no clear picture of the real impact of 25% slower JWT authentication.

Meaning, if the auth itself is 10x faster than the rest of the request, it doesn't really matter; otherwise it might.
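That point can be made concrete with a toy model (all fractions here are hypothetical, chosen only to illustrate the reasoning): the end-to-end cost of a slower auth step scales with the share of request time that auth occupies.

```python
def end_to_end_slowdown(auth_fraction: float, auth_slowdown: float) -> float:
    """Overall latency increase when only the auth slice of a request gets slower."""
    return auth_fraction * auth_slowdown

# If auth is 10% of the request, a 25% slower auth costs ~2.5% overall...
print(f"{end_to_end_slowdown(0.10, 0.25):.1%}")  # 2.5%
# ...but if auth dominates (90% of the request), it costs ~22.5%.
print(f"{end_to_end_slowdown(0.90, 0.25):.1%}")  # 22.5%
```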

maxtropets commented 1 week ago

One more perf update

Removed issuer validation but left the new schema in place. Results:

(env) mtropets@mtropets:~/workspace/CCF$ python3 tests/infra/piccolo/throughput_analysis.py 
+------+-------+------+------------+
| Reqs |  Time | Pass | Throughput |
+------+-------+------+------------+
| 1000 | 0.058 | 0.0  |  17306.4   |
+------+-------+------+------------+

Most likely, the 25% perf impact decomposes as roughly 10% + 15%: about 10% from issuer validation itself and about 15% from the new schema.
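A quick sanity check on that split, plugging the three measured throughputs into deltas relative to main (note the exact split depends on which baseline each delta is measured against, which is presumably why it doesn't land exactly on 10% + 15%):

```python
main_tput = 21914.4    # main
schema_only = 17306.4  # new schema, issuer validation removed
full_branch = 15716.4  # new schema + issuer validation

# Cost of each part, measured as throughput lost relative to main.
schema_cost = (main_tput - schema_only) / main_tput
validation_cost = (schema_only - full_branch) / main_tput
print(f"schema: {schema_cost:.0%}, validation: {validation_cost:.0%}")  # schema: 21%, validation: 7%
```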

achamayou commented 1 week ago

@maxtropets you need to rev up the openapi version of the gov schema, run the test again locally, and check in the updated openapi golden file to get it to pass.

A 25% hit on end-to-end throughput is substantial, but it will be offset to some extent by applications being able to remove some of that logic themselves. It would be good to run the same test (full logging) with JWT removed, to compare (the closest we have is basic, but that's a smaller app, and it clocks ~57k Tx/s on the CI). It would also be good to run with a profiler to see what exactly makes this worse.

But because this change is a correctness fix, I think we merge it now, and investigate separately if there is a way to improve the performance. Ideally we do this now, and figure out quickly if further schema changes are needed or not. Improvements that don't impact the schema can happen in later releases.
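The "run with a profiler" step can be illustrated generically with Python's built-in `cProfile` (CCF's hot path is C++, where a tool like `perf` would be used instead; this only shows the shape of the workflow, and `authenticate` is a hypothetical stand-in for the code path under investigation):

```python
import cProfile
import io
import pstats

def authenticate(n: int) -> int:
    """Hypothetical stand-in for the code path being profiled."""
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
authenticate(100_000)
profiler.disable()

# Sort by cumulative time and show the top entries, to see where time goes.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue().strip().splitlines()[0])  # e.g. "... function calls in ... seconds"
```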

achamayou commented 1 week ago

@maxtropets ah one thing we need on this PR is a proper CHANGELOG entry, explaining the change to application developers, and service operators, with details about actions they need to take.
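For illustration, such an entry might look something like the following (the wording is entirely hypothetical and should come from the PR author; it is only meant to show the audience split the comment asks for):

```
### Changed

- JWT authentication now validates the token issuer. Application developers
  who performed issuer checks in application code can remove that logic.
  Service operators should verify their JWT issuer configuration before
  upgrading, as tokens that were previously accepted may now be rejected.
```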

maxtropets commented 1 week ago

> the closest we have is basic, but that's a smaller app

@achamayou aren't they actually the same test? Seems like the only difference is JWT auth.

achamayou commented 6 days ago

> the closest we have is basic, but that's a smaller app
>
> @achamayou aren't they actually the same test? Seems like the only difference is JWT auth.

No, there are two subtle differences. One is that the logging API involves JSON wrappers, with extra (de)serialisation/copies. The other is that the logging js app is fairly large (it has lots of endpoints) and so involves loading quite a bit more bytecode at every execution. So while it is logically doing the same thing, it's running quite a bit slower.
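The first difference, the cost of JSON wrappers, is easy to see in a micro-benchmark (Python used purely for illustration; in CCF the overhead lives in its C++/JS layers, but the shape of the cost is the same):

```python
import json
import timeit

payload = {"id": 42, "msg": "hello" * 20}
raw = json.dumps(payload).encode()

# Round-tripping through JSON adds (de)serialisation work and extra copies...
json_time = timeit.timeit(lambda: json.loads(json.dumps(payload)), number=10_000)
# ...compared with simply copying already-encoded bytes straight through.
raw_time = timeit.timeit(lambda: bytes(raw), number=10_000)

print(f"json round-trip: {json_time:.4f}s, raw copy: {raw_time:.4f}s")
```

On any machine the JSON round-trip should come out noticeably slower than the raw copy, which is the overhead the logging app pays on every request relative to basic.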