Closed: jgilbert2017 closed this 3 years ago
i've done my best to test this code and it seems to work
I just added some explanatory comments on why we create certificates without a concurrency guarantee in the commit below. https://github.com/justcoding121/titanium-web-proxy/commit/2470c0a13316e096369319d0ee1bd98ef2ef7c0b
It should also resolve the SaveCertificate issue. Closing this. But thank you for your contributions.
locking in SaveCertificate suffers from the problem of allowing simultaneous duplicate certificate creation: multiple threads can be generating redundant server certificates (5 seconds of CPU time on my machine).
your locking pattern also requires 2 locks: the ConcurrentDictionary.TryAdd and the lock() statement.
please reconsider my patch. it does not have the issue you raised.
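To make the scenario concrete, here is a minimal sketch of the kind of check-then-create race being described. The names (cachedCertificates, CreateCertificate, CachedCertificate) mirror the snippet quoted later in this thread; this is an illustration, not the library's actual code.

// sketch only: assumes a ConcurrentDictionary cache plus the CreateCertificate /
// CachedCertificate members shown later in this thread
static readonly ConcurrentDictionary<string, CachedCertificate> cachedCertificates = new();

static X509Certificate2? GetCertificateRacy(string certificateName)
{
    // fast path: certificate already cached
    if (cachedCertificates.TryGetValue(certificateName, out var cached))
    {
        return cached.Certificate; // property name assumed for illustration
    }

    // two threads can both miss the cache for the same host and both reach this
    // point, so the CPU-heavy CreateCertificate call runs twice
    var certificate = CreateCertificate(certificateName, false);
    if (certificate != null)
    {
        // TryAdd keeps the dictionary consistent, but the losing thread has already
        // spent the CPU time generating a redundant certificate
        cachedCertificates.TryAdd(certificateName, new CachedCertificate(certificate));
    }

    return certificate;
}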
===
also: you have a comment about saving an expensive lock. i don't think that is the case. if the code is checking the pending cert creation tasks dictionary, then it is almost certainly going to have to wait for certificate creation or disk io (if reading from disk).
I apologize if this was closed prematurely.
Let's discuss the piece of code below from your PR.
Task<X509Certificate2?> createCertificateTask;

await pendingCertificateCreationTaskLock.WaitAsync();
try
{
    // handle burst requests with the same certificate name
    // by checking for an existing task for that name
    if (pendingCertificateCreationTasks.TryGetValue(certificateName, out createCertificateTask))
    {
        return await createCertificateTask;
    }

    // run the certificate creation task and add it to the pending tasks
    createCertificateTask = Task.Run(() =>
    {
        var result = CreateCertificate(certificateName, false);
        if (result != null)
        {
            cachedCertificates.TryAdd(certificateName, new CachedCertificate(result));
        }

        return result;
    });

    pendingCertificateCreationTasks[certificateName] = createCertificateTask;
}
finally
{
    pendingCertificateCreationTaskLock.Release();
}

// await the certificate creation outside of the lock to give other threads a chance to see the pending task.
// this precludes the possibility of multiple threads simultaneously generating the same certificate;
// we tolerate the rare case of a duplicate sequential generation due to a race:
//   t1: await createCertificateTask
//   t2: cachedCertificates.TryGetValue -> false
//   t1: pendingCertificateCreationTasks.Remove
//   t2: pendingCertificateCreationTasks.TryGetValue -> false
var certificate = await createCertificateTask;

// clean up the pending task
await pendingCertificateCreationTaskLock.WaitAsync();
try
{
    pendingCertificateCreationTasks.Remove(certificateName);
}
finally
{
    pendingCertificateCreationTaskLock.Release();
}
pendingCertificateCreationTaskLock is a global lock. I am worried about the statement return await createCertificateTask.
Let's imagine there are 5 hosts requesting new certificates, each with a second burst request (10 requests in total).
Let's assume each certificate creation takes 500 ms.
The burst request for host 1 acquires the global lock, finds the pending task, and does return await createCertificateTask, holding the lock for 500 ms. Meanwhile the other 8 requests are waiting.
The burst request for host 2 then does await createCertificateTask while holding the lock for another 500 ms. Meanwhile the other 6 requests are waiting.
The burst request for host 3 holds the lock for yet another 500 ms. Meanwhile the other 4 requests are waiting.
And so on.
I could be wrong. You see the problem here. The global lock becomes a bottleneck and makes the process non-parallel. I see your point about the CPU usage spike from duplicate certificate generation with the existing approach. But the existing approach won't have the bottleneck above.
If all requests are for the same host, then your approach has an advantage. Even then, the existing approach's worst case is unlikely, because we would expect a few milliseconds between requests for the same host, giving the 1st creation task time to land in the task dictionary. I am not sure the pros of your approach outweigh the cons in practice. In practice I expect we will see many hosts, especially if the proxy is handling requests from multiple client machines on the internet.
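For reference, a variant of the snippet above in which burst requests also await outside the critical section would look roughly like this. It is only an illustrative sketch against the names used above, not code from either patch.

Task<X509Certificate2?> createCertificateTask;
bool isCreator = false;

await pendingCertificateCreationTaskLock.WaitAsync();
try
{
    // reuse a pending task for this host if one exists, otherwise start one;
    // nothing is awaited while the global lock is held
    if (!pendingCertificateCreationTasks.TryGetValue(certificateName, out createCertificateTask))
    {
        createCertificateTask = Task.Run(() => CreateCertificate(certificateName, false));
        pendingCertificateCreationTasks[certificateName] = createCertificateTask;
        isCreator = true;
    }
}
finally
{
    pendingCertificateCreationTaskLock.Release();
}

// both the creator and any burst requests await outside the lock, so a 500 ms
// creation for one host does not block requests for other hosts
var certificate = await createCertificateTask;

if (isCreator)
{
    if (certificate != null)
    {
        cachedCertificates.TryAdd(certificateName, new CachedCertificate(certificate));
    }

    // clean up the pending task
    await pendingCertificateCreationTaskLock.WaitAsync();
    try
    {
        pendingCertificateCreationTasks.Remove(certificateName);
    }
    finally
    {
        pendingCertificateCreationTaskLock.Release();
    }
}

return certificate;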
Also, saving the certificate to disk is not part of request handling; I offloaded it to a separate task in my latest change. We can prevent the additional write to disk and save CPU by checking whether the file already exists inside the DiskCache implementation, just before the write operation.
you're right, that is a mistake. i imagined that was happening outside the lock. gimme a minute to push a new patch candidate.
> Also, saving the certificate to disk is not part of request handling; I offloaded it to a separate task in my latest change. We can prevent the additional write to disk and save CPU by checking whether the file already exists inside the DiskCache implementation, just before the write operation.
That's probably a good idea. Just make sure you've got try/catch and the file writing is atomic since there will still be a race.
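For illustration, an existence check combined with an atomic write inside a disk cache could look roughly like the following. The method name and the overwrite handling are assumptions for the sketch, not the actual DiskCache implementation.

// sketch only: real DiskCache member names may differ
void SaveCertificateIfMissing(string filePath, byte[] pfxBytes)
{
    // skip the write if another request already saved this certificate
    if (File.Exists(filePath))
    {
        return;
    }

    try
    {
        // write to a unique temp file first, then move it into place so a
        // concurrent reader never sees a partially written certificate
        var tempPath = filePath + "." + Guid.NewGuid().ToString("N") + ".tmp";
        File.WriteAllBytes(tempPath, pfxBytes);

        try
        {
            File.Move(tempPath, filePath);
        }
        catch (IOException)
        {
            // another request won the race and created the file first;
            // discard our temp copy
            File.Delete(tempPath);
        }
    }
    catch (Exception)
    {
        // saving to disk is only an optimization; never let it fail the request
    }
}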
bah... screwed up rushing... gimme another sec to re-push
please review, i think the patch is good now.
:+1::+1::+1: btw, kudos on outstanding work on this project over the last few years.
I still see a rare possibility of multiple tasks creating the same certificate, as described in the following example.
In this example, requests A and C are for the same host abc.com, yet the certificate is created twice. But I think it won't happen very often in practice.
Checking the cache one more time inside the 1st critical section should fix this issue.
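Roughly, that extra check would sit at the top of the first critical section, as in the sketch below. It assumes CachedCertificate exposes the wrapped certificate via a Certificate property, which is an illustration rather than a confirmed member name.

await pendingCertificateCreationTaskLock.WaitAsync();
try
{
    // re-check the cache now that we hold the lock: another request for the same
    // host may have finished creating and caching the certificate after our first,
    // lock-free cache lookup
    if (cachedCertificates.TryGetValue(certificateName, out var cached))
    {
        return cached.Certificate; // property name assumed for illustration
    }

    // ... existing logic: reuse a pending task for this host or start a new one ...
}
finally
{
    pendingCertificateCreationTaskLock.Release();
}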
I just added that check, one more lock acquire plus a TryGetValue on the cache dictionary. But I think it should be fast enough. Thanks for this PR.
> I just added that check, one more lock acquire plus a TryGetValue on the cache dictionary. But I think it should be fast enough. Thanks for this PR.
i had considered doing this but for some reason i was worried about slowing down the non-cert generation code path. but i don't see that as an issue when i think about it now.
> I just added that check, one more lock acquire plus a TryGetValue on the cache dictionary. But I think it should be fast enough. Thanks for this PR.
i think you may want to remove my comment that discusses a possible race. i believe your additional check addresses that scenario.
can still happen sequentially but should not cause exceptions. see #816.