justcoding121 / titanium-web-proxy

A cross-platform asynchronous HTTP(S) proxy server in C#.
MIT License
1.93k stars 618 forks source link

fix simultaneous duplicate server certificate generation #854

Closed jgilbert2017 closed 3 years ago

jgilbert2017 commented 3 years ago

can still happen sequentially but should not cause exceptions. see #816.

jgilbert2017 commented 3 years ago

i've done my best to test this code and it seems to work

justcoding121 commented 3 years ago

I just added some explanatory comments on why we create certificates without a guaranteed concurrency in below commit. https://github.com/justcoding121/titanium-web-proxy/commit/2470c0a13316e096369319d0ee1bd98ef2ef7c0b

It should also resolve the SaveCertificate issue. Closing this. But thank you for your contributions.

jgilbert2017 commented 3 years ago

locking in SaveCertificate suffers from the problem of allowing simultaneous duplicate certificate creation. multiple threads can be generating redundant server certificates (5 seconds of CPU time on my machine).

your locking pattern also requires 2 locks- the ConcurrentDictionary.TryAdd and the lock() statement.

please reconsider my patch. i does not have the issue you raised.

===

also- you have a comment about saving an expensive lock. i don't think that is the case. if the code is checking the pending cert creation tasks dictionary then it is almost certainly going to have to wait for certificate creation or disk io (if reading from disk).

justcoding121 commented 3 years ago

I apologize if this was closed prematurely.

justcoding121 commented 3 years ago

Let's discuss below piece of code in your PR.

Task<X509Certificate2?> createCertificateTask;
await pendingCertificateCreationTaskLock.WaitAsync();
try {
    // handle burst requests with same certificate name
    // by checking for existing task for same certificate name
    if (pendingCertificateCreationTasks.TryGetValue(certificateName, out createCertificateTask))
    {
        return await createCertificateTask;
    }

    // run certificate creation task & add it to pending tasks
    createCertificateTask = Task.Run(() =>
    {
        var result = CreateCertificate(certificateName, false);
        if (result != null)
        {
            cachedCertificates.TryAdd(certificateName, new CachedCertificate(result));
        }

        return result;
    });
    pendingCertificateCreationTasks[certificateName] = createCertificateTask;
} finally {
    pendingCertificateCreationTaskLock.Release();
}

// create certificate outside of lock to allow other threads a chance to see the pending task.
// precludes the possiblility of multiple threads simultaneously generating the same certificate,
// we tolerate the case of a duplicate sequential generation due to a race (should be rare)
// t1: await createCertificateTask
// t2: cachedCertificates.TryGetValue->false
// t1: pendingCertificateCreationTasks.Remove
// t2: pendingCertificateCreationTasks.TryGetValue->false
var certificate = await createCertificateTask;

// cleanup pending task
await pendingCertificateCreationTaskLock.WaitAsync();
try {
    pendingCertificateCreationTasks.Remove(certificateName);
} finally {
    pendingCertificateCreationTaskLock.Release();
}

pendingCertificateCreationTaskLock is a global lock. I am worried about the statement return await createCertificateTask. Let's imaging there are 5 hosts requesting new certificates, with each have a second request in burst (total 10 request together). Let's assume each certificate creation takes 500 ms.

  1. 1st host acquires the global lock at 0 ms
  2. 1st host' burst request waits at return await createCertificateTask for 500 ms. Meanwhile other 8 requests are waiting.
  3. 1st host and its burst returns at 500 ms, lock released.
  4. 2nd host acquires the global lock at 500 ms
  5. 2nd host' burst request waits at return await createCertificateTask for 500 ms. Meanwhile other 6 requests are waiting.
  6. 2nd host and its burst returns at 1000 ms, lock released.
  7. 3rd host acquires the global lock at 1000 ms
  8. 3rd host' burst request waits at return await createCertificateTask for 500 ms. Meanwhile other 4 requests are waiting.
  9. 3rd host and its burst returns at 1500 ms, lock released.

...

I could be wrong. You see the problem here. The global lock becomes a bottleneck and makes the process non-parellel. I see your point about CPU usage spike with duplicate certificate generation with existing approach. But existing approach won't have the bottleneck above.

If all requests are for the same host, then your approach have an advantage. Even then, existing approach worst case is unlikely, because we would expect a few milliseconds between requests for same host, giving time for the 1st creation task to be in the task dictionary. I am not sure if the pros of your approach outweighs the cons in practice. I think in practice, we expect to see many hosts, especially if proxy is handling requests from multiple client machines on internet.

justcoding121 commented 3 years ago

Also, the save certificate to disk is not part of request handling. I offloaded it to a separate task in my latest change. We can prevent the additional write to disk and save CPU resource by just checking if the file already exists inside the DiskCache implementation, just before write operation.

jgilbert2017 commented 3 years ago

you're right, that is a mistake. i imagined that was happening outside the lock. gimme a minute to push a new patch candidate.

jgilbert2017 commented 3 years ago

Also, the save certificate to disk is not part of request handling. I offloaded it to a separate task in my latest change. We can prevent the additional write to disk and save CPU resource by just checking if the file already exists inside the DiskCache implementation, just before write operation.

That's probably a good idea. Just make sure you've got try/catch and the file writing is atomic since there will still be a race.

jgilbert2017 commented 3 years ago

bah... screwed up rushing... gimme another sec to re-push

jgilbert2017 commented 3 years ago

please review, i think the patch is good now.

jgilbert2017 commented 3 years ago

:+1::+1::+1: btw, kudos on outstanding work on this project over the last few years.

justcoding121 commented 3 years ago

I still see a rare possibility for multiple tasks still creating same certificates as described in following example.

  1. Request A for abc.com acquires lock and awaits for generation to finish
  2. Another burst request C for abc.com waits to acquire lock. Many other requests are competing to acquire lock and some are awaiting for unrelated hosts, so request C has to wait for some time.
  3. Request A for abc.com releases lock and removes the task from task creation dictionary
  4. Request C for abc.com finally acquires lock and awaits for generation to finish
  5. Request C for abc.com releases lock and removes the task from task creation dictionary

In this example Request A and C are for same host abc.com. But the certificates are created twice. But I think it won't happen very often in practice.

Checking the cache one more time inside the 1st critical section should fix this issue.

justcoding121 commented 3 years ago

I just added that check, one more lock acquire inside TryGetValue for cache dictionary. But I think it should be fast enough. Thanks for this PR.

jgilbert2017 commented 3 years ago

I just added that check, one more lock acquire inside TryGetValue for cache dictionary. But I think it should be fast enough. Thanks for this PR.

i had considered doing this but for some reason i was worried about slowing down the non-cert generation code path. but i don't see that as an issue when i think about it now.

jgilbert2017 commented 3 years ago

I just added that check, one more lock acquire inside TryGetValue for cache dictionary. But I think it should be fast enough. Thanks for this PR.

i think you may want to remove my comment that discusses a possible race. i believe your additional check addresses that scenario.