ra0o0f / arangoclient.net

ArangoDB .NET Client with LINQ support
Apache License 2.0
99 stars 37 forks source link

Thread safety #118

Open imtrobin opened 5 years ago

imtrobin commented 5 years ago

Hi, I'm using arangoclient in azure function (serverless), I see the test using Lazy to initialize, and even using ConcurrentDictionary, but I notice DatabaseSharedSetting.ChangeSetting/FindSetting is not thread safe.

private static ConcurrentDictionary<string, DatabaseSharedSetting> cachedSettings = new ConcurrentDictionary<string, DatabaseSharedSetting>();

       static DatabaseSharedSetting FindSetting(string identifier, bool? throwIfNotFound = false)
        {
            if (string.IsNullOrWhiteSpace(identifier))
                throw new ArgumentNullException("Setting identifier");

            DatabaseSharedSetting setting = null;
            if (!cachedSettings.TryGetValue(identifier, out setting))
            {
                if (throwIfNotFound == true)
                    throw new InvalidOperationException(string.Format("Can not find database setting identifier '{0}'", identifier));

                setting = new DatabaseSharedSetting();
                setting.SettingIdentifier = identifier;
                cachedSettings.TryAdd(identifier, setting);
            }
            return setting;
        }
ra0o0f commented 5 years ago

@imtrobin ArangoDatabase is not meant to be ThreadSafe, you should create new instance for each request(or few request)

  1. ArangoDatabase is responsible for change tracking, so if you create only one instance. its change tracking store will become bigger and bigger

  2. creating ArangoDatabase instance is a cheap operation. no client/db initialization will happen

ra0o0f commented 5 years ago

@imtrobin why ChangeSetting/FindSetting is not thread safe?

imtrobin commented 5 years ago

I saw your other thread, that the advice is to keep disposing and recreating new instance. I dug briefly in the code, I see your have one static httpclient shared among all db connection, so I think that is good?

FindSetting is used in ArangoDatabase.CreateWithSetting, and it should lock when it is tryin to get/add to cacheSettings, no?

ra0o0f commented 5 years ago

@imtrobin for each httpclient instance an underlying tcp connection will be used, therefore if we use multiple httpclient instance then multiple tcp handshake will happen for creating new tcp connection. so i think have an httpclient for all requests make sense.

if multiple thread invoke TryAdd with the same key, one of them will succeed and other fails(no exception thrown), i dont think using lock would bring any thread safety here.

imtrobin commented 5 years ago

yes, one httpclient is correct way to avoid socket exhaustion. Though I think it does not work as well for multiple arangodb servers, but I think that's not your design here, right?

The multiple thread problem is with Try/Get, another guy can enter before TryAdd is complete.

        static DatabaseSharedSetting FindSetting(string identifier, bool? throwIfNotFound = false)
        {
            if (string.IsNullOrWhiteSpace(identifier))
                throw new ArgumentNullException("Setting identifier");

            DatabaseSharedSetting setting = null;
            if (!cachedSettings.TryGetValue(identifier, out setting))
            {
                 // SECOND COMES COMES HERE BEFORE TRYADD
                if (throwIfNotFound == true)
                    throw new InvalidOperationException(string.Format("Can not find database setting identifier '{0}'", identifier));
                // FIRST GUY COMES THROUGH HERE
                setting = new DatabaseSharedSetting();
                setting.SettingIdentifier = identifier;
                cachedSettings.TryAdd(identifier, setting);
            }
            return setting;
        }
ra0o0f commented 5 years ago

@imtrobin i think one http client could also works for multiple enpoints(performance wise) https://stackoverflow.com/a/39733434

but having a http client for each arangodb server is a good idea , since it could benefit from having separate configs which could be only set once for all requests(like proxy).


if first and second guy trying to add different identifier as keys, then there is no problem. but if they add same identifier as key, then first guy will succeed and the second one will fail with no exception thrown. is it not acceptable?

imtrobin commented 5 years ago

The FindSetting is used in other places, so if those two calls are generated at the same time (by serverless architecture), then the second call gets a different instance of DatabaseSharedSetting instead of the instance in the cachedSettings. By logic, I would assume I would get the same instance in memory. Although the code doesn't break, but it's sure hard while debugging cos the memory instance are different which is not obvious. Imagine if I kept both returned instances and do other stuff, oh boy.