openai / openai-dotnet

The official .NET library for the OpenAI API
https://www.nuget.org/packages/OpenAI
MIT License

String enum types have broken GetHashCodes #49

Open stephentoub opened 2 weeks ago


Repro:

```csharp
using OpenAI.Chat;

ChatToolCallKind value1 = new("value1");
ChatToolCallKind VALUE1 = new("VALUE1");

Console.WriteLine(value1.Equals(VALUE1)); // prints true as expected

Console.WriteLine(value1.GetHashCode());
Console.WriteLine(VALUE1.GetHashCode()); // should print same value as above, but doesn't

HashSet<ChatToolCallKind> set = [value1];
Console.WriteLine(set.Contains(VALUE1)); // should print true but prints false
```

There are over 100 types, most of them generated, with a definition of equality like this:

```csharp
[EditorBrowsable(EditorBrowsableState.Never)]
public override bool Equals(object obj) => obj is VectorStoreBatchFileJobStatus other && Equals(other);
public bool Equals(VectorStoreBatchFileJobStatus other) => string.Equals(_value, other._value, StringComparison.InvariantCultureIgnoreCase);

[EditorBrowsable(EditorBrowsableState.Never)]
public override int GetHashCode() => _value?.GetHashCode() ?? 0;
```
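A minimal sketch of an implementation where equality and hashing agree is below. It assumes a hypothetical struct, ExampleStringEnum (an illustrative name, not a type in the OpenAI library), shaped like the generated types above: a single string _value field that the constructor requires to be non-null. The substantive change is that GetHashCode uses StringComparer.InvariantCultureIgnoreCase, the comparer that matches the StringComparison.InvariantCultureIgnoreCase comparison in Equals. This is one possible fix, not necessarily the one the library will adopt.

```csharp
#nullable enable
using System;
using System.ComponentModel;

// Hypothetical stand-in for one of the generated string "enum" structs.
public readonly struct ExampleStringEnum : IEquatable<ExampleStringEnum>
{
    private readonly string? _value; // null only for default(ExampleStringEnum)

    public ExampleStringEnum(string value)
    {
        _value = value ?? throw new ArgumentNullException(nameof(value));
    }

    public static bool operator ==(ExampleStringEnum left, ExampleStringEnum right) => left.Equals(right);
    public static bool operator !=(ExampleStringEnum left, ExampleStringEnum right) => !left.Equals(right);

    [EditorBrowsable(EditorBrowsableState.Never)]
    public override bool Equals(object? obj) => obj is ExampleStringEnum other && Equals(other);

    // Equality: case-insensitive, invariant culture (same as the generated code).
    public bool Equals(ExampleStringEnum other) =>
        string.Equals(_value, other._value, StringComparison.InvariantCultureIgnoreCase);

    // Hash code: computed with the comparer that matches the comparison in Equals,
    // so values that compare equal also hash equal. The null check keeps
    // default(ExampleStringEnum) hashing to 0, as in the original code.
    [EditorBrowsable(EditorBrowsableState.Never)]
    public override int GetHashCode() =>
        _value is null ? 0 : StringComparer.InvariantCultureIgnoreCase.GetHashCode(_value);

    public override string ToString() => _value ?? string.Empty;
}
```

With this version, new HashSet<ExampleStringEnum> { new("value1") }.Contains(new("VALUE1")) returns true, because the default comparer HashSet<T> uses for an IEquatable<T> struct calls the Equals(ExampleStringEnum) and GetHashCode overrides shown here.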

The rules for GetHashCode state:

If two objects compare as equal, the GetHashCode() method for each object must return the same value.

Here, however, equality is based on a case-insensitive comparison, while the hash code is computed case-sensitively via _value?.GetHashCode(). That means two equal values can end up with different hash codes. That in turn breaks things like dictionary and set lookups, which rely on hash codes for bucketing: even if an equal value is in the dictionary, it's likely not to be found because the hash codes differ.
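Because a mismatch like this is easy to reintroduce across 100+ generated types, one option is a small check of the GetHashCode rule quoted above, run over sample values in tests. The sketch below is a hypothetical helper (HashContract, Holds and AllPairsHold are illustrative names, not an existing API). Unless a comparer is passed, it uses EqualityComparer<T>.Default, which for these structs means their own Equals and GetHashCode.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical test helper that checks the GetHashCode rule:
// if two values compare equal, their hash codes must match.
public static class HashContract
{
    // Returns false only when a and b compare equal but hash differently.
    // Assumes non-null inputs: some comparers (e.g. StringComparer) throw on null.
    public static bool Holds<T>(T a, T b, IEqualityComparer<T>? comparer = null)
    {
        comparer ??= EqualityComparer<T>.Default;

        // Unequal values are allowed to have any hash codes.
        if (!comparer.Equals(a, b))
        {
            return true;
        }

        // Equal values must produce identical hash codes.
        return comparer.GetHashCode(a!) == comparer.GetHashCode(b!);
    }

    // Applies Holds to every ordered pair of values in the sample.
    public static bool AllPairsHold<T>(IReadOnlyList<T> samples, IEqualityComparer<T>? comparer = null)
    {
        for (int i = 0; i < samples.Count; i++)
        {
            for (int j = 0; j < samples.Count; j++)
            {
                if (!Holds(samples[i], samples[j], comparer))
                {
                    return false;
                }
            }
        }
        return true;
    }
}
```

For example, HashContract.Holds(new ChatToolCallKind("value1"), new ChatToolCallKind("VALUE1")) would be expected to return false against the current implementation (barring an unlikely collision between the two case-sensitive string hashes) and true once GetHashCode is made case-insensitive.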