yasirkula / UnityIngameDebugConsole

A uGUI based console to see debug messages and execute commands during gameplay in Unity
MIT License
2.11k stars 221 forks source link

Changed DebugLogEntry's hashing algorithm to MD5… #42

Closed Dinno closed 2 years ago

Dinno commented 3 years ago

… and use MD5 hashes not only in GetHashCode method but also in Equals

A had the Equals method as the top hot spot in my profiler (I used debug logging extensively) and after this change the hot spot disappeared.

yasirkula commented 3 years ago

Why are you using MD5 hash and not the int hashValue that I'm calculating?

Dinno commented 3 years ago

Because I use it in Equals method and I want to minimize chances of conflict.

yasirkula commented 3 years ago

I'd like to know how many milliseconds the current Equals method consumed in Profiler and on which OS because comparing just a hash value in Equals will always have a slim chance of false positives and this change must be worth it to excuse this downside.

Dinno commented 3 years ago

With MD5 algorithm those chances are so slim, that one has much higher chances to be hit by a meteorite than to deal with the results of such conflict. We used MD5 to compare files in a corporate production setting and we had zero conflicts while having millions of potentially conflicting files.

Dinno commented 3 years ago

...as about hot spot, I will try to reproduce those results and send them to you.

Dinno commented 3 years ago

Didn't need to reproduce. I found saved profiling results image image

Dinno commented 3 years ago

MD5 version hot spots: image

Dinno commented 3 years ago

Forgot to send summary. It could be important for comparison image

yasirkula commented 3 years ago

I don't like a few things about this solution:

The first two points can have a bigger impact on mobile platforms when there are a lot of logs per seconds.

What do you think about this alternative (assume it changes the unaltered DebugLogEntry.cs, i.e. without MD5 hashing):

#define FASTER_EQUALS_CHECK

...

#if FASTER_EQUALS_CHECK
private const int EQUALS_NUMBER_OF_CHAR_COMPARISON = 5;
#endif

public void Initialize( string logString, string stackTrace )
{
    this.logString = logString ?? "";
    this.stackTrace = stackTrace ?? "";

    ...
}

...

public bool Equals( DebugLogEntry other )
{
#if !FASTER_EQUALS_CHECK
    return this.logString == other.logString && this.stackTrace == other.stackTrace;
#else
    if( GetHashCode() != other.GetHashCode() )
        return false;

    string otherLogString = other.logString;
    int logStringLength = logString.Length;
    if( logStringLength != otherLogString.Length )
        return false;

    string otherStackTrace = other.stackTrace;
    int stackTraceLength = stackTrace.Length;
    if( stackTraceLength != otherStackTrace.Length )
        return false;

    if( logStringLength > 0 )
    {
        logStringLength /= EQUALS_NUMBER_OF_CHAR_COMPARISON;
        for( int i = 0; i < EQUALS_NUMBER_OF_CHAR_COMPARISON; i++ )
        {
            int charIndex = i * logStringLength;
            if( logString[charIndex] != otherLogString[charIndex] )
                return false;
        }
    }

    if( stackTraceLength > 0 )
    {
        stackTraceLength /= EQUALS_NUMBER_OF_CHAR_COMPARISON;
        for( int i = 1; i < EQUALS_NUMBER_OF_CHAR_COMPARISON; i++ ) // i starts from 1: stack traces start with the word "at", no need to check the first character
        {
            int charIndex = i * stackTraceLength;
            if( stackTrace[charIndex] != otherStackTrace[charIndex] )
                return false;
        }
    }
#endif
}
Dinno commented 3 years ago

This solution will be better. But it has corner case when most of your strings are the same. In that case it will not improve over old algorithm. And I think that my case is exactly such corner case. May be we will still see improvements though. We can try to profile this solution if you want.

Dinno commented 3 years ago

Why do you think that Linq is slow?

yasirkula commented 3 years ago

But it has corner case when most of your strings are the same. In that case it will not improve over old algorithm.

Only 5 characters from logString and stackTrace are compared; hence I think it will be much faster than the old algorithm. And since these strings' hash values and lengths are also compared, practically this method shouldn't generate any false positives.

I've seen threads about Linq's performance countless times and I prefer not using it at all in my plugins. For example:

P.S. It'd be awesome if you could profile the solution I've suggested 😃

Dinno commented 3 years ago

"Only 5 characters from logString and stackTrace are compared" I missed this detail. Sorry. Should be pretty good solution then. I would say that it has much higher potential for conflicts than MD5 variant but I think that it will be good enough.

Dinno commented 3 years ago

I understand that you already decided what to do. But I still want to express ideas of improvements for MD5 variant :)

  1. I found that we can build MD5 hashes incrementally. This will get rid of garbage strings.
  2. We can incrementally and manually translate strings to bytes using constant size byte buffer. This will get rid of intermediate managed array of bytes.
  3. We can store MD5 hashes in Unity's fixed sized structures or just use fixed byte[16] (it is unsafe though). That will get rid of byte[] managed garbage.
  4. If you think that 16 bytes is too much than we can store just first 8. Even 8 bytes will give us pretty good guaratee against conflicts.
  5. We can stop using Linq.
yasirkula commented 3 years ago

16 bytes should be OK (12 kb additional memory for 1000 unique logs is acceptable). We can't use fixed variable since it is unsafe. I'd like to know how steps 1 and 2 can be implemented. After updating your implementation with steps 1-2-5, can you profile it and then also profile my solution?

Dinno commented 3 years ago

What do you think about this implementation?

yasirkula commented 3 years ago

It's really impressive, seems like you've used every single MD5 utility function out there to optimize the algorithm. Does this method show any GC allocations in Profiler? What about the performance?

I still think that MD5 unnecessarily complicates things. You choose MD5 to minimize collisions, I think. But if my suggested method's performance is also on a par with MD5 method's performance, don't you think that it is a much simpler approach than the current MD5 approach? To have false positives, 2 log's hashes must be the same, logString and stackTrace's lengths must be the same and 5 characters from each string must be the same. If you think that this is still not enough, we can even store the hash values of logString and stackTrace separately and compare both of them.

Dinno commented 3 years ago

May be it is. I just tried to get rid of problems which you stated. For me, it was simple and good enough in both my initial variant and your's :) So, just choose what you like most.

yasirkula commented 3 years ago

If you get the chance, please profile your latest method and my updated method. If performance-wise they are similar, I'd like to stick with my method. If MD5 is by far more performant, I'll consider it more seriously.