It recently came to my attention that it's possible to make only a single copy of a string when passing it between threads, provided the assigning context keeps the assigned string alive long enough for the reader to copy it directly.
This would involve:
The assigning thread puts the string into its thread-local Threaded object's cache and adds a ref to it. It also stores a pointer to this string in the shared storage.
When the Threaded object which cached the string is destroyed, it must copy the string to shared storage before deallocating itself (this ensures that, for example, copying a string from a dead child thread to parent works correctly).
Strings must be cleaned from the Threaded object's local cache in the same way that Threaded, Closure and Socket objects currently are.
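The steps above can be sketched roughly as follows. This is only a toy model in Python, not how the extension would actually implement it: `SharedStorage`, `LocalCache`, and the `copies` counter are hypothetical names made up for illustration, holding a `bytes` object in a dict stands in for adding a ref, and real locking between threads is left out entirely.

```python
class SharedStorage:
    """Hypothetical stand-in for the store shared between threads."""

    def __init__(self):
        # key -> (owner, data); owner is the LocalCache whose string is
        # borrowed, or None once the storage holds its own copy.
        self.slots = {}


class LocalCache:
    """Hypothetical stand-in for one thread's Threaded-object cache."""

    def __init__(self, storage):
        self.storage = storage
        self.cache = {}   # key -> bytes; holding it here models the added ref
        self.copies = 0   # number of real copies made by this thread

    def _copy(self, data):
        self.copies += 1
        return bytes(bytearray(data))  # builds a new bytes object from the data

    def write(self, key, data):
        self.cache[key] = data                  # ref only, no copy
        self.storage.slots[key] = (self, data)  # "pointer" in shared storage

    def read(self, key):
        owner, data = self.storage.slots[key]
        if owner is self:
            return data                         # our own string, nothing to do
        cached = self.cache.get(key)
        if cached is not None and cached == data:  # bytewise comparison
            return cached                       # cached copy is still current
        self.cache[key] = self._copy(data)      # the single copy, on the reader
        return self.cache[key]

    def destroy(self):
        # Hand every string we still own to shared storage before going away.
        for key, (owner, data) in list(self.storage.slots.items()):
            if owner is self:
                self.storage.slots[key] = (None, self._copy(data))
        self.cache.clear()                      # drop our refs
```

In this model a write costs the writer nothing beyond a ref, the reader pays for the one copy, and a thread that dies while still owning strings pays for one extra copy per string on its way out.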
This would give us a performance improvement in PM in many cases:
Copying packets from main -> RakLib would now only involve 1 copy, on the RakLib thread - reducing main thread time spent on sending packets
Copying packets from RakLib -> main thread would only involve 1 copy, on the main thread - reducing RakLib time spent on receiving packets
Async compression would involve only 2 copies instead of 4: AsyncTask would have to copy uncompressed from main, and main would have to copy compressed from AsyncTask (instead of both having to copy both ways).
One problem that might nerf this advantage is the need to do a bytewise comparison of strings to determine whether the cached version is the same as the local version.