turms-im / turms

Try to avoid using `String` and `Builder` in proto models #878

Open JamesChenX opened 2 years ago

JamesChenX commented 2 years ago

For user texts:

  1. The client sends bytes over TCP to turms-gateway, which forwards them to turms-service.
  2. turms-service first stores the received bytes in a ByteBuf.
  3. It then uses com.google.protobuf.CodedInputStream to decode the ByteBuf into a TurmsRequest proto model. String fields are decoded by com.google.protobuf.CodedInputStream.UnsafeDirectNioDecoder#readStringRequireUtf8, which in turn uses com.google.protobuf.Utf8.UnsafeProcessor#decodeUtf8Direct.
  4. decodeUtf8Direct creates a char[] and copies the data from the direct buffer to the heap. This is a copy, but a reasonable one.
  5. The char[] is passed to new String(resultArr, 0, resultPos), so the data is copied again.
  6. If the string is the text of a CreateMessageRequest, turms-service calls String#getBytes to write the bytes to a direct buffer in order to flush the Message record to MongoDB, so the data is copied again.
  7. ~~if we need to log the client requests, the logger needs to call String#getBytes to remove non-printable characters of user input texts to ensure it's safe to log them, and then write the final ByteBuffer to append to console or the log file. So copy twice.~~ We currently don't log user input texts, for security and better performance. ...

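If we can avoid using String and work with byte[] directly, we can skip the decode and re-encode copies above and the allocations that come with them, which saves a lot of memory.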
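To make the copy chain above concrete, here is a minimal sketch that uses only JDK classes as stand-ins: `java.nio.ByteBuffer` plays the role of the Netty ByteBuf, and `StandardCharsets.UTF_8.decode` plays the role of the protobuf decoder. The class and method names (`TextCopyDemo`, `viaString`, `viaBytes`) are hypothetical and are not part of turms or protobuf.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; a simplified stand-in for the real request pipeline.
final class TextCopyDemo {

    // String path: direct buffer -> heap chars -> String -> byte[] -> direct buffer.
    static ByteBuffer viaString(ByteBuffer in) {
        // Copy 1: decode the UTF-8 bytes into chars on the heap.
        CharBuffer chars = StandardCharsets.UTF_8.decode(in.duplicate());
        // Copy 2: the String gets its own backing array.
        String text = chars.toString();
        // Copy 3: re-encode the String to UTF-8.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Copy 4: write the bytes to the outgoing direct buffer.
        ByteBuffer out = ByteBuffer.allocateDirect(utf8.length);
        out.put(utf8);
        out.flip();
        return out;
    }

    // byte[] path: direct buffer -> byte[] -> direct buffer, with no decode or re-encode.
    static ByteBuffer viaBytes(ByteBuffer in) {
        ByteBuffer src = in.duplicate();
        // Copy 1: move the raw UTF-8 bytes to the heap.
        byte[] raw = new byte[src.remaining()];
        src.get(raw);
        // Copy 2: write the same bytes to the outgoing direct buffer.
        ByteBuffer out = ByteBuffer.allocateDirect(raw.length);
        out.put(raw);
        out.flip();
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer in = ByteBuffer.allocateDirect(64);
        in.put("hello".getBytes(StandardCharsets.UTF_8));
        in.flip();
        // Both paths produce the same outgoing bytes.
        System.out.println(viaString(in).equals(viaBytes(in)));
    }
}
```

Note that the byte[] path in this sketch does not validate the input, whereas readStringRequireUtf8 rejects invalid UTF-8. A real implementation would presumably still need a validation pass over the raw bytes.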

JamesChenX commented 2 years ago

After some research: we can write a protoc plugin that adds our own code to the generated proto models:

  1. Introduce a new class ExtendedString that stores the raw UTF-8 bytes for MongoDB, plus the UTF16/LATIN1 byte[] for turms-plugin if plugins are enabled
  2. Add a public constructor and setters to the proto models to replace their builders
  3. Refactor the related workflows

But this would take about 5 days to implement and test, so for now we keep it simple and leave it unchanged in v1.0.
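For reference, a rough sketch of what the ExtendedString idea from the list above could look like. This is only an illustration under assumptions: the method names and the `SketchCreateMessageRequest` model are hypothetical, and a lazily created `java.lang.String` stands in for the UTF16/LATIN1 representation that plugins would see.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: keeps the raw UTF-8 bytes and decodes a String only on demand.
final class ExtendedString {
    private final byte[] utf8; // raw bytes as received, can be written to MongoDB as-is
    private String decoded;    // created lazily, only when a plugin asks for it

    ExtendedString(byte[] utf8) {
        this.utf8 = utf8;
    }

    // Returns the raw UTF-8 bytes without copying.
    byte[] utf8Bytes() {
        return utf8;
    }

    // Decodes on first use and caches the result.
    // The unsynchronized cache is a benign race because String is immutable.
    @Override
    public String toString() {
        String s = decoded;
        if (s == null) {
            s = new String(utf8, StandardCharsets.UTF_8);
            decoded = s;
        }
        return s;
    }
}

// Hypothetical model with a public constructor and setter instead of a builder.
final class SketchCreateMessageRequest {
    private ExtendedString text;

    public SketchCreateMessageRequest() {
    }

    public void setText(ExtendedString text) {
        this.text = text;
    }

    public ExtendedString getText() {
        return text;
    }
}
```

With this shape, the storage path only touches `utf8Bytes()`, and the String is never created unless something calls `toString()`.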