The message must be drained from the conn before go-nsq can read the next message. Eliminating ioutil.ReadAll and returning the reader (the conn) directly would mean there could only be one message handler per conn, and it would always have to be processing.
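For context, NSQ frames are length-prefixed and packed back to back on the conn, so a reader has to consume the whole body before the next frame is even visible. Schematically (this is a simplified sketch, not go-nsq's actual reader):

```go
package sketch

import (
	"encoding/binary"
	"io"
)

// readFrame illustrates why a message must be fully drained: the next
// size-prefixed frame begins immediately after this one's payload.
func readFrame(r io.Reader) ([]byte, error) {
	var size int32
	if err := binary.Read(r, binary.BigEndian, &size); err != nil {
		return nil, err
	}
	// one fresh allocation per message: the cost under discussion
	buf := make([]byte, size)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, err
	}
	return buf, nil
}
```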
sync.Pool would be decent if you have a relatively stable size for incoming responses. If body size varies significantly, half the Gets from the pool will need to be resized on reading, resulting in no benefit.
I think a nicer approach would be to provide an interface for custom message decoding. That eliminates the need to ReadAll in the first place, by decoding directly into whatever you'll be using the body for.
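For illustration, a decoding hook along these lines could work. This is only a sketch: no such interface exists in go-nsq, and the MessageDecoder name and signature are invented. It would sit alongside go-nsq's existing MessageID type.

```go
package nsq

import "time"

// MessageDecoder is a hypothetical hook (not part of go-nsq) letting a
// consumer parse the wire body in place, instead of go-nsq materializing
// a fresh []byte per message via ioutil.ReadAll.
type MessageDecoder interface {
	// DecodeBody receives the raw body bytes, valid only until it returns;
	// implementations parse or copy exactly what they need and nothing more.
	DecodeBody(id MessageID, timestamp time.Time, attempts uint16, body []byte) error
}
```

A JSON consumer, for example, could unmarshal straight out of the read buffer rather than keeping a per-message copy alive.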
> If body size varies significantly, half the Gets from the pool will need to be resized on reading, resulting in no benefit.
remember that you could put the bigger buffers back into the pool for reuse. so if you have high sizing variety, it just means you'll be using more memory (until the next GC run), and there will definitely be a benefit in terms of allocations and the number of GC runs.

i like your idea of having a special api for it though, and leaving allocations up to the user of the library!
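to make the buffer-reuse point concrete, here's a minimal sketch of a size-aware pool; none of this is go-nsq code, and the getBody/putBody helper names are made up:

```go
package sketch

import "sync"

// bodyPool recycles message body buffers. Oversized buffers go back into the
// pool too, so large messages gradually seed it with large buffers and later
// Gets need fewer fresh allocations.
var bodyPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 4096) },
}

// getBody returns a buffer of length n, allocating only when the pooled
// buffer's capacity is too small (the "resized Get" case).
func getBody(n int) []byte {
	b := bodyPool.Get().([]byte)
	if cap(b) < n {
		b = make([]byte, 0, n) // pay the alloc once; it is reused after Put
	}
	return b[:n]
}

// putBody returns the buffer, whatever its size, for the next consumer.
func putBody(b []byte) {
	bodyPool.Put(b[:0])
}
```

the point being: a Get that had to be resized still seeds the pool with the bigger buffer on Put, so the allocation cost is paid once rather than per message.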
Thanks for looking into this @Dieterbe and apologies for the delayed response.
go-nsq was designed around the idea of making it really easy to get started and not having to write too much boilerplate for consumers. We haven't spent much time at all optimizing it (compared to, e.g., nsqd).
I'm guessing there are lots of opportunities like this throughout the library.
With respect to this specific issue, why don't we just experiment with some of the proposed solutions and see how far we get? I would love some help contributing patches for these!
I just tried out this patch:
```diff
diff --git a/message.go b/message.go
index a5a7fa4..67ba41d 100644
--- a/message.go
+++ b/message.go
@@ -1,10 +1,9 @@
 package nsq
 
 import (
-	"bytes"
 	"encoding/binary"
+	"errors"
 	"io"
-	"io/ioutil"
 	"sync/atomic"
 	"time"
 )
@@ -139,24 +138,19 @@ func (m *Message) WriteTo(w io.Writer) (int64, error) {
 	return total, nil
 }
 
-// DecodeMessage deseralizes data (as []byte) and creates a new Message
+// DecodeMessage deserializes data (as []byte) and creates a new Message
 func DecodeMessage(b []byte) (*Message, error) {
-	var msg Message
+	var msg Message // DIETER 0.1% of alloc in this fn
 
-	msg.Timestamp = int64(binary.BigEndian.Uint64(b[:8]))
-	msg.Attempts = binary.BigEndian.Uint16(b[8:10])
-
-	buf := bytes.NewBuffer(b[10:])
-
-	_, err := io.ReadFull(buf, msg.ID[:])
-	if err != nil {
-		return nil, err
+	if len(b) < 11+MsgIDLength {
+		return nil, errors.New("not enough data to decode valid message")
 	}
 
-	msg.Body, err = ioutil.ReadAll(buf)
-	if err != nil {
-		return nil, err
-	}
+	msg.Timestamp = int64(binary.BigEndian.Uint64(b[:8]))
+	msg.Attempts = binary.BigEndian.Uint16(b[8:10])
+	//msg.ID = MessageID(b[10 : 10+MsgIDLength][:])
+	copy(msg.ID[:], b[10:10+MsgIDLength])
+	msg.Body = b[10+MsgIDLength:]
 
 	return &msg, nil
 }
```
i haven't had much time to work on this yet or to look at the other spots, but this looked like low-hanging fruit. the screenshot below shows 2 runs of our service with the same input load coming from NSQ. the first half of the graphs is with the patch applied, the second half is without it. note the difference in how fast "alloc not freed" rises and the resulting difference in the frequency of GC runs. i'm also seeing lower cpu usage with the patch applied.
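one caveat worth flagging: with the patch, msg.Body aliases the input slice b rather than owning a copy, so it relies on each message being read into a fresh buffer (which appears to be what go-nsq's reader does). and to double-check the allocation win outside of production graphs, a benchmark along these lines could be dropped into a _test.go file in the package; the benchmark itself is a made-up sketch, not part of go-nsq:

```go
package nsq

import (
	"encoding/binary"
	"testing"
	"time"
)

// BenchmarkDecodeMessage is a hypothetical micro-benchmark for comparing
// allocs/op with and without the patch applied.
func BenchmarkDecodeMessage(b *testing.B) {
	// build one valid raw message: timestamp + attempts + ID + 256-byte body
	raw := make([]byte, 10+MsgIDLength+256)
	binary.BigEndian.PutUint64(raw[:8], uint64(time.Now().UnixNano()))
	binary.BigEndian.PutUint16(raw[8:10], 1)
	copy(raw[10:10+MsgIDLength], "0123456789abcdef")

	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := DecodeMessage(raw); err != nil {
			b.Fatal(err)
		}
	}
}
```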
@Dieterbe those results look promising and the diff looks simple and safe to me, want to open up a PR so we can get that in?
(we can leave this issue open to track overall optimization progress)
here's a profile from a production service (if you're interested, the code lives at https://github.com/raintank/raintank-metric/tree/master/metric_tank)

we can probably address the first and third items pretty easily by using a sync.Pool. the first one comes from

```go
msg.Body, err = ioutil.ReadAll(buf)
```

in go-nsq.DecodeMessage