spandex-project / spandex

A platform agnostic tracing library
MIT License

Add "Max Interval" flushing mechanism #45

Closed · aspett closed this 6 years ago

aspett commented 6 years ago

This addresses the issue at https://github.com/zachdaniel/spandex/issues/37 by configuring a 'max interval' that traces may remain queued before being sent. This is important because the Datadog agent discards traces it receives that are older than 10 seconds.

Open to suggestions/changes - I just wanted to get this going for testing with some services locally.

coveralls commented 6 years ago


Coverage increased (+0.7%) to 69.912% when pulling 8df97db8abbae7705bda8a7e0a8a4754c23ad8dd on aspett:max-interval into d5dd2f5d5c5afd28a76ba1be11f8d5dd7eccb13a on zachdaniel:master.

zachdaniel commented 6 years ago

I'm worried that this won't have the advertised effect under high load. I agree that it's not really a good thing to only consider sending spans when a new one is received, but in the case where the mailbox for this server is growing, the flush message could actually be received far later than intended.

I think that if your load isn't enough to fill even a small batch within 10 seconds, then a batch size of 1 should work just fine. However, I believe you mentioned that your traffic is seasonal/sporadic, which that wouldn't account for. To implement this without causing issues under load (or at least without it simply not working under heavy load), I think you'd instead want to check periodically, say every 1 second, for any traces approaching their max interval, and send those (this can be done by looking at the timestamp). At the same time, you'd want to duplicate this logic whenever you're sending spans, so that each message to send a batch also sends any additional batches that are approaching their limit.

It still isn't fully solved by that though, because the server could be so backed up that you just wouldn't have time.

What kind of load are you looking at? We were sending 200 traces per second, each with 5-10 spans with a batch_size of only 2. So if you're sending less data than that, then maybe you don't need this and can just set the batch_size to 1?
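The periodic check suggested above could be sketched roughly as below. This is only an illustration, not Spandex's actual internals: the module name, the state shape (`:traces`, `:enqueued_at`), `send_batch/1`, and the interval values are all assumptions.

```elixir
# Hypothetical sketch of a periodic sweep for traces nearing their max
# interval. State fields and helper names are illustrative assumptions.
defmodule SweepSketch do
  use GenServer

  @max_interval 9_000
  @sweep_every 1_000

  def init(_opts) do
    # Schedule the first sweep; each sweep reschedules the next one.
    Process.send_after(self(), :sweep, @sweep_every)
    {:ok, %{traces: []}}
  end

  def handle_info(:sweep, state) do
    Process.send_after(self(), :sweep, @sweep_every)
    now = System.monotonic_time(:millisecond)

    # Split off traces approaching their max interval and send them;
    # keep the rest waiting for a full batch.
    {expiring, waiting} =
      Enum.split_with(state.traces, fn t ->
        now - t.enqueued_at >= @max_interval
      end)

    if expiring != [], do: send_batch(expiring)
    {:noreply, %{state | traces: waiting}}
  end

  defp send_batch(_traces), do: :ok
end
```

As the comment notes, the same age check would also need to run on the normal batch-send path, and a sweep message can itself sit behind a backed-up mailbox, so this is a best-effort bound rather than a guarantee.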

aspett commented 6 years ago

> I'm worried that this won't have the advertised effects when there is high load. I agree that its not really a good thing to only consider sending spans when a new one is received, but in the case where the mailbox for this server is growing, this flush message could actually be received far later than intended.

Hmm, this is a good point. One thing I'd considered doing is creating a ref, storing the latest one, sending a message more like {:interval, ref}, and checking that the ref matches the latest one in case it's no longer current. The more I think about it, that would be a better solution than what this PR currently does.

I'm mostly solving this problem at the moment for dev/QA environment setups, but I'm looking for it to scale better in the future for our (yet to be released) prod. Hence I'd really like to take advantage of the current backpressure mechanism while still catering for quiet periods.

What do you think of the above idea as opposed to this?
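The ref-based invalidation described above might look something like the fragment below. This is a hedged sketch of the idea, not Spandex's code: the module, state fields, `@max_interval`, and `flush/1` are hypothetical.

```elixir
# Hypothetical sketch: store the ref of the most recently scheduled
# flush timer, and ignore any {:interval, ref} message whose ref is
# stale. Names here are illustrative, not Spandex's API.
defmodule RefFlushSketch do
  use GenServer

  @max_interval 5_000

  def handle_cast({:span, span}, state) do
    ref = make_ref()
    Process.send_after(self(), {:interval, ref}, @max_interval)
    {:noreply, %{state | spans: [span | state.spans], interval_ref: ref}}
  end

  # This timer is still the latest one scheduled, so flush.
  def handle_info({:interval, ref}, %{interval_ref: ref} = state) do
    {:noreply, flush(state)}
  end

  # A newer span rescheduled the timer; this message is stale.
  def handle_info({:interval, _stale_ref}, state), do: {:noreply, state}

  defp flush(state), do: %{state | spans: [], interval_ref: nil}
end
```

The ref comparison in the first `handle_info/2` clause happens via pattern matching, so stale timers cost almost nothing; the trade-off, as discussed below, is that a backed-up mailbox can still delay even a "current" flush message.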

zachdaniel commented 6 years ago

I still think that even with sending the ref, you could find yourself not honoring the contract of "spans will only wait x seconds maximum". Granted, that just isn't a hard guarantee we could ever make, so a good-faith effort isn't necessarily a bad thing. I will say that if you're not planning on releasing your production piece for at least a couple of weeks, I intend to have sampling (and really everything on the list I gave you) done by then. I think a proper sampling strategy would let you always send spans as soon as they are sampled, without incurring any load penalties.

aspett commented 6 years ago

👍 Sounds good. I'm going to make that refactor on my fork nevertheless so it can be used in the meantime. I look forward to seeing how those improvements turn out!