knative-extensions / eventing-natss

NATS streaming integration with Knative Eventing.
Apache License 2.0
40 stars 41 forks source link

Upgrade to latest JetStream SDK #542

Open dan-j opened 7 months ago

dan-j commented 7 months ago

Problem We're using a legacy SDK for interacting with NATS, I have been observing some performance issues in our environment, and although I'm not 100% certain the issue is in this repo, things got slightly better after restarting the dispatcher. I believe it's still to do with the number of goroutines we have running, and the new SDK makes it much easier to control how many messages we fetch in parallel.

I'm proposing this as a feature rather than a bug, because as I said, I'm not actually 100% certain there is a performance issue.

I already have some WIP changes which use the new SDK whilst fixing some other minor issues, any PR is probably going to come addressing multiple issues (sorry! but doing all the things I want to do in separate PRs is going to take too long and I'm pushed for time in this repo)

Persona:

Exit Criteria Library is updated and performance is improved.

Time Estimate (optional): 1 day

Additional context (optional) My WIP code makes changes so that we use the new FetchNoWait(batchSize) method, and convert the jetstream.Msg to a new struct which implements binding.Message (since the CE SDK doesn't support this library yet). This struct also contains a Context() method which is configured with a timeout based on AckWait so that we can successfully cancel requests as soon as they expire.

We then write this Message implementation to a local channel which we handle in a fixed number of goroutines, rather than spawning a new goroutine for every message.

I think this will make things much easier to tweak for performance issues, and should be more performant out of the box.

dan-j commented 7 months ago

We then write this Message implementation to a local channel which we handle in a fixed number of goroutines, rather than spawning a new goroutine for every message.

Still wondering whether this is the right approach... The request volume I'm seeing in my environment shouldn't have any impact on performance with respect to the number of goroutines

github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

astelmashenko commented 4 months ago

/remove-lifecycle stale

github-actions[bot] commented 4 weeks ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.