grammyjs / grammY

The Telegram Bot Framework.
https://grammy.dev
MIT License
2.39k stars · 118 forks

Slower response time compared to telegraf #200

Closed · aradalvand closed this issue 2 years ago

aradalvand commented 2 years ago

Hi there! Firstly, I'd like to thank you for your remarkable work! I'm pretty new to the world of Telegram bots, and before stumbling upon grammY I had been playing with telegraf for a few days; the lack of documentation and so on was really starting to drive me insane, so I was absolutely delighted to come across grammY! I've found it to be a much nicer and overall far superior framework to both of the current big names, namely telegraf and node-telegram-bot-api. I think it's criminally underrated at the moment, and I sincerely hope it gets more and more popular over time and at some point overtakes the big two :) Keep up the great work!

Okay, now regarding the issue I've been facing: the only thing I've been slightly disappointed about in grammY is the noticeably slower response time I consistently experience compared to telegraf (and node-telegram-bot-api, for that matter). I'm not sure why this happens; it's strange. I've created a dead-simple repro (you can find it in this repo): just run either npm run grammy or npm run telegraf and go to http://t.me/telegraf_vs_grammy_bot to see the result.

This is grammY:

https://user-images.githubusercontent.com/26527405/168333909-09eb2f9d-19d7-44a0-9ec5-6e5e832ac423.mp4

This doesn't look terribly slow on its own, but it's visibly (much) slower than telegraf (see below), and in my experience the slowness becomes more and more noticeable as the bot grows in complexity.

Now, here's telegraf:

https://user-images.githubusercontent.com/26527405/168334591-4da72297-f13f-4bbe-b59c-47fac59b7035.mp4

I think the difference is clear :)

Note that this is all under the same network connection and everything. (For what it's worth, I also have a VPN enabled, as I live in Iran and the government has unfortunately banned Telegram here.)

Not sure why this is happening, as it seems to me there shouldn't be a fundamental difference between how grammY and telegraf (or ntba) retrieve updates. (This is the default polling technique, by the way; I've also tried grammY runner, included in the repo, but it doesn't improve performance in this particular case at all.)

Thank you in advance.

KnorpelSenf commented 2 years ago

Hi and welcome! Thanks for the kind words, glad you like it :)

I don't have access to a VPS in Iran so this is hard to reproduce. There is no speed difference for me in various data centres across Europe.

We've had a similar discussion in #189 not long ago. Perhaps @hayyaun has some advice?

It's peculiar that there can be a difference in response time. Both libraries use the same HTTP client library under the hood, so they're fundamentally identical in how requests are performed.

I also know that, for most requests, grammY generates requests that are byte-for-byte identical to what curl produces.

In a few days I can take some time to inspect more thoroughly whether there are any differences between the libraries. However, I have little confidence that this will yield any results, since that's not how the library behaves for me.

Note also that grammY is powering large bots with tens of millions of requests every day (several hundred per second) and there are no issues with responsiveness.

KnorpelSenf commented 2 years ago

Just for testing purposes, can you write a tiny program that performs only a getMe call?

It would be interesting to see this:

- written with Telegraf via bot.telegram.getMe
- written with grammY via bot.api.getMe
- written with the npm package called node-fetch
- using curl

And please measure how much time each call takes. Let me know if you don't know how to do any of this.
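A minimal sketch of such a timing script (the helper and its names are illustrative, not part of the repro repo); it works with any promise-returning call, so one script per framework can reuse it:

```javascript
// Sketch: time any promise-returning API call with performance.now().
// The helper is framework-agnostic; the grammY usage below assumes
// `npm install grammy` and a BOT_TOKEN environment variable.
async function timeCall(label, fn) {
  const t0 = performance.now();
  const result = await fn();
  const elapsed = performance.now() - t0;
  console.log(`${label}: ${elapsed.toFixed(0)} ms`);
  return { result, elapsed };
}

// Usage with grammY:
// const { Api } = require('grammy');
// const api = new Api(process.env.BOT_TOKEN);
// timeCall('grammY getMe', () => api.getMe())
//   .then(({ result }) => console.log(result));
```

The Telegraf variant differs only in the call passed in (`bot.telegram.getMe()`), so the measurement code stays identical across scripts.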

hayyaun commented 2 years ago

(quotes the original post in full)

Hi Arad, I had the same issue on my local device, but in production on one of ParsPack's servers the issue seems to have disappeared; it was efficient and fast. Don't worry about it and enjoy the library. If you have any questions, I'm available to help on social media under the ID hayyaun.

MKRhere commented 2 years ago

@AradAlvand have you tried running the Deno version of grammY? @hayyaun If you're able to, could you also test whether the slow response times persist on the Deno version?

hayyaun commented 2 years ago

Roger that

aradalvand commented 2 years ago

I don't have access to a VPS in Iran so this is hard to reproduce. There is no speed difference for me in various data centres across Europe.

@KnorpelSenf That's good to know, although it makes it even weirder, because I'm also connected to a server in Europe via my VPN, so I should theoretically see the same kind of performance with both telegraf and grammY:

[screenshot attached]

The fact that the two frameworks yield different kinds of performance is baffling.

aradalvand commented 2 years ago

(quotes hayyaun's reply above)

@hayyaun Hi Hayyan! That's very good to know, thank you! That's a relief! If this problem only exists on our local machines, there's not really much to worry about. But it's still very strange that the problem exists in the first place, and it makes the developer experience subpar, so I'd say it's worth exploring further.

Are your ParsPack servers located in Europe? Because as far as I know, ParsPack offers servers both in Iran and in various parts of Europe.

aradalvand commented 2 years ago

Just for testing purposes, can you write a tiny program that performs only a getMe call? It would be interesting to see this: written with Telegraf via bot.telegram.getMe; written with grammY via bot.api.getMe; written with the npm package called node-fetch; using curl. And please measure how much time it takes to perform this call. Let me know if you don't know how to do any of this.

@KnorpelSenf Okay, I just created a directory called getMe in the same repo containing 4 new scripts that send getMe requests in each of the ways you mentioned, using performance.now() to measure elapsed time. You can check it out and let me know if it's what you wanted; run npm run [name-of-the-file-in-getMe-directory] to execute each script.

There didn't seem to be much of a difference between telegraf and grammY in terms of getMe performance, both around ~500-600 milliseconds, so the issue must lie elsewhere. FWIW, the curl approach seems to be the slowest of them all; I'm not sure why.

KnorpelSenf commented 2 years ago

The fact that the two frameworks yield different kinds of performance is baffling.

I agree. I have zero intuition about what could cause this. Neither do the Telegraf maintainers, with whom I've talked about this.

There didn't seem to me to be much difference between telegraf and grammY in terms of getMe performance

Your test results seem to indicate that it's not a problem with the networking implementation, but rather that different configuration is used. Maybe Telegraf reuses the same connection for multiple subsequent calls, while grammY somehow fails to do so, even though it should?

It would be interesting if you could find the fetch method in the node-fetch package and print all arguments of each fetch call before executing it. The getMe scripts should work for this again.
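As a sketch of that idea (the helper name is illustrative): wrap the fetch function in a logging function so every call prints its arguments before being forwarded:

```javascript
// Sketch: wrap a fetch-like function so every call prints its arguments
// before executing. Pass in node-fetch's export (or the global fetch on
// Node >= 18) as `realFetch`.
function makeLoggingFetch(realFetch, log = console.log) {
  return function loggingFetch(input, init) {
    log('fetch called with:', input, JSON.stringify(init, null, 2));
    return realFetch(input, init);
  };
}

// Usage (assumes node-fetch v2 is installed):
// const fetch = makeLoggingFetch(require('node-fetch'));
// fetch('https://api.telegram.org/bot' + process.env.BOT_TOKEN + '/getMe');
```

Running both frameworks' getMe scripts through a wrapper like this would make any difference in headers, agent, or body handling directly visible.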

KnorpelSenf commented 2 years ago

curl being slowest is an illusion; it's caused by Node.js taking a lot of time to spawn a new process. If you ran time curl -X ..... directly instead, it would likely be about as fast as the other scripts, if not slightly faster.

KnorpelSenf commented 2 years ago

Can you try out the changes in https://github.com/grammyjs/grammY/pull/212 and see if it fixes the problem?

If you don't know how to clone and build the repository and link the branch into your project, you can just do:

npm install grammyjs/grammY#longer-keepalive-times

in your project. This will automatically download the PR and build it into your node_modules folder (so it can take some time).

Please let me know if it has any effect. This is just guesswork.

aradalvand commented 2 years ago

Hi @KnorpelSenf! I just tested with grammyjs/grammY#longer-keepalive-times and unfortunately I'm not seeing any noticeable difference in terms of response time.

But I very much appreciate your efforts to fix this <3

KnorpelSenf commented 2 years ago

That is interesting. I pushed new changes which also unify the URL usage, even though it's unclear why that should be the issue.

Can you reinstall the dependency from the branch (delete node_modules and npm install) and try again?

aradalvand commented 2 years ago

Still the same, sadly.

Here's telegraf: [video: telegraf]

And here's grammY: [video: grammy]

[screenshot attached]

KnorpelSenf commented 2 years ago

Odd stuff. There is no known difference between grammY and Telegraf anymore, so it's unexpected that there is still an observable time difference.

Can you try to run these three scripts using time and with the same versions as in the image? Please run

export BOT_TOKEN=.....

with your bot token before.

grammY test script

// file: grammy.js
const { Api } = require('grammy')

const api = new Api(process.env.BOT_TOKEN)

api.getMe().then(console.log.bind(console))

Run it using

time node grammy.js

Telegraf test script

// file: telegraf.js
const { Telegram } = require('telegraf')

const api = new Telegram(process.env.BOT_TOKEN)

api.getMe().then(console.log.bind(console))

Run it using

time node telegraf.js

bash test script

time curl https://api.telegram.org/bot$BOT_TOKEN/getMe

EdJoPaTo commented 2 years ago

Using either grammy or grammyjs/grammY#longer-keepalive-times does not seem to have a significant impact for me

hyperfine 'node telegraf.js' 'node grammy.js' 'curl https://api.telegram.org/bot$BOT_TOKEN/getMe' --runs=50

Locally (over WiFi):

Benchmark 1: node telegraf.js
  Time (mean ± σ):     186.7 ms ±  21.9 ms    [User: 109.9 ms, System: 17.8 ms]
  Range (min … max):   157.0 ms … 246.1 ms    50 runs

Benchmark 2: node grammy.js
  Time (mean ± σ):     181.4 ms ±  15.2 ms    [User: 107.6 ms, System: 17.8 ms]
  Range (min … max):   149.2 ms … 240.2 ms    50 runs

Benchmark 3: curl https://api.telegram.org/bot$BOT_TOKEN/getMe
  Time (mean ± σ):      85.2 ms ±  19.2 ms    [User: 13.4 ms, System: 7.8 ms]
  Range (min … max):    55.0 ms … 145.2 ms    50 runs

Summary
  'curl https://api.telegram.org/bot$BOT_TOKEN/getMe' ran
    2.13 ± 0.51 times faster than 'node grammy.js'
    2.19 ± 0.56 times faster than 'node telegraf.js'

On one of my servers it looks similar:

Benchmark 1: node telegraf.js
  Time (mean ± σ):     174.7 ms ±  31.5 ms    [User: 103.4 ms, System: 18.6 ms]
  Range (min … max):   140.1 ms … 250.1 ms    50 runs

Benchmark 2: node grammy.js
  Time (mean ± σ):     169.4 ms ±  33.3 ms    [User: 102.6 ms, System: 15.3 ms]
  Range (min … max):   132.9 ms … 292.1 ms    50 runs

Benchmark 3: curl https://api.telegram.org/bot$BOT_TOKEN/getMe
  Time (mean ± σ):      73.8 ms ±  27.4 ms    [User: 10.8 ms, System: 3.4 ms]
  Range (min … max):    44.5 ms … 148.6 ms    50 runs

Summary
  'curl https://api.telegram.org/bot$BOT_TOKEN/getMe' ran
    2.30 ± 0.97 times faster than 'node grammy.js'
    2.37 ± 0.98 times faster than 'node telegraf.js'

KnorpelSenf commented 2 years ago

Those tests are irrelevant. This issue can only be reproduced on local machines in Iran. The performance is comparable for both frameworks if your machine is located in a different country. The performance is also comparable for both frameworks if you are using a VPS in Iran.

aradalvand commented 2 years ago

@KnorpelSenf Hi again! Sorry for the delay, I've been a little busy :)

Results: telegraf:

$ time node telegrafSimpleGetMe.js
{
  id: 5336333594,
  is_bot: true,
  first_name: 'Telegraf vs grammY',
  username: 'telegraf_vs_grammy_bot',
  can_join_groups: true,
  can_read_all_group_messages: false,
  supports_inline_queries: false
}

real    0m0.806s
user    0m0.030s
sys     0m0.000s

grammY:

$ time node grammySimpleGetMe.js
{
  id: 5336333594,
  is_bot: true,
  first_name: 'Telegraf vs grammY',
  username: 'telegraf_vs_grammy_bot',
  can_join_groups: true,
  can_read_all_group_messages: false,
  supports_inline_queries: false
}

real    0m0.982s
user    0m0.000s
sys     0m0.030s

curl:

$ time curl https://api.telegram.org/botTOKEN/getMe
{"ok":true,"result":{"id":5336333594,"is_bot":true,"first_name":"Telegraf vs grammY","username":"telegraf_vs_grammy_bot","can_join_groups":true,"can_read_all_group_messages":false,"supports_inline_queries":false}}
real    0m0.599s
user    0m0.015s
sys     0m0.015s

Let me know if I've done anything wrong.

KnorpelSenf commented 2 years ago

Thanks. The calls are ~15% apart. That's within the fluctuation I'd expect under these networking conditions, so the test does not reproduce the issue. This means something more complicated is going on.

There are a number of ways to investigate this, but it would be fairly inefficient if I had to write everything into this issue. Someone with access to the machine should try to track this down. So either you try to work it out yourself, or you grant someone access to your machine. I myself won't be able to spend time on this in the coming weeks, but if you don't manage to figure it out, I can come back to you here once I have time for it again.

Another member of the grammY community pointed out that these things can happen randomly in Iran, and that there's a good chance the problem isn't actually in grammY. It will be interesting to find out. As @EdJoPaTo demonstrated, the libraries otherwise behave identically.

aradalvand commented 2 years ago

This is indeed a very peculiar problem, and I have literally zero clue as to what could be causing it. Of course, the fact that it's not easily reproducible for you means debugging it would be a total pain. So I understand what you're saying; thanks a lot for your efforts so far :)

I'm not worrying too much about it at the moment though to be honest.

Because, as @hayyaun said in this comment, this problem apparently disappears when the bot runs on a VPS outside of Iran, which means users aren't going to suffer from it in production, so ultimately it's not that big of a deal if that's the case. I can definitely live with the slower response times during development, as long as it's not an issue in production :)

KnorpelSenf commented 2 years ago

Alright, that's good to hear. Debugging it would likely take several hours; I'd have to set aside a day. Maybe the easiest option is just not to investigate it.

Should we close this issue?

KnorpelSenf commented 2 years ago

Please drop a message if you discover anything new :)

And good luck with your project!

aradalvand commented 2 years ago

Sure, definitely :) Thank you!

skyisboss commented 2 years ago

My location is in the Philippines and I can reproduce the same issue. Then I tried running the same code with a friend who lives in Singapore, but unfortunately the situation didn't change... 😥

KnorpelSenf commented 2 years ago

@skyisboss how high is your load? Can you share more about your setup? This issue discusses a variety of environments.

aradalvand commented 2 years ago

Just for the record, I did a little test on a VPS (located in Germany), and it confirmed what @hayyaun alluded to.

Here's when the bot's running on my local machine (with a VPN enabled, of course):

https://user-images.githubusercontent.com/26527405/172062787-27b57c55-2d64-4327-99e0-6a4486be6c65.mp4

And here's when it's running on the VPS:

https://user-images.githubusercontent.com/26527405/172062802-3d030f6e-6a3b-4add-a8b1-c56280e61016.mp4

As is clear from the videos above, the delay disappears on the VPS and the performance is perfectly fine.

KnorpelSenf commented 1 year ago

This was fixed in 1.16.2. See #433 if you are curious about the fix.