psincraian / pepy

pepy is a site to get statistics information about any Python package.
https://pepy.tech
MIT License
798 stars 33 forks

API endpoint usage #573

Closed chris48s closed 12 months ago

chris48s commented 1 year ago

Hello. There's a thread over on the shields.io repo about adding a PyPI Total Downloads badge using pepy as the source:

https://github.com/badges/shields/issues/4319

Before taking that conversation further, I wanted to open an issue to discuss because:

Are you able to give us an initial indication of whether you'd be happy with us adding this? Cheers

psincraian commented 1 year ago

Hey @chris48s,

I'm open to collaborating on the integration and am here to help ☺️

For context, we currently handle 5k to 7k requests per hour. I noticed from your issue that you're redirecting 8k requests hourly to pypistats, which is over double our current volume.

Before proceeding, I need to assess our server's capacity. Though it's feasible to expand pepy's capacity, I'd prefer not to due to potential cost increases. I'll assess this once I'm back from vacation.

Could you please clarify a few things:

  1. Do you experience any peak traffic times we should be aware of?
  2. Is it possible for me to introduce an API key for shields.io?
  3. Are you primarily interested in summary stats (total, monthly, weekly)? If so, I could set up a dedicated endpoint to reduce the database load.
  4. Keep in mind, Pepy is provided as a best effort service. Would any downtime be a significant issue for you?

Thanks for your cooperation ☺️

chris48s commented 1 year ago

Hi. Just acknowledging I've seen your post but I haven't had a chance to reply yet. I'm aiming to reply with answers in the next couple of days. Cheers

chris48s commented 1 year ago

For context, we currently handle 5k to 7k requests per hour. I noticed from your issue that you're redirecting 8k requests hourly to pypistats, which is over double our current volume. Before proceeding, I need to assess our server's capacity. Though it's feasible to expand pepy's capacity, I'd prefer not to due to potential cost increases. I'll assess this once I'm back from vacation.

I wouldn't expect us to immediately send that kind of traffic your way. We've reached that level of usage with pypistats gradually over many years of carrying the day/week/monthly badges. I wouldn't expect to add a total downloads badge and immediately have that level of users. On day one, the traffic will be close to zero. PyPI badges are some of the most popular services on shields.io though.

In terms of keeping usage down, the main thing we can do is cache the badges downstream at the CDN. This means that badges embedded in the README of a popular project are only requested periodically. They mostly get served from cache. Our default max-age for a downloads badge is 20 mins. Given you are only updating the data once per day, I'd suggest we should set a much longer max-age for pepy. That should keep the traffic lower. Side note: They've never complained about it, but thinking this through and writing this up has made me realise pypistats are also only updating daily and we haven't customised the default :grimacing: , so I am going to submit a PR which will also reduce the amount of traffic we're sending their way.
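
To make the effect of a longer max-age concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the simplest possible model: a single cache in front of the badge that refetches from the origin at most once per max-age window for a given badge URL. Real CDNs have many edge nodes and evict entries, so treat the numbers as an idealized bound, not measured figures. The function name is made up for illustration.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def max_origin_requests_per_day(max_age_seconds: int) -> int:
    """Upper bound on daily origin fetches for one badge URL behind one cache.

    Idealized model: the cache refetches at most once per max-age window,
    no matter how many times the badge is viewed.
    """
    if max_age_seconds <= 0:
        raise ValueError("max_age_seconds must be positive")
    return math.ceil(SECONDS_PER_DAY / max_age_seconds)


# Default downloads-badge max-age mentioned above: 20 minutes.
print(max_origin_requests_per_day(20 * 60))       # 72
# A max-age that matches data refreshed once per day.
print(max_origin_requests_per_day(24 * 60 * 60))  # 1
```

Under this model, moving from a 20 minute max-age to a 24 hour one cuts the worst case for a single popular badge from 72 upstream requests per day to 1.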

Do you experience any peak traffic times we should be aware of?

Our demand curve is pretty predictable. We serve most traffic during working hours for Europe and North America and least when it is daylight over the Pacific Ocean. We also see a dip on the weekends. We scale our own infra based on scheduled events rather than in response to traffic.

Is it possible for me to introduce an API key for shields.io?

Short answer: Yes. Slightly longer answer:

Are you primarily interested in summary stats (total, monthly, weekly)? If so, I could set up a dedicated endpoint to reduce the database load.

I think the only number we would want from pepy is total_downloads. If you wanted to set up a more efficient endpoint that only returns that, that would be cool.

Keep in mind, Pepy is provided as a best effort service. Would any downtime be a significant issue for you?

In general we try to avoid adding badges for services which we know to be unreliable. It provides a poor experience for users and generates support requests for us. That said, there isn't like a minimum uptime threshold or anything. If you're regularly experiencing a lot of downtime, I'd be hesitant to add this. If you just do your best but don't provide an SLA, that's fine. Shields is also a volunteer run service.

hugovk commented 1 year ago

One important thing to note -- is it still the case that PePy includes downloads from all sources? That is, from PyPI and from all mirrors (such as bandersnatch, z3c.pypimirror, Artifactory, and devpi)?

For example, see https://github.com/psincraian/pepy/issues/164 where people have noticed the PePy numbers are much inflated compared with pypistats, for which most endpoints are without mirrors (and one endpoint includes both with and without). See their FAQ.

PS Thank you both for all your work on PePy and Shields.io, they're both excellent tools! :clap:

chris48s commented 1 year ago

This point about including/excluding mirrors is noted in https://github.com/badges/shields/issues/4319#issuecomment-1682919057

chris48s commented 1 year ago

I've added an additional note on it to https://github.com/badges/shields/issues/4319#issuecomment-1697996155

psincraian commented 1 year ago

Let me try to answer from my phone:

  1. Ok, so then this will apply to only new badges. I think it will be much easier for me to predict the traffic and see if the service is struggling.

  2. Perfect 👍 similar to what I observed then.

  3. Mainly, my idea is to do rate limiting. I would rather not have unknown traffic overloading the service. I can set a higher limit for shields, like 10x the current traffic.

I can also keep the endpoint public but with a much lower threshold, like 1 request per second. Would this make it easier for your CI?

  4. Perfect. Given what you said, I think you can rely on the current endpoint, and if I add a new one I can raise a pull request on your project ☺️

  5. Understood 👍 I think we are aligned in terms of SLA. Our uptime for the last year has been >99%.
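
The rate-limiting idea from point 3 (a higher limit for a known API key, a low limit of about 1 request per second for anonymous callers) could be sketched with a token bucket, as below. This is only an illustration and not how pepy actually implements it: the class names, the example key, and the keyed rate of 10 requests per second are all hypothetical, and an unknown key is simply treated as anonymous here.

```python
import time


class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """One bucket per known API key, one per anonymous client otherwise."""

    def __init__(self, key_rates, public_rate=1.0):
        self.key_rates = key_rates      # hypothetical mapping: api_key -> requests/second
        self.public_rate = public_rate  # 1 request/second, as floated in the thread
        self.buckets = {}

    def allow(self, client_id, api_key=None, now=None):
        now = time.monotonic() if now is None else now
        if api_key in self.key_rates:
            bucket_id, rate = ("key", api_key), self.key_rates[api_key]
        else:
            # Missing or unknown keys fall back to the public limit in this sketch.
            bucket_id, rate = ("anon", client_id), self.public_rate
        bucket = self.buckets.get(bucket_id)
        if bucket is None:
            bucket = self.buckets[bucket_id] = TokenBucket(rate, capacity=rate, now=now)
        return bucket.allow(now)


limiter = RateLimiter({"example-shields-key": 10.0})
print(limiter.allow("203.0.113.7"))                                 # anonymous caller
print(limiter.allow("203.0.113.7", api_key="example-shields-key"))  # keyed caller
```

Keeping a separate bucket per key means a burst of anonymous traffic cannot use up the allowance reserved for a known integration.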

psincraian commented 1 year ago

@hugovk I know that people are interested in downloads by installer, but according to the survey I ran in June, it's not the top priority for the people who answered it.

I will focus on what most people are interested in, more historical data, and then probably implement this ☺️

chris48s commented 1 year ago

Hello.

We implemented these badges a few months back.

At the moment our usage is still very low.

We recently started getting 401 Unauthorized responses {"message":"Invalid API Key"} (reported in https://github.com/badges/shields/issues/9730 ) so I guess you've implemented API keys. How could we get one?
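
For anyone hitting the same 401, a keyed request might look roughly like the Python sketch below. The endpoint URL and the `X-API-Key` header name are assumptions that are not confirmed anywhere in this thread, so check pepy's own API documentation before relying on them. Only the `total_downloads` field and the `{"message":"Invalid API Key"}` body come from the discussion above. The function names are made up for illustration.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# ASSUMPTION: URL pattern and header name are illustrative, not taken from this thread.
API_URL = "https://api.pepy.tech/api/v2/projects/{project}"
API_KEY_HEADER = "X-API-Key"


def build_request(project: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that carries the API key in a header."""
    url = API_URL.format(project=urllib.parse.quote(project, safe=""))
    return urllib.request.Request(url, headers={API_KEY_HEADER: api_key})


def parse_response(status: int, body: str) -> int:
    """Return total_downloads, or raise a descriptive error for failures."""
    if status == 401:
        try:
            message = json.loads(body).get("message", "Unauthorized")
        except (ValueError, AttributeError):
            message = "Unauthorized"
        raise PermissionError(message)
    if status != 200:
        raise RuntimeError(f"Unexpected status {status}")
    return int(json.loads(body)["total_downloads"])


def fetch_total_downloads(project: str, api_key: str) -> int:
    """Perform the request; urllib raises HTTPError for 4xx/5xx, so handle both paths."""
    try:
        with urllib.request.urlopen(build_request(project, api_key), timeout=10) as resp:
            return parse_response(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        return parse_response(err.code, err.read().decode("utf-8"))
```

Splitting request building and response parsing from the network call keeps the 401 handling easy to exercise without contacting the service.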

psincraian commented 1 year ago

Hey @chris48s

Sorry, I thought this wasn't done yet. You only need to

Let me know if you have any questions or problems ☺️

If not, I will close this issue.

chris48s commented 12 months ago

Thanks. Sorry. I should have closed this after we merged https://github.com/badges/shields/pull/9564

I've created an account, made some keys, and checked they work. I won't have a chance to make the code updates until the weekend but it should be straightforward.

I'll close this now. Cheers.