psincraian / pepy

pepy is a site to get statistics information about any Python package.
https://pepy.tech
MIT License
781 stars, 33 forks

Get badge without mirrors? #164

Open shadiakiki1986 opened 4 years ago

shadiakiki1986 commented 4 years ago

Hey there. Awesome project. Is it possible to get a badge from pepy that excludes mirrors? For my project, the mirror stats are much larger than the non-mirror ones because it's still a young project. I wouldn't want the badge on my README to be misleading.

References

https://pepy.tech/project/isitfit

https://pypistats.org/packages/isitfit

psincraian commented 4 years ago

I don't plan to add this feature, but I may add a similar one that lists the source of downloads. If you take a look at your downloads in BigQuery, you can see the following results:

| row | details_installer_name | downloads |
|-----|------------------------|-----------|
| 1   | Browser                | 122       |
| 2   | pip                    | 71        |
| 3   | requests               | 96        |
| 4   | null                   | 48        |
| 5   | bandersnatch           | 2886      |

As you can see here, the bandersnatch mirror has 2886 downloads. That seems like quite a lot, but the mirror can be installed locally (https://bandersnatch.readthedocs.io/en/latest/mirror_configuration.html). So if I have a local mirror and I install your package from it, that download will not be taken into account.
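To make the proportions in the table above concrete, here is a minimal Python sketch that splits the per-installer counts into mirror and non-mirror totals. The counts are copied from the table; the function name and the choice to treat only bandersnatch as a mirror are assumptions made for illustration, not how pepy computes its numbers.

```python
# Download counts per installer, copied from the table above.
downloads_by_installer = {
    "Browser": 122,
    "pip": 71,
    "requests": 96,
    None: 48,  # rows where details_installer_name is null
    "bandersnatch": 2886,
}

# Assumption: bandersnatch is the only mirror client in this result set.
MIRROR_INSTALLERS = {"bandersnatch"}


def split_mirror_downloads(counts, mirrors=MIRROR_INSTALLERS):
    """Return (non_mirror_total, mirror_total) for a per-installer mapping."""
    mirror_total = sum(n for name, n in counts.items() if name in mirrors)
    non_mirror_total = sum(counts.values()) - mirror_total
    return non_mirror_total, mirror_total


non_mirror, mirror = split_mirror_downloads(downloads_by_installer)
mirror_share = mirror / (non_mirror + mirror)
print(f"without mirrors: {non_mirror}, mirrors: {mirror}, share: {mirror_share:.1%}")
```

With these figures, 337 of the 3223 recorded downloads come from something other than the mirror, which is why the two badges would differ so much for a young project.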

shadiakiki1986 commented 4 years ago

What I normally do is filter for only pypi in my package query

shadiakiki1986 commented 4 years ago

Filter for pip* (typo)
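A rough sketch of what "filter for pip" could look like when querying BigQuery, written as a small Python helper that only builds the SQL text. The table name `bigquery-public-data.pypi.file_downloads` and the columns `file.project`, `details.installer.name`, and `timestamp` are assumptions based on the public PyPI dataset as it is commonly documented (older queries used a different `downloads` table); the helper name and the dates are hypothetical.

```python
from datetime import date

# Assumption: current public PyPI downloads table; older queries used
# a different "downloads" table with a similar schema.
PYPI_TABLE = "bigquery-public-data.pypi.file_downloads"


def build_pip_only_query(start: date, end: date) -> str:
    """Build a query counting one project's downloads made by pip only.

    The project name is left as the named parameter @project so it can be
    bound by the BigQuery client instead of being pasted into the SQL.
    """
    return (
        "SELECT COUNT(*) AS downloads\n"
        f"FROM `{PYPI_TABLE}`\n"
        "WHERE file.project = @project\n"
        "  AND details.installer.name = 'pip'\n"
        f"  AND DATE(timestamp) BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'"
    )


# Hypothetical date range, chosen only for illustration.
query = build_pip_only_query(date(2019, 10, 1), date(2019, 10, 9))
print(query)
```

Restricting the installer to `pip` drops mirrors such as bandersnatch, but it also drops browser downloads and rows with a null installer, so it is a stricter filter than "without mirrors".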

jewettaij commented 4 years ago

For what it's worth, I can also chime in to confirm that my download stats on pepy.tech are much, much higher than they are (or were) on pypistats.org.

That's all I wanted to say. No need to reply to this post.

The remainder of this message is a discussion which is not directly relevant to this thread, but I wasn't sure if it was appropriate to start a new issue. Feel free to ignore.

Other possible reasons for high download counts

Estimating the number of users instead of download counts

Before I used pypi, when my software was hosted on my own web page, the majority of downloads came from the same few IP addresses. (For example, I remember that one IP address downloaded my software over 10000 times. This was back when it was legal to keep track of visitor IP addresses.) Is it possible to use BigQuery to estimate the number of unique users (by discarding downloads from the same IP)? (Forgive me. I know nothing about BigQuery.)
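For the self-hosted case described above, the idea of discarding repeat downloads from the same IP can be sketched in a few lines of Python. This assumes you have your own access log with client addresses; as far as I know, the public PyPI BigQuery dataset does not expose IP addresses, so this would not carry over directly. The function name and the sample addresses (taken from documentation-only ranges) are hypothetical.

```python
from collections import Counter


def summarize_downloads(client_ips):
    """Return (total_downloads, unique_clients, busiest_client) for a list of IPs."""
    counts = Counter(client_ips)
    total = sum(counts.values())
    busiest = counts.most_common(1)[0] if counts else None
    return total, len(counts), busiest


# Hypothetical log entries; these are reserved documentation addresses, not real data.
log = ["192.0.2.1"] * 5 + ["192.0.2.7", "198.51.100.3", "192.0.2.7"]
total, unique, busiest = summarize_downloads(log)
print(f"{total} downloads from {unique} distinct addresses; busiest: {busiest}")
```

Distinct addresses are only a rough proxy for distinct users, since many users can share one address (NAT, CI services) and one user can appear under several.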

Excluding downloads with unknown python versions

When I used pypistats.org, it was able to show which version of Python the users who downloaded my project were using (e.g. 2.7, 3.5, 3.7, etc.). This was interesting, but it's not essential. I only mention it here because it seemed that (even after excluding downloads from mirrors) the majority of downloads for my small project were from users whose Python version is "unknown" and whose OS is also "unknown". Are these downloads legitimate? Should we exclude them?
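One way to look at the "unknown" downloads separately, rather than silently dropping them, is to split rows by whether the client reported a Python version and an OS. The sketch below assumes each row is a dict with `python` and `system` keys that are `None` when the field was not reported; the row layout and the sample rows are hypothetical, not actual query output.

```python
def split_known_unknown(rows):
    """Split download rows by whether both Python version and OS are reported."""
    known, unknown = [], []
    for row in rows:
        if row.get("python") and row.get("system"):
            known.append(row)
        else:
            unknown.append(row)
    return known, unknown


# Hypothetical rows for illustration only.
rows = [
    {"installer": "pip", "python": "3.7", "system": "Linux"},
    {"installer": "pip", "python": "2.7", "system": "Darwin"},
    {"installer": None, "python": None, "system": None},
    {"installer": "requests", "python": None, "system": None},
]
known, unknown = split_known_unknown(rows)
print(f"known: {len(known)}, unknown: {len(unknown)}")
```

Keeping both groups makes it possible to report the "unknown" share next to the total, which answers the "should we exclude them?" question with data instead of a fixed rule.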

Thanks for creating this service.

laurahanu commented 3 years ago

I also think it would be useful to have the option to choose the type of stats! Have there been any updates on this or are there any plans to add this in the future?

psincraian commented 3 years ago

Hi @laurahanu, currently we are saving download stats without mirrors. Now we need to make changes to the API and to the frontend app :-)

laurahanu commented 3 years ago

Hi @psincraian, thanks for the reply and good to hear! Looking forward!

PMeira commented 3 years ago

currently we are saving download stats without mirrors. Now we need to make changes to the API and to the frontend app :-)

@psincraian I'm also looking forward to that, and thanks for PePy as a whole!

I just wanted to add some thoughts on this, hopefully not too off-topic. I know these are not trivial issues, and I'm aware of the discussion on why PyPI doesn't include stats itself. I also imagine these issues don't matter much for packages with a large number of downloads.

As you can see here, the bandersnatch mirror has 2886 downloads. That seems like quite a lot, but the mirror can be installed locally. So if I have a local mirror and I install your package from it, that download will not be taken into account.

Since the mirrors seem to download all files, they might greatly inflate the numbers for packages that have few users but ship binary wheels for many Python versions and platforms. I believe the total without mirrors will help a lot in those cases.

For instance, using BigQuery directly* a few weeks ago, one of my packages had:

(* I was using the old `downloads` table for this, not `file_downloads`)

@jewettaij mentioned:

the majority of downloads for my small project were from users whose python version is "unknown" and whose OS is also "unknown". Are these downloads legitimate? Should we exclude them?

Besides those, which usually reflect that the fields are null in the BigQuery table, I noticed some other weird things. For example, I'm not sure how the "country_code" is filled in the BigQuery data, even when restricted to "pip" as the installer. For my niche package, I noticed from the data that country_code=US is disproportionally larger than everything else, so I wonder:

laurahanu commented 3 years ago

Hi @psincraian, have there been any updates on the API or on the frontend side? If not, is there a timeline for when this will be included?