raamana closed this issue 6 years ago.
It's all downloads logged by PyPI. By default, that also includes mirrors, other clients, and very old versions of pip. By far the biggest source will be mirrors; you can check with `installer`.
See https://github.com/ofek/pypinfo/issues/4.
It would be good to add something about this to the README.
I'd really like to make `--pip` the default behavior in the next release, with a new `--all` option to match the current default behavior.
@hugovk Are you fine with that?
Thanks for the info.
May I also suggest changing the default time window to the full history, rather than filtering to the last 30 days? I think most developers of not-crazily-popular packages care about that.
I am also running into a quota error when I specify a time window. When I run it without a time window, it returns results without error, which makes me think it's not a quota issue. What's going on?
Also, what is `Browser` here, which appears as an installer? Is it a count of tarballs downloaded through a web browser?
$ 10:46:52 miner ~ >> pypinfo pyradigm installer
Served from cache: True
Data processed: 0.00 B
Data billed: 0.00 B
Estimated cost: $0.00
| installer_name | download_count |
| -------------- | -------------- |
| bandersnatch | 371 |
| Browser | 14 |
| requests | 4 |
| pip | 3 |
$ 10:47:01 miner ~ >> pypinfo -d 999 pyradigm installer
Traceback (most recent call last):
File "/home/praamana/anaconda2/envs/py36/bin/pypinfo", line 11, in <module>
sys.exit(pypinfo())
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/click/core.py", line 1043, in invoke
return Command.invoke(self, ctx)
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/pypinfo/cli.py", line 106, in pypinfo
query_rows = query_job.result(timeout=timeout // 1000)
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/google/cloud/bigquery/job.py", line 2344, in result
super(QueryJob, self).result(timeout=timeout)
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/google/cloud/bigquery/job.py", line 640, in result
return super(_AsyncJob, self).result(timeout=timeout)
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/google/api_core/future/polling.py", line 115, in result
self._blocking_poll(timeout=timeout)
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/google/cloud/bigquery/job.py", line 2318, in _blocking_poll
super(QueryJob, self)._blocking_poll(timeout=timeout)
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/google/api_core/future/polling.py", line 94, in _blocking_poll
retry_(self._done_or_raise)()
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/google/api_core/retry.py", line 260, in retry_wrapped_func
on_error=on_error,
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/google/api_core/retry.py", line 177, in retry_target
return target()
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/google/api_core/future/polling.py", line 73, in _done_or_raise
if not self.done():
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/google/cloud/bigquery/job.py", line 2306, in done
location=self.location)
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/google/cloud/bigquery/client.py", line 556, in _get_query_results
retry, method='GET', path=path, query_params=extra_params)
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/google/cloud/bigquery/client.py", line 311, in _call_api
return call()
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/google/api_core/retry.py", line 260, in retry_wrapped_func
on_error=on_error,
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/google/api_core/retry.py", line 177, in retry_target
return target()
File "/home/praamana/anaconda2/envs/py36/lib/python3.6/site-packages/google/cloud/_http.py", line 293, in api_request
raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://www.googleapis.com/bigquery/v2/projects/pypistatsraamana/queries/90b09b2d-f62e-4f76-adbc-8c6abe201ee9?maxResults=0&timeoutMs=10000: Quota exceeded: Your project exceeded quota for free query bytes scanned. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
$ 10:47:20 miner ~ >>
@raamana I will definitely not make that the default, as it takes a while to execute and those queries may incur unexpectedly large costs (as your example shows 🙂). You'll have to wait until you get more free quota (a few days, perhaps) or enable billing.
I did enable billing. The web interface says "You have $384.24 in credit and 362 days left of your free trial.", so I'm not sure why I get errors.
Does that mean the option `-d 999` will cost me more than $380?
Yeah, this is a common issue, AFAIK. Having credits still means you're on the free tier, so you inherit its quotas. Unfortunately, you'll need to contact support.
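As a back-of-envelope answer to the cost question, here is a minimal sketch. It assumes on-demand BigQuery pricing of $5 per TiB billed and a free allowance of 1 TiB of query data per month; both numbers are assumptions (check Google's current pricing), although the $5/TiB rate agrees with the later output in this thread, where 142.76 GiB billed was estimated at $0.70. `estimated_cost_usd` and `queries_within_free_tier` are hypothetical helpers, not part of pypinfo.

```python
GIB = 2 ** 30
TIB = 2 ** 40

# Assumptions, not values read from pypinfo or BigQuery:
PRICE_PER_TIB_USD = 5.00        # assumed on-demand rate per TiB billed
FREE_BYTES_PER_MONTH = 1 * TIB  # assumed monthly free query allowance


def estimated_cost_usd(bytes_billed: float, price_per_tib: float = PRICE_PER_TIB_USD) -> float:
    """Cost of a query billed at a flat per-TiB rate."""
    return bytes_billed / TIB * price_per_tib


def queries_within_free_tier(bytes_per_query: float, free_bytes: float = FREE_BYTES_PER_MONTH) -> int:
    """How many queries of this size fit into the free monthly allowance."""
    return int(free_bytes // bytes_per_query)


if __name__ == "__main__":
    billed = 142.76 * GIB  # "Data billed: 142.76 GiB" from the neuropredict query in this thread
    print(f"${estimated_cost_usd(billed):.2f}")  # prints $0.70
    print(queries_within_free_tier(billed))      # prints 7
```

By the same arithmetic, $380 of credit corresponds to about 76 TiB billed (380 / 5), far more than the 142.76 GiB that the 400-day query scanned later in this thread.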
> I'd really like to have `--pip` be the default behavior in the next release with a new `--all` to match the current default behavior.
Yes, I'm fine with that; I always use it with `--pip` anyway.
Thanks @ofek. The queries started working again (it seems my fiddling with the billing account worked).
Can you point me to more details on what the different installers are? I think `None` refers to older versions of pip (as noted in #4; those are legitimate installs, right?), and I'm curious about the `Browser` and `requests` installers, which have high counts for my packages. Thanks.
$ 11:21:38 miner ~ >> pypinfo -sd -400 neuropredict installer
Served from cache: False
Data processed: 142.76 GiB
Data billed: 142.76 GiB
Estimated cost: $0.70
| installer_name | download_count |
| -------------- | -------------- |
| bandersnatch | 13,971 |
| requests | 294 |
| Browser | 287 |
| pip | 104 |
| None | 69 |
| z3c.pypimirror | 43 |
| pep381client | 28 |
| Artifactory | 7 |
- `bandersnatch`: mirroring tool
- `Browser`: downloads by clicking on files here https://pypi.org/project/pypinfo/#files
- `requests`: downloading the aforementioned files using https://github.com/requests/requests

Thanks Ofek, that's what I thought; I just wanted to double-check.
I simply can't imagine why my target users (who are not professional software developers), especially over 200 of them, would use `requests` to fetch the package and install it manually when `pip install` is so much easier. Assuming the clones performed by CI servers are counted under `pip`, the counts for `Browser` and `requests` seem very high to me. Barring excessive repeated or duplicate counts, could this be a bug somewhere?
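To put "very high" in proportion, here is a minimal sketch that splits the neuropredict table above into mirror and non-mirror installers and reports each non-mirror installer's share. Treating `bandersnatch`, `z3c.pypimirror`, `pep381client` and `Artifactory` as mirrors or repository proxies is an assumption (only `bandersnatch` is described as a mirroring tool in this thread), and `MIRROR_INSTALLERS` and `installer_shares` are hypothetical names.

```python
from typing import Dict, Tuple

# Counts copied from the `pypinfo ... neuropredict installer` output in this thread.
# "None" is kept as the literal label pypinfo printed.
NEUROPREDICT = {
    "bandersnatch": 13971,
    "requests": 294,
    "Browser": 287,
    "pip": 104,
    "None": 69,
    "z3c.pypimirror": 43,
    "pep381client": 28,
    "Artifactory": 7,
}

# Assumption: these installer names are mirroring tools or repository proxies.
MIRROR_INSTALLERS = {"bandersnatch", "z3c.pypimirror", "pep381client", "Artifactory"}


def installer_shares(counts: Dict[str, int]) -> Tuple[float, Dict[str, float]]:
    """Return (mirror share of all downloads, share of each non-mirror installer)."""
    total = sum(counts.values())
    mirror = sum(n for name, n in counts.items() if name in MIRROR_INSTALLERS)
    direct = {name: n for name, n in counts.items() if name not in MIRROR_INSTALLERS}
    direct_total = sum(direct.values())
    shares = {name: (n / direct_total if direct_total else 0.0) for name, n in direct.items()}
    return (mirror / total if total else 0.0), shares


if __name__ == "__main__":
    mirror_share, shares = installer_shares(NEUROPREDICT)
    print(f"mirrors: {mirror_share:.1%} of all downloads")
    # Largest non-mirror installers first.
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:>10}: {share:.1%} of non-mirror downloads")
```

For this table, mirrors account for about 94.9% of the 14,803 downloads, and `requests` plus `Browser` together make up roughly 77% of the remaining 754.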
@ofek and @hugovk, any comments on my question above? I want to be sure I estimate the install/download counts with less than 5-10% error.
I've no idea, but it's not necessarily only your target users downloading stuff.
The requests library is used a lot for all sorts of things; it may be used by mirror services or anything, really.
You can add `--test` to the pypinfo call to see the query it uses, and then run that query against BigQuery another way to check whether the numbers differ.
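When cross-checking, a small helper like the one below can flag installers whose counts differ by more than a chosen tolerance between pypinfo's output and a separately run query. The 5% default tolerance comes from the error target mentioned above; `relative_difference` and `mismatched_installers` are hypothetical names, and the two dictionaries are whatever per-installer counts you collect from each query.

```python
from typing import Dict, List


def relative_difference(a: float, b: float) -> float:
    """|a - b| relative to the larger of the two values (0.0 when both are 0)."""
    largest = max(abs(a), abs(b))
    return abs(a - b) / largest if largest else 0.0


def mismatched_installers(
    pypinfo_counts: Dict[str, int],
    other_counts: Dict[str, int],
    tolerance: float = 0.05,
) -> List[str]:
    """Installer names whose counts differ by more than `tolerance` between two queries.

    An installer missing from one side is treated as a count of 0 there.
    """
    names = set(pypinfo_counts) | set(other_counts)
    return sorted(
        name
        for name in names
        if relative_difference(pypinfo_counts.get(name, 0), other_counts.get(name, 0)) > tolerance
    )
```

Using the larger count as the denominator keeps the measure symmetric, so it does not matter which query is treated as the reference.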
Please see PR #51.
Hi there,
Thanks for this package, it's very helpful. It's unclear to me exactly what this tool outputs. Is it a sum of download counts from various sources? How does that differ from the "pip only" option `-p`? Is there a way to get a unique download count, etc.? Some more details would be helpful.

The default output for my packages is roughly 6-10 times higher than if I use the `--pip` option. Any idea why that is? I have only ever recommended that people install my package via `pip install`. Although some developers clone and install locally outside pip, that number is likely very small. So the `--pip` count should closely match the default count (unless there is a lot more going on that I don't understand).

Also, if I would like to estimate "usage" (a higher bar than downloads) from this, would that make sense?

Thanks for your help.
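Assuming the default output counts downloads from every installer name while `--pip` keeps only rows whose installer is `pip` (as the option's name and the earlier discussion suggest), the gap described above is simply total downloads divided by pip downloads. A minimal sketch using the pyradigm table from earlier in this thread; `pip_ratio` is a hypothetical helper name.

```python
from typing import Dict

# Counts copied from the `pypinfo pyradigm installer` output in this thread.
PYRADIGM = {"bandersnatch": 371, "Browser": 14, "requests": 4, "pip": 3}


def pip_ratio(counts: Dict[str, int]) -> float:
    """Total downloads across all installers divided by downloads via pip alone."""
    pip = counts.get("pip", 0)
    if pip == 0:
        raise ValueError("no pip downloads in these counts")
    return sum(counts.values()) / pip


if __name__ == "__main__":
    # prints: all installers / pip only = 130.7x
    print(f"all installers / pip only = {pip_ratio(PYRADIGM):.1f}x")
```

For pyradigm the ratio is far larger than 6-10x because `bandersnatch` alone accounts for 371 of the 392 downloads; a package with less mirror traffic relative to pip installs would show a smaller ratio.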