pypinfo now uses an updated BigQuery table to get download numbers, which is more accurate and uses less quota for most queries, but quota has gone up for some.
For example:
```diff
$ pypinfo --days 365 "" project
Served from cache: False
- Data processed: 87.84 GiB
+ Data processed: 1.69 TiB
- Data billed: 87.84 GiB
+ Data billed: 1.69 TiB
- Estimated cost: $0.43
+ Estimated cost: $8.45
```
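As a sanity check, the "Estimated cost" lines above can be reproduced from the billed size. A minimal sketch, assuming a flat on-demand rate of $5 per TiB billed (that rate is inferred from the figures in the diff, not taken from pypinfo's source):

```python
# Reproduce the "Estimated cost" lines above.
# Assumption: cost is a flat $5 per TiB billed (inferred from the diff's numbers).
def estimated_cost(gib_billed: float) -> float:
    """Estimated query cost in USD for a given number of GiB billed."""
    return round(gib_billed / 1024 * 5, 2)

print(estimated_cost(87.84))        # old 365-day query -> 0.43
print(estimated_cost(1.69 * 1024))  # new 365-day query -> 8.45
```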
This is up from `"bytes_billed": 50120884224` (~50 GB) on 1st April (x4.5 bigger).

But the 365-day query failed:
```
...
  File "/usr/local/lib/python3.6/dist-packages/google/cloud/_http.py", line 293, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://www.googleapis.com/bigquery/v2/projects/top-pypi-packages/queries/...?maxResults=0&timeoutMs=10000: Quota exceeded: Your project exceeded quota for free query bytes scanned. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
```
On 1st April the 365-day query was `"bytes_billed": 951669751808` (~951 GB), so x4.5 = ~4.28 TB!

1st April was ~50 GB + ~951 GB, so it must have come in just under the 1 TB limit.
1st May was ~225 GB + an estimated 4.28 TB...
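The 4.28 TB estimate is just the 30-day growth factor applied to April's 365-day scan. A back-of-the-envelope sketch (it assumes the 365-day query grew by the same factor as the 30-day one):

```python
# Apply the 30-day growth factor to the 365-day query.
# bytes_billed values are the ones quoted above; GB/TB are decimal.
april_30_day = 50_120_884_224    # ~50 GB on 1st April
may_30_day_gb = 225              # ~225 GB on 1st May
april_365_day = 951_669_751_808  # ~951 GB on 1st April

factor = round(may_30_day_gb * 1e9 / april_30_day, 1)     # growth on the 30-day query
est_365_day_tb = round(april_365_day * factor / 1e12, 2)  # projected 365-day scan
print(factor, est_365_day_tb)  # -> 4.5 4.28
```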
Option 1: Rough calculation: there's enough quota left to get 365-day data for 724 packages. So rounding down, perhaps it would work for, say, 500 or 100 packages? Would that still be useful?
Option 2: Alternatively, we could ditch the 365-day data altogether, and perhaps bump the 30-day data from 4,000 packages back up to, say, 5,000.
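One way to reconstruct Option 1's 724 figure from the numbers above (a sketch only; it assumes the scan cost is linear in the number of packages):

```python
# How many packages' 365-day data fit in the free quota left
# after the 30-day query? Assumes per-package scan cost is linear.
free_quota_gb = 1000   # BigQuery free tier: 1 TB/month (decimal GB)
thirty_day_gb = 225    # 1st May 30-day query
est_365_tb = 4.28      # estimated 365-day scan for all 4,000 packages
packages = 4000

per_package_gb = est_365_tb * 1000 / packages  # ~1.07 GB each
remaining_gb = free_quota_gb - thirty_day_gb   # 775 GB left
print(int(remaining_gb / per_package_gb))      # -> 724
```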
(The 365-day example above is from the pypinfo PR: https://github.com/ofek/pypinfo/pull/112/files#diff-7b3ed02bc73dc06b7db906cf97aa91dec2b2eb21f2d92bc5caa761df5bbc168fR233)

The 1st May cron successfully fetched the 30-day data: that's ~225 GB.
For reference, the free monthly quota is 1 TB.
Feedback welcome!
In the meantime, I've pushed the 30-day data.