pypi / legacy

This PyPI is no more! See https://github.com/pypa/warehouse.

Download Stats Have Stopped Working (again) #396

Closed ewdurbin closed 8 years ago

ewdurbin commented 8 years ago

Originally reported by: Matthias Manhertz (Bitbucket: madmat3001, GitHub: madmat3001)


I noticed at the start of February that the download stats for my package twistml were stagnating. They have not increased since. I use the vanity package to track the downloads, and its developer sent me here, as this has happened before (Issue #330, Vanity issue on GitHub). On 12/02/16 I checked the total downloads for Django to make sure this was not related to my project. According to vanity, Django has not had a single download since then.

Interestingly, the download counters on the packages' PyPI pages (the ones showing last day, week and month) keep updating.


ewdurbin commented 8 years ago

Original comment by zcczihiw (Bitbucket: zcczihiw, GitHub: Unknown):


I am wondering if there is any update on fixing download counts so that statistics are available. As I understand it, the issue is that the backend (redis) servers are not running or not working properly. Is there something we can contribute to fix this issue?

Is warehouse likely to have download statistics available soon? Apparently some information was updated on Google Cloud, according to https://mail.python.org/pipermail/distutils-sig/2016-May/028986.html, but updates seem to have stopped on May 14th. Not that that is a viable option for developers; I am just pointing it out in case it is relevant.

Cheers.

ewdurbin commented 8 years ago

Original comment by Ioannis Filippidis (Bitbucket: johnyf, GitHub: johnyf):


Well, please read above or visit PyPI to see that they were removed hours ago.

ewdurbin commented 8 years ago

Original comment by Eclipse (Bitbucket: libeclipse, GitHub: libeclipse):


At this point, I'd be happy if they just removed the stats altogether if they can't be bothered fixing them.

ewdurbin commented 8 years ago

Original comment by Radim Řehůřek (Bitbucket: piskvorky, GitHub: piskvorky):


I'm also one of the people who didn't realise the situation with PyPI is so bad -- a call for help is definitely in order. To have such a central piece of the Python ecosystem in this state is... scary.

Thanks zcczihiw for pushing this.

ewdurbin commented 8 years ago

Original comment by zcczihiw (Bitbucket: zcczihiw, GitHub: Unknown):


Thank you for the quick fix, and for the bit of background on the project.

I definitely didn't point fingers at any developers; we all appreciate the service (of the project and its developers). As you know, some PyPI users are developers themselves, so if the project needs help, I am sure people will lend a hand. I am wondering if it makes sense to post requests for help somewhere visible (perhaps in the left-side menu of PyPI). Depending on how soon the warehouse project is likely to go live, this effort may not be worth it, but if there are many issues that need addressing, it may be worth considering.

If the new project takes time to go live, perhaps download stats can be fixed. Is there a way for others to figure out why the redis server is not working, and to follow up and discuss? I went through the sources, and my best guess is that the redis server URL is not valid ('queue_redis_url' and/or 'count_redis_url' in database, line 109 in config.py), so in webui, store.Store is called with 'redis=None'. I believe the redis server is up and running, as otherwise querying for download counts would have been handled by the exception catching.
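The reasoning in this comment can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `FakeRedis`, `FakeStore`, `render_counts`, and the zero-returning fallback are stand-ins invented for illustration, not the actual `store.Store` or `webui.py` code, and the built-in `ConnectionError` stands in for `redis.exceptions.ConnectionError`. It only shows why a store built with `redis=None` might never reach the exception handler, whereas an unreachable server would.

```python
class FakeRedis:
    """Hypothetical stand-in for a redis client."""

    def __init__(self, reachable=True, counts=None):
        self.reachable = reachable
        self.counts = counts or {}

    def get(self, key):
        if not self.reachable:
            # Stand-in for redis.exceptions.ConnectionError.
            raise ConnectionError("redis server unreachable")
        return self.counts.get(key, 0)


class FakeStore:
    """Hypothetical store; ASSUMES a missing client silently yields zeros."""

    def __init__(self, redis=None):
        self.redis = redis

    def download_counts(self, name):
        if self.redis is None:
            # Assumed behaviour: no client configured, so report zeros
            # without raising anything.
            return {"last_day": 0}
        return {"last_day": self.redis.get(name)}


def render_counts(store, name):
    """Same try/except shape as the webui.py code quoted in this thread."""
    try:
        return store.download_counts(name)
    except ConnectionError:
        # Only an unreachable server ends up here.
        return False


if __name__ == "__main__":
    print(render_counts(FakeStore(FakeRedis(counts={"dd": 42})), "dd"))
    print(render_counts(FakeStore(FakeRedis(reachable=False)), "dd"))
    print(render_counts(FakeStore(redis=None), "dd"))
```

Under these assumptions, the third case returns zeros without any exception, which would match pages showing 0 downloads rather than no counters at all. Whether the real store behaves this way is exactly the open question in the comment.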

ewdurbin commented 8 years ago

Original comment by Donald Stufft (Bitbucket: dstufft, GitHub: dstufft):


I agree with disabling the statistics, so I've gone ahead and done that (basically using @zcczihiw's patch, which made this easier; thanks).

The issue is not apathy, but one of time. I've recently written about it (https://caremad.io/2016/05/powering-pypi/) but the tl;dr is that PyPI is barely holding on and there is a huge number of fires at any given time across all of the pieces of the packaging ecosystem that I personally work on. That means that some things get tossed in an attempt to shed some load and trying to communicate is one of the "easier" things to stop doing.

ewdurbin commented 8 years ago

Original comment by zcczihiw (Bitbucket: zcczihiw, GitHub: Unknown):


The following patch disables download counts. I am attaching it here instead of sending a pull request, as I haven't tested the patch (I don't have PyPI set up), and it is up to the developers to decide if this approach is acceptable to them. I am guessing the issue is with the redis server that keeps track of download counts.


diff -u a/webui.py b/webui.py
--- a/webui.py  2016-05-13 22:07:25.000000000 -0400
+++ b/webui.py  2016-05-15 13:35:40.991126033 -0400
@@ -1947,11 +1947,13 @@
         docs = self.store.docs_url(name)
         files = self.store.list_files(name, version)

+        # Disabled download counts (redis server not reliable?)
         # Download Counts from redis
-        try:
-            download_counts = self.store.download_counts(name)
-        except redis.exceptions.ConnectionError as conn_fail:
-            download_counts = False
+        # try:
+        #     download_counts = self.store.download_counts(name)
+        # except redis.exceptions.ConnectionError as conn_fail:
+        #     download_counts = False
+        download_counts = False

         self.write_template('display.pt',
                             name=name, version=version, release=release,

ewdurbin commented 8 years ago

Original comment by Ioannis Filippidis (Bitbucket: johnyf, GitHub: johnyf):


I think that disabling the stats is the best suggestion I've read. I agree that the impression given to unfamiliar visitors is distorted. Indeed, downloads on pypi.io are fictitious. Also, I do agree that a slightly more active presence by the PyPI developers would be much appreciated, even if no significant development activity is intended to commence. I would expect that removing the stats, or placing a banner reciprocal to that shown on pypi.io (as in "beware, this is the old PyPI, slowly falling apart, but there is change coming, look at warehouse"), would be more informative. I'd like to help with that, but I'm not a PyPI dev (nor a warehouse one).

ewdurbin commented 8 years ago

Original comment by zcczihiw (Bitbucket: zcczihiw, GitHub: Unknown):


While download counts are irrelevant to the reliability of Python, users infer from them how useful a PyPI module/tool is; if a project shows 0 downloads (or just a tiny fraction, as it has for the last few days), users may assume the project is not worth their time. Not addressing this issue until the new service is up is a great disservice to many developers, especially as there is no word on when the new service will be up (download stats on pypi.io are broken too).

Until this issue is resolved, would it be better to disable download stats, and put a warning that download counts are broken?

It is rather disconcerting that a project of this importance is being neglected with such apathy: issue tickets are neglected, or download stats work for a while after an issue is filed, without any comment on what the fix was.

ewdurbin commented 8 years ago

Original comment by Ioannis Filippidis (Bitbucket: johnyf, GitHub: johnyf):


As already mentioned in similar issues, all the developers are working on warehouse, the successor of PyPI. So, those who would like to help can contribute to warehouse, so that it is completed sooner (it is already deployed).

The behavior of PyPI is irrelevant to the reliability of Python.

ewdurbin commented 8 years ago

Original comment by kmmbvnr (Bitbucket: kmmbvnr, GitHub: kmmbvnr):


Guys, that's really annoying. I'm starting to think that Python is not a reliable language for high load.

ewdurbin commented 8 years ago

Original comment by Eclipse (Bitbucket: libeclipse, GitHub: libeclipse):


Any idea why this is broken? Anything we can do to help speed this up?

ewdurbin commented 8 years ago

Original comment by Anthony Tuininga (Bitbucket: anthony_tuininga, GitHub: Unknown):


I notice that highly popular packages that often have thousands of downloads/day show 0 downloads for the day and small numbers for the week. Any idea on when this is going to be fixed?

ewdurbin commented 8 years ago

Original comment by Eclipse (Bitbucket: libeclipse, GitHub: libeclipse):


Any ideas? It's still broken.

ewdurbin commented 8 years ago

Original comment by Eclipse (Bitbucket: libeclipse, GitHub: libeclipse):


Seems to be bugged again. My packages are showing 0.

ewdurbin commented 8 years ago

Original comment by AJ R (Bitbucket: fomcl, GitHub: fomcl):


Not working here either: https://pypi.python.org/pypi/savReaderWriter/3.4.2. This used to be an externally hosted package (until PEP 470 became effective). In my setup.py, setuptools.setup still has a downloads= parameter in it. Is that a problem?

ewdurbin commented 8 years ago

Original comment by Christian Bertschy (Bitbucket: cbertschy2, GitHub: Unknown):


Seems to be not working again: https://pypi.python.org/pypi/Django/1.9.5

ewdurbin commented 8 years ago

Original comment by thomas_haslwanter (Bitbucket: thomas_haslwanter, GitHub: Unknown):


The download counters for my project "thLib" are no longer working: I know that there have been downloads, but they are no longer shown. Maybe related to Issue #397.

ewdurbin commented 8 years ago

Original comment by Michael Egorov (Bitbucket: michwill, GitHub: michwill):


JSON API appears to be still broken but the web UI seems fixed

ewdurbin commented 8 years ago

Original comment by Kim Thoenen (Bitbucket: Chive, GitHub: Chive):


Not for me: https://pypi.python.org/pypi/Django/json version 1.9.4 shows 0 downloads (and was released two days ago)

ewdurbin commented 8 years ago

Original comment by MacLane Wilkison (Bitbucket: mwilkison, GitHub: Unknown):


Appears to be working now.

ewdurbin commented 8 years ago

Original comment by Michael Egorov (Bitbucket: michwill, GitHub: michwill):


@madmat3001 Since it's now not only the API link but the whole thing (i.e. there is no way to get non-zero download counts at all), can you change the priority of the issue to major?

ewdurbin commented 8 years ago

Original comment by Michael Egorov (Bitbucket: michwill, GitHub: michwill):


A pretty major issue for me // still 0 downloads for all packages

ewdurbin commented 8 years ago

Original comment by Hicham Janati (Bitbucket: JanatiH, GitHub: JanatiH):


Download stats are broken again. I was surprised to see 0 downloads for my package https://pypi.python.org/pypi/pyldpc, but I realized this also affects very popular packages (numpy, scipy, ...), e.g. https://pypi.python.org/pypi/numpy/1.10.4 (0 downloads in the last day).

ewdurbin commented 8 years ago

Original comment by Matthias Manhertz (Bitbucket: madmat3001, GitHub: madmat3001):


@johnyf Fixed the link, thx

ewdurbin commented 8 years ago

Original comment by Ioannis Filippidis (Bitbucket: johnyf, GitHub: johnyf):


Same result using pypi-cli: it reports correctly last day, week, month (as shown when browsing PyPI), but download statistics over time have 0 for the latest package version.

ewdurbin commented 8 years ago

Original comment by Ioannis Filippidis (Bitbucket: johnyf, GitHub: johnyf):


@madmat3001 the link to Issue 330 actually links here, to 396.

ewdurbin commented 8 years ago

Original comment by Ioannis Filippidis (Bitbucket: johnyf, GitHub: johnyf):


Exactly the same observation for the package dd. I manually fetched the information following vanity's code:

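# Python 3 version; the original snippet used urllib2 (Python 2).
from urllib.request import urlopen
import json

url = 'https://pypi.python.org/pypi/dd/json'
response = urlopen(url)
# read() returns bytes, so decode before parsing the JSON text.
data = response.read().decode('utf-8')
d = json.loads(data)
print(d)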

In a browser, PyPI shows the current downloads, but it gives 0 downloads in its JSON data. Note that the above code does not use vanity. If this approach to fetching the data is correct, then this is a PyPI issue. Otherwise, please let me know the correct way to obtain these statistics.
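To make the discrepancy concrete, here is a minimal sketch of how the two kinds of counters could be read from such a JSON document. It assumes the legacy layout in which `info.downloads` holds `last_day`/`last_week`/`last_month` and each file under `releases[version]` carries its own `downloads` field; the function name `summarize_downloads` and the sample payload are made up for illustration, and the numbers are not real PyPI data.

```python
import json


def summarize_downloads(payload):
    """Return the recent counters and the summed per-file total."""
    # Recent counters (assumed to live under info.downloads).
    recent = payload.get("info", {}).get("downloads", {})
    # Per-file counters (assumed to live under releases[version][i]).
    total = 0
    for files in payload.get("releases", {}).values():
        for file_info in files:
            total += file_info.get("downloads", 0)
    return {"recent": recent, "total": total}


# Hand-written sample shaped like the assumed response; not real data.
SAMPLE = json.loads("""
{
  "info": {
    "name": "example",
    "downloads": {"last_day": 3, "last_week": 20, "last_month": 75}
  },
  "releases": {
    "0.1": [{"filename": "example-0.1.tar.gz", "downloads": 120}],
    "0.2": [{"filename": "example-0.2.tar.gz", "downloads": 0},
            {"filename": "example-0.2-py2.py3-none-any.whl", "downloads": 0}]
  }
}
""")

if __name__ == "__main__":
    print(summarize_downloads(SAMPLE))
```

With the live endpoint, `payload` would be the `d` parsed in the snippet above. In the sample, the recent counters are non-zero while the newest release contributes 0 to the total, which is the pattern described in this thread.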

ewdurbin commented 8 years ago

stats have been disabled/removed from the legacy codebase, the successor linehaul will be enabled with the launch of warehouse

jantman commented 8 years ago

I don't like to be a bad person and comment on a closed issue, but... where do we go to find out when Warehouse will be live? I can't seem to find any public information on it.

I was relying on pypi's download counters to gauge usage of my public projects... with them broken, I'm tempted to be a Bad Person and try to put some sort of phone-home logic in my libraries, just to try and get an idea of how many people are using it and what versions.

I also still don't see any information on how to volunteer help. I'd be more than happy to assist with the current infrastructure if there's need for more hands/eyeballs.

ewdurbin commented 8 years ago

@jantman the stats in warehouse are an accurate reflection. you can use them for the same purposes.

awnumar commented 8 years ago

@ewdurbin As far as I can tell, there are no stats at warehouse anymore.

ewdurbin commented 8 years ago

well shit.

ewdurbin commented 8 years ago

@jantman @libeclipse

for the time being, the details on BigQuery (publicly accessible, just log in with google) are available here: https://mail.python.org/pipermail/distutils-sig/2016-May/028986.html that will actually answer questions like "how many people are using it and what versions", whereas the previous download stats were highly volatile and provided no insight into versions. they were arguably just vanity numbers.

if someone is interested in proposing a cron job or something similar to restore the stats into pypi-legacy i'll happily review.

ewdurbin commented 8 years ago

@jantman as a point of order, i understand your concerns and what you are trying to accomplish but wanted to state that vague threats of doing "bad things" like "phoning home" on install and such are really not productive in encouraging maintainers or contributors, it really just makes us feel bad that we do not have more time to contribute to specific needs.

ewdurbin commented 8 years ago

for instance here is a useful download query representing download counts by version by day since July 1st for one of my projects.

SELECT
  DATE(timestamp) as day,
  file.project,
  file.version,
  COUNT(*) as total_downloads,
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    TIMESTAMP("20160701"),
    CURRENT_TIMESTAMP()
  )
WHERE
  file.project = 'clandestined'
GROUP BY
  day, file.project, file.version
ORDER BY
  day asc
LIMIT 100
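For anyone who wants the same query for a different project or start date, a small helper can generate the query text. This is only a sketch: `build_downloads_query` is a hypothetical name, it reproduces the query above verbatim apart from the substituted values, and it only builds the string. Running it still requires BigQuery access as described in the linked mailing-list post.

```python
import re
from datetime import date


def build_downloads_query(project, start, limit=100):
    """Build the legacy-SQL text of the query above for a given project."""
    # The project name is interpolated into the query text, so reject
    # anything that does not look like a plain package name.
    if not re.fullmatch(r"[A-Za-z0-9._-]+", project):
        raise ValueError("unexpected characters in project name: %r" % project)
    return (
        "SELECT\n"
        "  DATE(timestamp) as day,\n"
        "  file.project,\n"
        "  file.version,\n"
        "  COUNT(*) as total_downloads,\n"
        "FROM\n"
        "  TABLE_DATE_RANGE(\n"
        "    [the-psf:pypi.downloads],\n"
        '    TIMESTAMP("{start}"),\n'
        "    CURRENT_TIMESTAMP()\n"
        "  )\n"
        "WHERE\n"
        "  file.project = '{project}'\n"
        "GROUP BY\n"
        "  day, file.project, file.version\n"
        "ORDER BY\n"
        "  day asc\n"
        "LIMIT {limit}"
    ).format(start=start.strftime("%Y%m%d"), project=project, limit=int(limit))


if __name__ == "__main__":
    print(build_downloads_query("clandestined", date(2016, 7, 1)))
```

The name check is deliberately strict because the value is pasted straight into the query text; a cron job like the one suggested above would probably want the same guard.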
ewdurbin commented 8 years ago

[Image: screen shot 2016-08-04 at 2:27:35 pm]

ewdurbin commented 8 years ago

@dstufft @berkerpeksag i'm going to lock this conversation for now, feel free to unlock if you believe it warrants further discussion.