wklumpen closed this 3 years ago
The issue appears to be the size of the query for "all". I've tried breaking the query into three subgroups, but NYC is so large that each query is still too big for the web server, which times out.
One possible solution is to handle individual block-group score downloads through the map: a user selects the view they want and then downloads that score from there. This would make getting specific scores much quicker, but it would remove the ability to download one huge bulk dataset of all the data (which may be a good thing).
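The per-score approach can be sketched as a query that filters on one destination and one measure, so each download reads only a slice of the table instead of everything. This is a minimal sketch, assuming a hypothetical long-format SQLite table `scores(block_group, destination, measure, score)`; the thread does not describe the dashboard's actual database, schema, or column names, and the sample values below are made up.

```python
import csv
import io
import sqlite3


def single_score_csv(conn: sqlite3.Connection, destination: str, measure: str) -> str:
    """Return block-group scores for one destination/measure as CSV text.

    Assumes a hypothetical table
    scores(block_group TEXT, destination TEXT, measure TEXT, score REAL).
    """
    cursor = conn.execute(
        "SELECT block_group, score FROM scores "
        "WHERE destination = ? AND measure = ? "
        "ORDER BY block_group",
        (destination, measure),
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["block_group", "score"])
    # Only the rows matching the requested score are read from the table.
    writer.writerows(cursor)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scores (block_group TEXT, destination TEXT, measure TEXT, score REAL)"
    )
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?, ?, ?)",
        [
            ("360610001001", "jobs", "auto_30", 1200.0),
            ("360610001001", "parks", "auto_30", 4.0),
            ("360610001002", "jobs", "auto_30", 950.0),
        ],
    )
    print(single_score_csv(conn, "jobs", "auto_30"))
```

With an index on `(destination, measure)`, a query like this stays small regardless of how many other scores the table holds, which is the trade-off described above: fast targeted downloads, no single bulk file.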
We could always say that bulk data can be provided on request (I doubt there'll be many such requests, if any).
Tagging @aakarner and @dana-rg for discussion also.
A relevant piece of information: the cumulative scores alone (Jobs, Low-Income Jobs, and Parks, each with its auto and fare variations) for NYC on a single date come to 80+ MB. Producing them requires a huge query plus a pivot to create a wide dataframe. That also puts a massive load on the server for a brief time, since pivoting a table that large takes a lot of RAM.
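One way to keep the pivot's memory use down is to have the query return rows already sorted by block group, then build each wide row as its group of long rows goes by, so only one block group is held in memory at a time. This is a minimal stdlib sketch of that streaming pivot, not the dashboard's actual code; the `(block_group, measure, score)` row layout and the function name are hypothetical. If the input is not sorted by block group, the same block group will appear on more than one output line.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Sequence, Tuple

Row = Tuple[str, str, float]  # (block_group, measure, score): hypothetical layout


def stream_wide_csv(rows: Iterable[Row], measures: Sequence[str]) -> Iterator[str]:
    """Pivot long rows into wide CSV lines, one block group at a time.

    Assumes `rows` is already sorted by block_group (e.g. via ORDER BY),
    so only the current block group's values are held in memory.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")

    def flush() -> str:
        # Return the text written so far and reset the buffer.
        text = buf.getvalue()
        buf.seek(0)
        buf.truncate(0)
        return text

    writer.writerow(["block_group", *measures])
    yield flush()
    for block_group, group in groupby(rows, key=itemgetter(0)):
        values = {measure: score for _, measure, score in group}
        # A measure missing for this block group becomes an empty cell.
        writer.writerow([block_group, *(values.get(m, "") for m in measures)])
        yield flush()
```

For example, `"".join(stream_wide_csv([("A", "jobs", 10.0), ("A", "parks", 2.0), ("B", "jobs", 7.0)], ["jobs", "parks"]))` produces a header line followed by one line per block group, with an empty `parks` cell for `B`. The database still does the sort, but the application no longer needs the whole table in RAM to pivot it.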
Would the data file be a more reasonable size if it covered only the urban core? I think the people most likely to use the download will be focused on the City of New York and possibly the inner-ring suburbs.
Another question: if we put the NYC data back in the long data frame, as you had originally, would that make the query easier to run?
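For comparison, a long-format export needs no pivot at all: rows can be written out as they come off the database cursor, in fixed-size batches. This is a minimal sketch, assuming the same kind of hypothetical SQLite table `scores(block_group, destination, measure, score)`; the table, columns, and `batch_size` default are illustrative, not taken from the dashboard.

```python
import csv
import io
import sqlite3
from typing import Iterator

HEADER = ["block_group", "destination", "measure", "score"]  # hypothetical columns


def stream_long_csv(conn: sqlite3.Connection, batch_size: int = 10_000) -> Iterator[str]:
    """Yield the long-format table as CSV text, one batch of rows at a time.

    Assumes a hypothetical table scores(block_group, destination, measure, score).
    No pivot is performed, so memory use is bounded by batch_size rather than
    by the size of the whole table.
    """
    cursor = conn.execute(
        "SELECT block_group, destination, measure, score FROM scores ORDER BY block_group"
    )
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")

    def drain() -> str:
        # Return the text written so far and reset the buffer.
        text = buf.getvalue()
        buf.seek(0)
        buf.truncate(0)
        return text

    writer.writerow(HEADER)
    yield drain()
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        writer.writerows(batch)
        yield drain()
```

A web framework that accepts a generator as a streamed response body could send these chunks as they are produced. Whether that avoids the timeout here depends on the server setup, which the thread does not describe, and a long file repeats the block-group ID on every row, so it will be larger on disk than the wide version.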
Regardless, I like the idea of sending the full dataset upon request, since it sounds like it will be too large to download through the site.
Can we add a note to that part of the NYC download center? Something like: "The full NYC data set is too large to download directly from the Dashboard. Email dashboard@transitcenter.org to request the full NYC data set for a specific date."
We could add the same bullet point to the "How it works" page under the "Where can I find the data and source code" section.
This may end up being an issue for more cities, so I have updated the download page with an expanded note (Chicago example):
Due to the size of the data, bulk block-group-level scores across all destinations and measures are not available for direct download. You can access and download the block-group-specific scores for a single destination and measure via the following steps:
If you would like to request a larger portion of the dataset for research or analysis, please email dashboard@transitcenter.org.
Not sure why at this time.