wklumpen / equity-pulse-web

Equity Pulse is a web application and visualization platform using Flask+D3 to support equity and access calculations for TransitCenter/SSR/SF2 Work

NYC data download crashes site #77

Closed wklumpen closed 3 years ago

wklumpen commented 3 years ago

Not sure why at this time.

wklumpen commented 3 years ago

The issue appears to be the size of the query for "all". I've tried breaking the query into three different subgroups, but NYC is large enough that the query is still too big for the webserver and causes it to time out.

One possible solution here is to simply have the individual downloads for all scores by block group be done through the map. A user selects the view they want, and then chooses to download that score from there. This would make getting specific scores much quicker, but would remove the ability to download a huge bulk dataset of all the data (which may be good).
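A minimal sketch of the per-score download described above, assuming the scores are stored in a long layout of `(block_group, score_key, value)` rows. The row layout, the sample values, the score key names, and the function name `single_score_csv` are all hypothetical and not taken from the actual codebase:

```python
import csv
import io

# Hypothetical long-format rows: (block_group_id, score_key, value).
ROWS = [
    ("360610001001", "jobs_transit_c45", 120345.0),
    ("360610001001", "parks_transit_c30", 12.0),
    ("360610001002", "jobs_transit_c45", 98211.0),
    ("360610001002", "parks_transit_c30", 9.0),
]


def single_score_csv(rows, score_key):
    """Return CSV text holding only the rows for one score."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["block_group", score_key])
    for block_group, key, value in rows:
        # Keep only the score the user selected on the map.
        if key == score_key:
            writer.writerow([block_group, value])
    return buffer.getvalue()
```

Because only one score is selected, the output has one row per block group and no pivot is needed. In a real deployment the filter would presumably be pushed into the database query rather than applied in Python.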

We could always suggest that bulk data requests can be filled on request (I doubt there'll be any/many).

wklumpen commented 3 years ago

Tagging @aakarner and @dana-rg for discussion also.

wklumpen commented 3 years ago

A relevant piece of information: Just the cumulative scores alone (Jobs, Low-Income Jobs, and Parks times variations for auto and fares) for a single date for NYC are 80+MB. This requires a huge query plus a pivot to create a wide dataframe. That also puts a massive load on the server for a brief amount of time as pivoting a table that big takes a lot of RAM.
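To illustrate why the pivot is memory-hungry, here is a rough pure-Python sketch of a long-to-wide pivot. It is not the site's actual code (which, per the comment above, builds a wide dataframe); the `(block_group, score_key, value)` row layout and the function name are assumptions made for illustration:

```python
from collections import defaultdict


def pivot_long_to_wide(rows):
    """Pivot (block_group, score_key, value) rows into one wide row per block group."""
    wide = defaultdict(dict)
    columns = []
    for block_group, key, value in rows:
        wide[block_group][key] = value
        # Any input row may introduce a new column, so the column list
        # is not final until every row has been read.
        if key not in columns:
            columns.append(key)
    table = [
        [block_group] + [scores.get(col) for col in columns]
        for block_group, scores in wide.items()
    ]
    return columns, table
```

The point of the sketch is that no output row can be emitted until the whole input has been consumed, so the full result has to sit in memory at once. For a dataset in the 80+MB range, that is consistent with the short, heavy load on the server described above.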

mlbtc commented 3 years ago

Is the size of the datafile, but for the urban core only, more reasonable? I think the likeliest people to use the download will be focused on the City of New York and potentially inner ring suburbs.

mlbtc commented 3 years ago

Another question: if we put the NYC data back in the long data frame, as you had originally, would that make the query easier to run?
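One way to picture the long-format option: rows can be written out as they arrive, with no pivot step. The sketch below uses only the standard library; the column names, `chunk_size` default, and function name are hypothetical. A generator like this could in principle be passed to a streaming response in Flask, but whether that would avoid the timeout is an untested assumption, since the underlying query may still be the bottleneck.

```python
import csv
import io


def stream_long_csv(rows, chunk_size=1000):
    """Yield CSV text in chunks of at most chunk_size data rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["block_group", "score_key", "value"])
    pending = 0
    for row in rows:
        writer.writerow(row)
        pending += 1
        if pending >= chunk_size:
            # Hand the chunk to the caller, then reuse the buffer.
            yield buffer.getvalue()
            buffer.seek(0)
            buffer.truncate(0)
            pending = 0
    # Flush the header (if not yet sent) and any remaining rows.
    if buffer.tell():
        yield buffer.getvalue()
```

Only one chunk is held in memory at a time, at the cost of a taller file that users would need to reshape themselves.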

mlbtc commented 3 years ago

Regardless, I like the idea of being able to send the full dataset upon request, since it sounds like the full dataset will be too large to download through the site.

Can we add a note to that part of the NYC download center that says the following? "The full NYC data set is too large to download directly from the Dashboard. Email dashboard@transitcenter.org to request the full NYC data set for a specific date."

We could add the same bullet point to the "How it works" page under the "Where can I find the data and source code" section.

wklumpen commented 3 years ago

This may end up being an issue for more cities, so I have updated the download page with an expanded note (Chicago example):


Due to the size of the data, bulk block-group level scores across all destinations and measures are not available for direct download. You can access and download the block-group specific scores for a single destination and measure via the following steps:

  1. Visit the map page for Chicago
  2. Use the map configuration panel to choose the dataset you would like to download, and update the map.
  3. Click the download tab on the side panel and choose whether to download as CSV or as a spatial dataset.

If you would like to request a larger portion of the dataset for research or analysis, please email dashboard@transitcenter.org