pittcsc / PantherView

Spring semester Open Data club project: PantherView
http://pittcsc.org/PantherView/
MIT License
24 stars 24 forks source link

WPRDC SQL: Date range #95

Closed vonbearshark closed 7 years ago

vonbearshark commented 7 years ago

Right now we have a number of datasets (at least the 311 data) that sorts the WPRDC dataset by date and gets the most recent N-number of results. This is not the same as getting all the results from this month--N could be more than this month, or it could be less. Right now we filter out the dates on the client side, but that means that we're either getting too much data from the server (which is a waste), or we're not getting enough data for the month. We need to add to the SQL query that we send some kind of check to make sure we only get the appropriate dates. This will likely go within the dataset object's primaryFiltering property.

We should take also out any limits we have placed, and we should remove the client-side code that check to make sure the data is in the proper range (although we can use that check now to log if we have any extra records while we develop this solution).

This can help you get started: http://stackoverflow.com/questions/14208958/select-data-from-date-range-between-two-dates

vonbearshark commented 7 years ago

@cey10 I noticed you're interested in being a collaborator, but you haven't picked up an Issue yet--would you be interested in taking this one? If you need any help, we can work on it together.

cey10 commented 7 years ago

@vonbearshark I would. I'll look at it and let you know if I have any questions, thanks!

cey10 commented 7 years ago

@vonbearshark hey JJ. I think I understand what we're trying to do and how that's accomplished, but I'm not really sure how to implement that into the code that is already here. Could you explain how data is fetched from the server? Is the date limit related to options in the fetchWPRDCData function?

vonbearshark commented 7 years ago

We have this collection of datasets here: https://github.com/Pitt-CSC/PantherView/blob/master/main.js#L224. Each one has a .primaryFiltering field, for example, the city police data: https://github.com/Pitt-CSC/PantherView/blob/master/main.js#L227. This will eventually be concatenated into the SQL part of the URL we send to the WPRDC server, here: https://github.com/Pitt-CSC/PantherView/blob/master/main.js#L349. So you could add the SQL to .primaryFiltering in each dataset that needs it (all but the Library dataset). It means we'll have to duplicate the query for each dataset that needs a date, but that's not too bad, and it helps us prevent asking for a date from datasets that don't need one, because we manually define it.

Once the URL is constructed with the "sql" parameter in it, we send a request to the WPRDC server. They have configured their server to parse and execute the commands we provide via the "sql" parameter. Then the server returns the results it receives back to us.

Does that clarify things?

BenjaminMuscato commented 7 years ago

Hey @vonbearshark, check out PR #123 for my solution to this issue. The arbitrary limitation has been removed, and the SQL is now properly retrieving only the records that were created within the last 30 days. The client-side pruning of the data (i.e. records from the last day, records within the last week) is still in-place.

vonbearshark commented 7 years ago

Commented there, too, but can we remove the limitation on the other sets, too?

vonbearshark commented 7 years ago

Closed by #123