stat157 / recent-quakes

Stat 157 Homework 2 due on Monday 2013-10-21 at 11:59pm

Group 2 issue with generalization #24

Open · teresita opened this issue 10 years ago

teresita commented 10 years ago

The hard-coded values were removed from the plotting function, but they're stored in a dictionary, so you can't generalize to any regions other than California and Alaska.
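(Roughly, the pattern in question looks like this; the names and values here are illustrative, not the group's actual code:)

```python
# Illustrative only: display parameters hard-coded per region. Any region
# that isn't a key in this dictionary simply can't be plotted.
REGION_PARAMS = {
    "california": {"lat_min": 32.0, "lat_max": 42.5, "lon_min": -125.0, "lon_max": -114.0},
    "alaska":     {"lat_min": 52.0, "lat_max": 72.0, "lon_min": -170.0, "lon_max": -130.0},
}
```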
@lauraccunningham

There's a spreadsheet cache of the data, but no way to cache live data and no mechanism for extracting cached data from the spreadsheet. There is also no indication of the magnitude or depth of the quakes in the plotting methods, although depth and magnitude are both in the group's data frame.

Otherwise, the group's exact code is reproducible; it's just not generalizable.

davidopluslau commented 10 years ago

To preface: there is more functionality we would have written in, had we had more time. We had to forgo our display-parameter caching approach due to time constraints; our partial attempt is documented in the comments near the beginning of plot_data.

That said, we do have a way to cache live data, to use previously cached data, and to specify the display parameters outside the body of the code.

Live data caching is tested and works. For the getEarthquakeDataframe function, setting the isURL and save flags to True will cache the data locally. While I agree that the other issues you bring up are some mix of unfinished, untested, and/or poorly documented, I do feel that this one was readily apparent.
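Schematically, it works along these lines (a simplified sketch, not our exact code; the function and flag names are ours, but the body here is only illustrative):

```python
import json
import urllib2  # Python 2, which was current at the time
import pandas as pd

FEED_URL = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/1.0_week.geojson"
CACHE_PATH = "earthquakes_cache.json"  # illustrative filename

def getEarthquakeDataframe(isURL=True, save=False):
    """Sketch: load the USGS feed live or from a local cache, then flatten it."""
    if isURL:
        raw = urllib2.urlopen(FEED_URL).read()
        if save:
            with open(CACHE_PATH, "w") as f:
                f.write(raw)              # cache the raw GeoJSON locally
        data = json.loads(raw)
    else:
        with open(CACHE_PATH) as f:       # reuse the previously cached copy
            data = json.load(f)
    # Flatten the features into a dataframe (only a few fields shown here).
    rows = [{"Magnitude": ft["properties"]["mag"],
             "Place":     ft["properties"]["place"],
             "Longitude": ft["geometry"]["coordinates"][0],
             "Latitude":  ft["geometry"]["coordinates"][1]}
            for ft in data["features"]]
    return pd.DataFrame(rows)
```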

While untested, we also have a mechanism for extracting cached data. By setting isURL to False, getEarthquakeDataframe will attempt to import a pre-cached JSON file and use it for the rest of the dataframe processing in a way that is functionally identical to using live data. There could be file-path issues with this, but we feel it was also relatively clear in the body of the code. An issue with file saving wiped out the comments I had previously written for it; it was my mistake not to re-check the commenting, and I understand that my responsibility to document the code comes before yours to read it.
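So the two call patterns are, roughly (again just a sketch):

```python
# Live pull, saving a local copy for later runs:
quakes = getEarthquakeDataframe(isURL=True, save=True)

# Later, or offline: read the previously cached JSON instead of hitting the feed.
quakes = getEarthquakeDataframe(isURL=False)
```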

For plot_data, it is true that we didn't interact with quake depth. This is because we could not find it. A typical earthquake record from the USGS database (in this case, pulled from http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/1.0_week.geojson) looked like this:

{"type":"FeatureCollection","metadata":{"generated":1382477297000,"url":"http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/1.0_week.geojson","title":"USGS Magnitude 1.0+ Earthquakes, Past Week","status":200,"api":"1.0.11","count":1022},"features":[{"type":"Feature","properties":{"mag":2.2,"place":"64km NW of Nikiski, Alaska","time":1382476834000,"updated":1382477044265,"tz":-480,"url":"http://earthquake.usgs.gov/earthquakes/eventpage/ak10828448","detail":"http://earthquake.usgs.gov/earthquakes/feed/v1.0/detail/ak10828448.geojson","felt":null,"cdi":null,"mmi":null,"alert":null,"status":"AUTOMATIC","tsunami":null,"sig":74,"net":"ak","code":"10828448","ids":",ak10828448,","sources":",ak,","types":",general-link,geoserve,nearby-cities,origin,tectonic-summary,","nst":null,"dmin":null,"rms":0.63,"gap":null,"magType":"Ml","type":"earthquake","title":"M 2.2 - 64km NW of Nikiski, Alaska"},"geometry":{"type":"Point","coordinates":[-152.2632,61.0154,100]},"id":"ak10828448"}

Nowhere in here is depth clearly marked. It was an oversight on our part not to ask the rest of the class about this, especially since your team evidently found it, but we assumed at the time that it was not in the data.
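Presumably it is tucked into the geometry: in GeoJSON the position array is [longitude, latitude, depth], so for the feature above the depth would be the third coordinate. Something like:

```python
feature = {  # abridged from the feed excerpt above
    "properties": {"mag": 2.2, "place": "64km NW of Nikiski, Alaska"},
    "geometry": {"type": "Point", "coordinates": [-152.2632, 61.0154, 100]},
}
# GeoJSON positions are [longitude, latitude, depth] (depth in km in the USGS feeds)
lon, lat, depth_km = feature["geometry"]["coordinates"]
magnitude = feature["properties"]["mag"]
```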

We used magnitude to vary the size of the plotted quakes; however, after seeing the range of marker sizes in your group's presentation today, it is very likely that our specific implementation is broken. Our attempt can be seen by searching the file for the line that includes "quakes.Magnitude".
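What we were going for was something along these lines, using the quakes dataframe from the sketch above (this is illustrative, not our actual line, and the scaling constants are arbitrary):

```python
import matplotlib.pyplot as plt

# Sketch: scale marker area with magnitude so larger quakes draw as larger dots.
# Magnitude is roughly logarithmic, so exponentiating exaggerates the visual
# contrast between small and large events.
sizes = [2 ** m * 4 for m in quakes.Magnitude]
plt.scatter(quakes.Longitude, quakes.Latitude, s=sizes, alpha=0.5)
plt.show()
```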

ashleysiaailes commented 10 years ago

Just to reiterate what David has said, I wanted to further explain why we chose to cache the display parameters rather than derive them from the data.

We explored the idea of using the latitudes and longitudes in the database to build our maps: for example, taking the average latitude and longitude of the data points and centering the map there, or taking the maximum and minimum latitudes and longitudes to set the map's four corners. After a lot of debate, we ultimately decided not to do this because, while it is the easiest way to generalize to new regions, it comes with consequences. For example, if we pulled data for California and it so happened that the only earthquakes that week were in San Diego, then the average, maximum, or minimum latitudes and longitudes would yield a map of San Diego, not of California as we would want.
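For concreteness, the data-driven alternative would have looked roughly like this (a sketch, not code we wrote), and its weakness is exactly the San Diego case above:

```python
# Sketch of the bounding-box approach we decided against: derive the map
# extent from the data itself, with some padding (quakes as in David's sketch).
pad = 1.0  # degrees of padding around the observed quakes; arbitrary choice
lat_min, lat_max = quakes.Latitude.min() - pad, quakes.Latitude.max() + pad
lon_min, lon_max = quakes.Longitude.min() - pad, quakes.Longitude.max() + pad
# If the only California quakes that week were near San Diego, this extent
# would frame San Diego, not the whole state.
```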

We understand that our method is not perfect by any means either, but we wanted to explain our train of thought for taking the direction we did. It also raises an interesting alternative to what has been done so far, and since this is collaborative, maybe someone will see our approach and think of a way to improve it :) All food for thought.