stat157 / recent-quakes

Stat 157 Homework 2 due on Monday 2013-10-21 at 11:59pm

How to reproduce the data if jason_read() is not working for everyone #9

Open sunnymh opened 10 years ago

sunnymh commented 10 years ago

It seems like read_json() only works for a few people. If that is the case, how do we reproduce the data in class on Tuesday? People who can't use read_json() definitely can't reproduce the code from groups that use that function.

kqdtran commented 10 years ago

Idk of any elegant method, but there's a workaround in https://github.com/stat157/recent-quakes/issues/3

sunnymh commented 10 years ago

That's a workaround for people who can't use read_json(), but I was wondering how we will check the code of people who do use read_json() on Tuesday.

teresita commented 10 years ago

@sunnymh if you're getting errors, it could be a misspelling of json? (there's no 'a')

aculich commented 10 years ago

@teresita Thanks for picking up on the spelling error here. Definitely no 'a' in JSON, so that could be the problem. @sunnymh is this working for you now?

sunnymh commented 10 years ago

@aculich That's not actually my question. As in #3, people are getting "ValueError: arrays must all be same length" when using read_json(), and I got the same error. So I used json.load() as suggested in #3, and I think a lot of other people are using json.load() as well. My question is about people like me: we may not be able to run other people's code if it uses read_json(). Sorry about the misspelling.
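
For reference, the json.load() approach can be sketched roughly as follows. This is not the exact code from #3. It assumes the feed is a GeoJSON document with a top-level "features" list, and the sample data, the load_quakes() helper, and the output column names are all made up for illustration. One plausible reason read_json() raises the ValueError is that the top-level keys of such a document hold values of different lengths, so flattening the features by hand avoids the problem.

```python
import io
import json

# Hypothetical, heavily trimmed sample shaped like a GeoJSON earthquake feed.
# The ids, places, and values are invented for illustration only.
SAMPLE_FEED = """
{
  "type": "FeatureCollection",
  "metadata": {"title": "sample feed", "count": 2},
  "features": [
    {"type": "Feature", "id": "ex0001",
     "properties": {"mag": 1.2, "place": "Example Place A"},
     "geometry": {"type": "Point", "coordinates": [-122.0, 38.0, 5.0]}},
    {"type": "Feature", "id": "ex0002",
     "properties": {"mag": 4.5, "place": "Example Place B"},
     "geometry": {"type": "Point", "coordinates": [-150.0, 61.0, 40.0]}}
  ]
}
"""


def load_quakes(fileobj):
    """Flatten each GeoJSON feature into one plain dict per earthquake."""
    data = json.load(fileobj)
    rows = []
    for feature in data["features"]:
        # Assumed coordinate order: longitude, latitude, depth.
        lon, lat, depth = feature["geometry"]["coordinates"]
        row = dict(feature["properties"])
        row.update({"id": feature["id"], "longitude": lon,
                    "latitude": lat, "depth": depth})
        rows.append(row)
    return rows


quakes = load_quakes(io.StringIO(SAMPLE_FEED))
for quake in quakes:
    print(quake["id"], quake["mag"], quake["place"])
```

Because the result is a plain list of dicts, it can be handed to pandas.DataFrame(quakes) on any pandas version, so code written this way should not depend on read_json() being available.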

aculich commented 10 years ago

The Steps to Curate Data: Issue #8 contains most of the answer to this problem. An alternative acceptable method would be to use the CSV version of the new data which is available here:

http://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php
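
A minimal sketch of reading the CSV version with only the standard library, so it does not depend on any particular pandas version. The sample rows are invented, the read_quakes_csv() helper is hypothetical, and the column names (time, latitude, longitude, depth, mag, place) are an assumption about the feed's header row that should be checked against the actual file.

```python
import csv
import io

# Hypothetical sample rows; the header names are assumed, not verified.
SAMPLE_CSV = """time,latitude,longitude,depth,mag,place
2013-10-20T00:00:00.000Z,38.0,-122.0,5.0,1.2,Example Place A
2013-10-20T01:00:00.000Z,61.0,-150.0,40.0,4.5,Example Place B
"""

NUMERIC_FIELDS = ("latitude", "longitude", "depth", "mag")


def read_quakes_csv(fileobj):
    """Read earthquake rows into dicts, converting numeric columns to float."""
    rows = []
    for record in csv.DictReader(fileobj):
        for field in NUMERIC_FIELDS:
            # Empty cells become None instead of raising on float("").
            value = record[field]
            record[field] = float(value) if value else None
        rows.append(record)
    return rows


csv_quakes = read_quakes_csv(io.StringIO(SAMPLE_CSV))
for quake in csv_quakes:
    print(quake["time"], quake["mag"], quake["place"])
```

To use a downloaded file instead of the sample string, pass an open file object, for example open("quakes.csv", newline=""), where the filename is whatever you saved the feed as.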

sunnymh commented 10 years ago

@aculich #8 uses json.load() as well, which doesn't require us to install the latest version of Pandas to run read_json(). So on Tuesday, people who use json.load() might not be able to run read_json() if some other groups happen to use it. Is that going to be a problem?

aculich commented 10 years ago

What you describe here is an interesting conundrum, and it illustrates why we are using a virtual machine. The code that uses read_json() needs pandas upgraded to version 0.12, but that might impact other code the person has on their machine if it relies on an earlier version of pandas. In general the pandas code is probably forwards-compatible, but you can't be sure. So as long as you provide instructions in your version of the README.md file, you should be able to get other people to upgrade their version of pandas to run your code. If we were not using a virtual machine, this could cause real problems that lead to dependency hell, in which multiple conflicting versions of packages need to be installed. We will discuss this in class, along with how we can use VMs as a strategy to handle this problem. Whichever strategy you've chosen will likely work okay for Tuesday's code review, but in practice it is important to be mindful of the implications.
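
One way to make the version requirement explicit, in addition to the README.md instructions, is a small check at the top of the script. The sketch below is only an illustration: the helper names version_tuple() and meets_minimum() are made up, and the 0.12 threshold is taken from the discussion above rather than verified against the pandas release notes.

```python
import re

# Minimum pandas version that the thread says read_json() needs.
MIN_PANDAS = (0, 12)


def version_tuple(version):
    """Turn the leading numeric part of a version string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum=MIN_PANDAS):
    """Return True if the given version string is at least the minimum."""
    return version_tuple(version) >= minimum


try:
    import pandas
    installed = pandas.__version__
except ImportError:
    installed = None

if installed is None:
    print("pandas is not installed; use json.load() or the CSV feed instead")
elif meets_minimum(installed):
    print("pandas %s should be new enough for read_json()" % installed)
else:
    print("pandas %s is older than 0.12; upgrade it or use json.load()" % installed)
```

A check like this does not solve the dependency problem, but it turns a confusing failure into a clear message about which version a reviewer needs.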