stat157 / background


[Analyzers- plz read] -- Data Formats from Curators #15

Closed: rerock closed this issue 10 years ago

rerock commented 10 years ago

The cleaned data produced by Curator subgroup 2 are in CSV format; the files are very large and currently cannot be pushed to GitHub.

Curator subgroup Cache Money is trying to cache the cleaned data produced by Curator subgroup 2 in Google Spreadsheets.

Analyzers, are these curated data what you want to analyze? If not, are there other data formats you have in mind?

Thank you.

@xsherryxia @arifyali @kimbelyle @jest4pun

tristantao commented 10 years ago

It was not my original intention to push the entire dataset to GitHub.

That is why I did not add the cached data to the repo. We can either call the function to pull the data or, alternatively, push the data to a different hosting service such as Box or Dropbox.

GoogleDoc is a possible alternative. Lastly, I can rewrite the code to use a zipped/tarred cache file, which should be small enough to store on GitHub.
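The zipped-cache idea above can be sketched in a few lines of Python: write the cleaned CSV gzip-compressed so the cache stays small enough for GitHub, and read it back transparently. The file name and sample rows here are invented for illustration; the real cache paths live in the repo code.

```python
# Sketch of a gzip-compressed CSV cache (hypothetical path and sample data).
import csv
import gzip
import os
import tempfile

rows = [
    ["time", "long", "lat", "mag"],
    ["2013-01-01T00:00:00", "141.0", "38.3", "5.1"],
]

path = os.path.join(tempfile.mkdtemp(), "quakes_cache.csv.gz")

# Write the cache compressed; gzip.open in text mode plugs into csv directly.
with gzip.open(path, "wt", newline="") as f:
    csv.writer(f).writerows(rows)

# Reading the cache back is transparent to the CSV layer.
with gzip.open(path, "rt", newline="") as f:
    cached = list(csv.reader(f))

assert cached == rows
```

Compared with committing the raw CSV, the compressed cache trades a trivial decompress step for a much smaller footprint in the repo.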

arifyali commented 10 years ago

@tristantao I have been able to cache the full CSV without using zip.

tristantao commented 10 years ago

@arifyali

Are you talking about caching on GitHub, or locally? Local caching is not an issue, obviously. But the CSV files will grow as we begin hosting data (GitHub, Box, Dropbox). Since we are planning to expand our data sources, we have to think about future-proofing. 50 MB might be fine on GitHub (I'm not sure), but once we are hosting a few hundred MB of CSV, GitHub might not be happy about that.

Perhaps you misunderstood what I meant by "caching".

Lastly, aren't you the one who wanted to explore GoogleDoc to save the data? What was the purpose of that if you weren't trying to save/cache the data?

rerock commented 10 years ago

We attempted to cache the data via Google Spreadsheets; however, due to limits set by Google, we had to forgo this method. Therefore we have cached everything as a CSV, and it is available in our repo.

gnolnait commented 10 years ago

For running the ETAS model in R specifically, analyzers need an object of class "ppx", which is basically a data frame of spatio-temporal observations (in this case time, long, lat, mag, mag.type, depth, ref, and date) plus a specified domain. See iran.quakes and jap.quakes for reference; pages 9 and 10 of the PDF give more detail.
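As a sanity check on the curators' side, the column layout described above can be verified before the CSV is handed to R. This is a hypothetical Python sketch: only the column names come from the comment above, and the sample values are invented; analyzers would then build the ppx object from such a data frame in R.

```python
# Hypothetical check that a curated row carries every field the
# ETAS/"ppx" input needs (column names from the comment; values invented).
columns = ["time", "long", "lat", "mag", "mag.type", "depth", "ref", "date"]

sample_row = {
    "time": 123.45,        # model time coordinate
    "long": 141.0,         # longitude
    "lat": 38.3,           # latitude
    "mag": 5.1,            # magnitude
    "mag.type": "Mw",      # magnitude scale
    "depth": 10.0,         # depth in km
    "ref": "USGS",         # source catalog
    "date": "2013-01-01",  # calendar date
}

missing = [c for c in columns if c not in sample_row]
assert not missing, f"curated row is missing fields: {missing}"
```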

rerock commented 10 years ago

Thanks for the reply. We will see whether we can provide what you need, and will keep you updated.