mtwe closed this 8 years ago
Also, I think we may eventually want to put all the data in a database once we finalize what we want.
So I went through and cleaned up the data and put both datasets in a pretty standard format. There are four columns: datetime, doserate, lat, and long. They may not be in the same order in both datasets, though. I've uploaded the cleaned CSVs here:
MEXT: https://mega.nz/#!SQJ0WArI!nylnCvEaY9i0xBrhiWMHFQI6CpWRjJs1sYnTqfQPP-c Safecast: https://mega.nz/#!yJ4VyQZD!OHmeIu9ElTE6MKWvRCl2hc24Kbj-XjJexPmYGi2jdLY
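Since the four columns may be in a different order in each file, reading by header name rather than by position avoids mixing them up. Here is a rough sketch using only the standard library. The sample rows are made up for illustration, and the assumption that each CSV has a header row with exactly these column names has not been checked against the uploaded files.

```python
import csv
import io

# Canonical column order; the names are the four columns listed above.
COLUMNS = ["datetime", "doserate", "lat", "long"]


def read_rows(handle):
    """Yield one dict per CSV row, keyed by header name, in canonical order."""
    reader = csv.DictReader(handle)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        yield {
            "datetime": row["datetime"],
            "doserate": float(row["doserate"]),
            "lat": float(row["lat"]),
            "long": float(row["long"]),
        }


# Made-up sample data with the columns in two different orders.
mext_sample = io.StringIO(
    "datetime,doserate,lat,long\n"
    "2011-03-20 10:00:00,0.12,37.42,141.03\n"
)
safecast_sample = io.StringIO(
    "lat,long,datetime,doserate\n"
    "37.76,140.47,2011-04-24 01:30:00,0.35\n"
)

combined = list(read_rows(mext_sample)) + list(read_rows(safecast_sample))
```

For the real files, `open(path, newline="")` would replace the `io.StringIO` objects.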
I also put all of the data in a MongoDB database on my local machine, but I am unsure how to make it accessible to everyone. Perhaps I could upload it to my class server space and give you all permission to access it? I am not too familiar with databases or with hosting on servers. Do you guys have any ideas?
My hunch is that the database route will be the best way to feed the data into any sort of dynamic visualization. That is how the Safecast app stores its data. The database will also make it easier to compare measurements. I've read that there are ways to query on the distance between two lat/long points.
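To make the distance idea concrete, here is a small sketch of the haversine formula in plain Python, which gives the great-circle distance between two lat/long points. This is not how the database would do it (MongoDB has its own geospatial indexes and query operators for that); it is just a way to test the comparison on a few rows. The coordinates and the 1 km radius below are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def points_within(points, lat, lon, radius_km):
    """Return the (lat, long) pairs that lie within radius_km of the given point."""
    return [p for p in points if haversine_km(lat, lon, p[0], p[1]) <= radius_km]


# Hypothetical example: one reference point and three candidate points.
reference = (37.42, 141.03)
candidates = [(37.421, 141.031), (37.50, 141.00), (35.68, 139.69)]
nearby = points_within(candidates, reference[0], reference[1], radius_km=1.0)
```

A linear scan like this will be slow on tens of millions of rows, which is one more argument for letting the database handle the query.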
@awachte @RyanCaldwell1 Not sure why I didn't see this before, but Safecast actually has an API that we can query for the data. I'm not sure yet whether that would be useful or the preferred way to go, but I wanted to see what you guys thought. We could always do it both ways.
Also, check out http://safecast.org/tilemap/methodology.html to see how they do a lot of the processing and visualization for the data. There is also a link there to download a filtered dataset, which is about half the size of the original. Poking around on the Safecast Google Group, I found that some people had detected bad and unstandardized data, and this filtered dataset is the result of removing it. I think we should probably just go ahead and use that instead. It still has 32.1 million entries, and the date format actually makes sense now.
The MEXT data seems to run from March to December 2011. When I queried the Safecast API, there still seem to be about 21 million measurements in that time period. So we should be in business.
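For the downloaded CSVs, restricting the Safecast rows to the MEXT time period could look something like the sketch below. The timestamp format and the exact window boundaries (1 March 2011 up to, but not including, 1 January 2012) are assumptions on my part, and the sample timestamps are made up.

```python
from datetime import datetime

# Assumed window covering March through December 2011 (end is exclusive).
WINDOW_START = datetime(2011, 3, 1)
WINDOW_END = datetime(2012, 1, 1)

# Assumed timestamp layout; adjust to whatever the cleaned files actually use.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def in_mext_window(stamp, fmt=TIMESTAMP_FORMAT):
    """Return True if the timestamp string falls inside the assumed MEXT window."""
    return WINDOW_START <= datetime.strptime(stamp, fmt) < WINDOW_END


# Made-up timestamps: one before, two inside, one after the window.
stamps = [
    "2011-02-28 23:59:59",
    "2011-03-01 00:00:00",
    "2011-12-31 23:59:59",
    "2012-01-01 00:00:00",
]
kept = [s for s in stamps if in_mext_window(s)]
```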