charlienewey closed this 7 years ago
Yes, there's no need to run the Python file - I think the most recent version of the dataset is on Ara's MongoDB instance.
As promised, latest data. This is also on Ara's DB. We can merge now.
http://www.edshare.soton.ac.uk/18257/ 'This resource is empty'
Removed Ara's review, feel free to merge. (and provide csv :smiling_imp:)
@charlienewey all the other branches are using src/some_package, so src/smg is a little weird. (unless of course GitHub is showing me strange things, which it sometimes does)
I guess we could start a preprocessing package here and throw the stuff @alexdy2007 and I are working on into it, so it all ends up in the same package.
@utkuozbulak I didn't dismiss it because it's still src/smg
@arahayrabedian Shit, you are right, it's still src/smg/. I'm sorry :cry: :sob:
Block again ! :rofl: :rofl:
Whoops, that was accidental - blame beer. Fixed. And @utkuozbulak check edshare again, it was still uploading when you checked last time.
Ninja edit: WAT IT DIDN'T UPLOAD
Try this. https://www.dropbox.com/s/0ig91yyvyoda56x/jobs_norm_loc.tar.gz?dl=0
Ninja re-edit: EdShare does work now, after I said some rude words at it and re-uploaded the archive
silly edshare, thinking itself useful.
LGTM.
Yeah, there are quite a few errors and things that don't make sense, but there are 250k records in the dataset. If Google's geocoder doesn't like what's in the "LocationRaw" field (I'm not entirely sure that it's deterministic...), then we'll get some missing bits. Sometimes the output from the geocoder doesn't tag the parts of the location correctly (e.g. it might come back with an administrative area but not a town), but that's just something we've gotta learn to live with, I think. It's noisy data, after all.
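For what it's worth, here is a minimal sketch of how the "missing bits" could be made explicit rather than silently dropped. It assumes the input is the parsed JSON from the Google Geocoding web service (a `results` list whose entries carry `address_components` with `types` and `long_name`); the names `FIELD_TYPES` and `extract_location` are hypothetical and not taken from the code in this PR.

```python
# Hypothetical mapping from our own field names to Geocoding component types.
FIELD_TYPES = {
    "town": ("locality", "postal_town"),
    "admin_area": ("administrative_area_level_1",),
    "country": ("country",),
}


def extract_location(response):
    """Pull town/admin area/country out of a geocoder response.

    Any part the geocoder did not tag is left as None, so downstream
    code can tell "missing" apart from an empty string.
    """
    fields = {name: None for name in FIELD_TYPES}
    results = response.get("results") or []
    if not results:
        # The geocoder returned nothing usable for this LocationRaw value.
        return fields
    for component in results[0].get("address_components", []):
        types = component.get("types", [])
        for name, wanted in FIELD_TYPES.items():
            # Keep the first matching component for each field.
            if fields[name] is None and any(t in types for t in wanted):
                fields[name] = component.get("long_name")
    return fields


# Example: an administrative area came back, but no town.
partial = {
    "results": [
        {
            "address_components": [
                {"long_name": "England", "types": ["administrative_area_level_1", "political"]},
                {"long_name": "United Kingdom", "types": ["country", "political"]},
            ]
        }
    ]
}
location = extract_location(partial)
```

Records where `town` is None could then be counted or re-queried later instead of being mixed in with the clean ones.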
I won't learn to live with it, I will search for ways to fix it.
Looks good to me. I assume there is no need to run the Python file as the API key has been revoked?