namoopsoo / play-clj-ml

Messing around with machine learning in clojure

milestone 1: get basic data annotated using earlier geo annotation #1

Closed namoopsoo closed 6 years ago

namoopsoo commented 6 years ago

Summary

Tasks

for later

namoopsoo commented 6 years ago

Basic data...

    In [134]: simpledf.shape
    Out[134]: (888989, 2)

    In [135]: simpledf.head()
    Out[135]:
       start_sublocality  end_sublocality
    0                  2                2
    1                  2                2
    2                  2                2
    3                  2                2
    4                  2                2

    In [138]: simpledf.start_sublocality.value_counts()
    Out[138]:
    2    790758
    1     88920
    3      9311
    Name: start_sublocality, dtype: int64

* Here, `make_dead_simple_df()` uses a simple borough-to-integer mapping:
```python
{'Brooklyn': 1, 'Manhattan': 2, 'Queens': 3}

In [149]: gpby = simpledf.groupby(by=['start_sublocality', 'end_sublocality'])

In [150]: gpby.count()
Out[150]:
                                     unit
start_sublocality end_sublocality
1                 1                 67863
                  2                 17267
                  3                  3790
2                 1                 16972
                  2                771653
                  3                  2133
3                 1                  3595
                  2                  1921
                  3                  3795

In [151]: 771653/simpledf.shape[0]
Out[151]: 0.8680118651636859
```

* Based on the above, the data is heavily skewed: 86.8% of all trips both start and end in `Manhattan`.
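
To see the imbalance for every borough pair rather than only Manhattan to Manhattan, the counts from the `groupby` output can be turned into fractions of all trips. This is a minimal sketch that uses only the standard library. The counts and the borough codes are copied from the output above, but the names `PAIR_COUNTS`, `pair_shares`, and `same_borough_share` are made up for illustration and are not part of `make_dead_simple_df()` or the repo.

```python
# Borough codes as used by the mapping above.
BOROUGH_CODES = {'Brooklyn': 1, 'Manhattan': 2, 'Queens': 3}

# (start_sublocality, end_sublocality) -> trip count, copied from Out[150].
# PAIR_COUNTS is a hypothetical name used only for this sketch.
PAIR_COUNTS = {
    (1, 1): 67863, (1, 2): 17267, (1, 3): 3790,
    (2, 1): 16972, (2, 2): 771653, (2, 3): 2133,
    (3, 1): 3595, (3, 2): 1921, (3, 3): 3795,
}


def pair_shares(pair_counts):
    """Return each (start, end) pair's fraction of all trips."""
    total = sum(pair_counts.values())
    return {pair: count / total for pair, count in pair_counts.items()}


def same_borough_share(pair_counts):
    """Return the fraction of trips that start and end in the same borough."""
    total = sum(pair_counts.values())
    same = sum(count for (start, end), count in pair_counts.items() if start == end)
    return same / total


# Invert the mapping so the report shows borough names instead of codes.
names = {code: name for name, code in BOROUGH_CODES.items()}
shares = pair_shares(PAIR_COUNTS)
for (start, end), share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{names[start]:>9} -> {names[end]:<9} {share:.4f}")
print(f"same-borough share: {same_borough_share(PAIR_COUNTS):.4f}")
```

The largest entry is the Manhattan to Manhattan share of about 0.868, which matches `Out[151]`. A classifier trained on this data could reach roughly that accuracy by always predicting `Manhattan`, so this share is a reasonable baseline to compare against.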