Make a simple linear model using the extremely simple dataset with a single independent variable: just the borough.

We create a dead simple linear model.

But first, a way to actually read that CSV data...


```clojure
(ns matrix-project.core
  (:use [incanter.charts :only [xy-plot add-points scatter-plot add-lines]]
        [incanter.core :only [view]]
        [incanter.stats :only [linear-model]])
  (:require [clojure.core.matrix :as mtrix]
            [clojure.data.csv :as csv]
            [clojure.java.io :as io]))

(def fname "201510-citibike-tripdata.simple.csv")

(defn load-csv-data
  [fname]
  (let [table (with-open [reader (io/reader fname)]
                (->> (csv/read-csv reader)
                     (mapv pass)))
        header-row (first table)
        columns (->> (rest table)
                     (parse-table-as-doubles)
                     (mtrix/transpose))]
    {:header header-row
     :columns columns}))
```
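`load-csv-data` calls two helpers, `pass` and `parse-table-as-doubles`, that are not shown here. The sketch below is only a guess at what they might do: the names come from the code above, but the bodies are hypothetical stand-ins, and `transpose-rows` is an illustrative pure-Clojure substitute for `mtrix/transpose` so the example runs without extra dependencies.

```clojure
(defn pass
  "Hypothetical stand-in: returns the row unchanged."
  [row]
  row)

(defn parse-table-as-doubles
  "Hypothetical stand-in: parses every cell of every row as a double."
  [rows]
  (mapv (fn [row]
          (mapv (fn [^String cell] (Double/parseDouble cell)) row))
        rows))

(defn transpose-rows
  "Illustrative helper: turns a vector of rows into a vector of columns."
  [rows]
  (apply mapv vector rows))

;; Two rows of three string cells become three columns of two doubles.
(def example-columns
  (->> [["1" "2" "3"] ["4" "5" "6"]]
       (mapv pass)
       (parse-table-as-doubles)
       (transpose-rows)))
```

With columns laid out this way, `(nth columns 1)` and `(nth columns 2)` each pick out one whole variable, which is how the model function below reads its X and Y.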

* ...
```clojure
(defn make-super-simple-linear-model
  [data-table]
  (let [Y (nth (:columns data-table) 2)
        X (nth (:columns data-table) 1)
        simple-linear-model (linear-model Y X)]
    simple-linear-model))

; on repl...
matrix-project.core=> (def simple-model (make-super-simple-linear-model simple-data))
#'matrix-project.core/simple-model

matrix-project.core=> (simple-model :coefs)
[0.711438996099119 0.6281347329684195]
matrix-project.core=> (keys simple-model)
(:y :sse :msr :design-matrix :mse :t-probs :adj-r-square :df :coef-var :residuals :ssr :sst :coefs :f-stat :r-square :f-prob :t-tests :x :std-errors :fitted :coefs-ci)

matrix-project.core=> (:sse simple-model)
55234.16477182651
matrix-project.core=> (:mse simple-model)
0.06213157759542773
matrix-project.core=> (Math/sqrt (:mse simple-model))
0.24926206609796792
matrix-project.core=> (:r-square simple-model)
0.39422113636387146
```
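As a rough consistency check on those numbers, here is a small sketch using only the values printed above. It assumes `:mse` is `:sse` divided by the residual degrees of freedom, which is the usual definition but is not confirmed here for Incanter; the var names are made up for illustration.

```clojure
;; Values copied from the REPL session above.
(def sse 55234.16477182651)
(def mse 0.06213157759542773)
(def r-square 0.39422113636387146)

;; Assumption: mse = sse / residual-df, so the ratio recovers the
;; residual degrees of freedom (roughly 889 thousand).
(def residual-df (/ sse mse))

;; Typical prediction error, in units of the borough code (about 0.249).
(def rmse (Math/sqrt mse))

;; With a single predictor and an intercept, r-square is the squared
;; correlation between X and Y, so its root is |correlation| (about 0.628).
(def abs-correlation (Math/sqrt r-square))
```

If the assumption holds, a model with two fitted coefficients and about 888,987 residual degrees of freedom would have been fit on about 888,989 rows.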

So, going from Manhattan (2), what sayeth our linear model?

```clojure
(defn simple-predict
  [simple-model X]
  (let [coefs (simple-model :coefs)
        beta_coef (first coefs)   ; used as the multiplier
        error_coef (last coefs)]  ; used as the additive term
    (->> (map #(* % beta_coef) X)
         (map #(+ % error_coef)))))

; on repl...
matrix-project.core=> (def inputs [1 2 3])
#'matrix-project.core/inputs

matrix-project.core=> (simple-predict simple-model inputs)
(1.3395737290675385 2.0510127251666574 2.7624517212657764)
```
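One thing worth double-checking is the order of `:coefs`. `simple-predict` treats the first coefficient as the slope, but Incanter's `linear-model` adds an intercept column by default and, as far as I can tell, reports the intercept first. The sketch below compares both readings. It is not the original method: `predict-with` and `weighted-mean` are illustrative names, and the borough totals are summed from the transition counts from #1, assuming those cover the same trips the model was fit on.

```clojure
;; Coefficients copied from the REPL session above.
(def coefs [0.711438996099119 0.6281347329684195])

(defn predict-with
  "Applies y = intercept + slope * x to every x."
  [intercept slope xs]
  (map #(+ intercept (* slope %)) xs))

;; Reading used by simple-predict: first coefficient is the slope.
(def slope-first (predict-with (last coefs) (first coefs) [1 2 3]))
;; approximately (1.34 2.05 2.76)

;; Alternative reading, assuming :coefs is ordered [intercept slope].
(def intercept-first (predict-with (first coefs) (last coefs) [1 2 3]))
;; approximately (1.34 1.97 2.60)

(defn weighted-mean
  "Mean borough code for a map of {borough-code trip-count}."
  [counts]
  (/ (double (reduce + (map (fn [[code n]] (* code n)) counts)))
     (reduce + (vals counts))))

;; Trips per start borough and per end borough (row and column sums).
(def mean-start (weighted-mean {1 88920, 2 790758, 3 9311}))
(def mean-end   (weighted-mean {1 88430, 2 790841, 3 9718}))

;; A least-squares line with an intercept passes through (mean X, mean Y),
;; so the correct reading should map mean-start close to mean-end.
(def check-intercept-first
  (first (predict-with (first coefs) (last coefs) [mean-start])))
(def check-slope-first
  (first (predict-with (last coefs) (first coefs) [mean-start])))
```

Under these assumptions `check-intercept-first` lands on `mean-end` (about 1.911) while `check-slope-first` comes out near 1.987, which suggests the intercept-first reading is the right one. The two readings agree for an input of 1 and differ for 2 and 3.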

* bringing back the transition numbers from #1 ...
```python
                                     unit
start_sublocality end_sublocality
1                 1                 67863
                  2                 17267
                  3                  3790
2                 1                 16972
                  2                771653
                  3                  2133
3                 1                  3595
                  2                  1921
                  3                  3795
```

Using those counts as the reference for the super simple linear model predictions,
- (1.3395737290675385 2.0510127251666574 2.7624517212657764)
- The prediction for Manhattan, 2 to 2.05, is spot on, though it leans slightly toward 3. Since far more trips from 2 end in 1 (16972) than in 3 (2133), a value just under 2.0 would perhaps have been closer.
- And 1 to 1.34 also looks good.
- And Queens riders most often stay in Queens (3795 of 9311 trips): 3 goes to 2.76.

That said, treating 1, 2, 3 as a numeric variable is clearly flawed, since the boroughs are really classes; this is where the linear model's limitation shows. But it was a neat experiment.
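To make the "these are really classes" point concrete, here is a minimal sketch of a class-based baseline: for each start borough, predict the end borough seen most often. This is not something done in this issue; it is a swapped-in majority-class baseline built from the transition counts above, and all names in it are illustrative.

```clojure
;; Transition counts copied from the table above: {start {end trips}}.
(def transition-counts
  {1 {1 67863, 2 17267, 3 3790}
   2 {1 16972, 2 771653, 3 2133}
   3 {1 3595, 2 1921, 3 3795}})

(defn most-frequent-end
  "Returns the end borough with the highest trip count for one start borough."
  [end-counts]
  (key (apply max-key val end-counts)))

;; Majority-class prediction per start borough.
(def majority-prediction
  (into {}
        (map (fn [[start end-counts]] [start (most-frequent-end end-counts)])
             transition-counts)))
;; => {1 1, 2 2, 3 3}

;; Share of trips for which the majority prediction matches the actual end.
(def baseline-accuracy
  (let [total (reduce + (mapcat vals (vals transition-counts)))
        hits  (reduce + (map (fn [[start end]]
                               (get-in transition-counts [start end]))
                             majority-prediction))]
    (double (/ hits total))))
```

On these counts the baseline simply predicts "stay in the same borough" and is right for roughly 95% of trips, mostly because trips within borough 2 dominate the data. Any classifier built on this dataset would need to beat that number to be interesting.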

namoopsoo / play-clj-ml

Milestone 2: Create first super simple classifier on the basic geo annotated data #2
