thekingofkings / chicago-crime

Crime correlation analysis
MIT License

The social flow (LEHD) matrix *transpose* #2

Closed thekingofkings closed 8 years ago

thekingofkings commented 8 years ago

The social flow (LEHD) matrix transpose issue

Background

All the confusion is rooted in the lag variable calculation:

 d = M * y

How is the social flow matrix calculated?

For historical reasons, when I first wrote the Python code to process the LEHD flow matrix, I used a nested dictionary to track the flow. The first-level keys are source CA IDs and the second-level keys are destination CA IDs.

  M[src][dst] # the number of flows from src to dst
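
As a minimal sketch of that layout (the CA IDs and counts are made up for illustration, and `to_matrix` is a hypothetical helper, not the repository's code), the nested dictionary can be flattened into a dense matrix whose rows are sources and whose columns are destinations:

```python
# Toy flow counts; keys are hypothetical CA IDs, values are made-up counts.
flow = {
    1: {1: 0, 2: 30, 3: 10},
    2: {1: 20, 2: 0, 3: 40},
    3: {1: 60, 2: 10, 3: 0},
}

def to_matrix(flow, ids):
    """Return a list of lists with M[r][c] = flow from ids[r] to ids[c]."""
    return [[flow.get(src, {}).get(dst, 0) for dst in ids] for src in ids]

ids = sorted(flow)
M = to_matrix(flow, ids)

# Row index = source, column index = destination.
print(M[0][1])  # flow from CA 1 to CA 2
```

With this layout the row index is the source, which is the opposite of what the lag formula expects.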

How are the lag variables calculated?

There are three kinds of normalization. Take normalization by destination as an example. The flow matrix M is multiplied by the column vector y, so each row of M should hold the share of the traffic into one destination dst that comes from each source, and each row should sum to 1.
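
A sketch of the destination normalization in plain Python, using made-up counts and hypothetical helper names (`normalize_by_destination`, `lag`). It assumes the raw counts are indexed as F[src][dst] and returns W[dst][src], so that each row of W sums to 1 and d = W * y can be computed row by row:

```python
def normalize_by_destination(F):
    """F[src][dst] holds raw counts; return W with W[dst][src] = share of
    dst's incoming flow that originates from src (each row of W sums to 1)."""
    n = len(F)
    W = [[0.0] * n for _ in range(n)]
    for dst in range(n):
        incoming = sum(F[src][dst] for src in range(n))
        if incoming == 0:
            continue  # no incoming flow: leave the row as zeros
        for src in range(n):
            W[dst][src] = F[src][dst] / incoming
    return W

def lag(W, y):
    """Matrix-vector product d = W * y."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]

F = [[0, 30, 10],
     [20, 0, 40],
     [60, 10, 0]]      # made-up counts, F[src][dst]
y = [8.0, 4.0, 2.0]    # made-up per-CA values

W = normalize_by_destination(F)
d = lag(W, y)
```

Here d[i] is a weighted average of the y values of the CAs that send flow into CA i.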

Pitfall

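According to the formula at the top, d = M * y, we need

 M[ i ][ j ] # the number of flows from j to i

because row i of M is what gets multiplied with y to produce d[i]. The nested dictionary stores M[src][dst] instead, so a transpose is needed to make everything consistent.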
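
The pitfall can be seen on a two-CA toy case. This is only an illustration with made-up numbers and hypothetical helpers (`matvec`, `transpose`): all flow goes from CA 0 to CA 1, so the lag of CA 1 should pick up the value of CA 0.

```python
def matvec(M, y):
    """Matrix-vector product, d[i] = sum_j M[i][j] * y[j]."""
    return [sum(m * v for m, v in zip(row, y)) for row in M]

def transpose(M):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]

# All flow goes from CA 0 to CA 1 (already normalized by destination).
F = [[0.0, 1.0],
     [0.0, 0.0]]   # indexed as F[src][dst]
y = [10.0, 0.0]    # made-up values: only CA 0 is non-zero

wrong = matvec(F, y)             # misreads F[i][j] as flow from j to i
right = matvec(transpose(F), y)  # M[i][j] = F[j][i], flow from j to i

print(wrong)
print(right)
```

Without the transpose, CA 1 gets no contribution from CA 0 even though all of its incoming flow originates there.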

thekingofkings commented 8 years ago

In this commit, I fixed the transpose issue in the Python code.

thekingofkings commented 8 years ago

In this older commit, I had already applied a transpose to the social lag as well.

The two transposes cancel each other out, so the transpose fix itself becomes a bug.
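
A minimal check of why two transposes cancel, using a made-up matrix and a hypothetical `transpose` helper. It only illustrates the identity and does not reproduce the Python or R code from the commits:

```python
def transpose(M):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]

F = [[0, 30],
     [20, 0]]          # made-up counts, F[src][dst]

once = transpose(F)    # indexed [dst][src], what the lag formula needs
twice = transpose(once)

print(twice == F)      # transposing twice gives back the [src][dst] layout
```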

thekingofkings commented 8 years ago

Removed that line in file R\pvalue-evaluation.R

Problem fixed.