klarsen1 / MarketMatching

RelativeDistance normalised by 'ref' rather than 'test' #18

Closed Jay-Down closed 4 years ago

Jay-Down commented 4 years ago

Hi,

On line 47 of functions.R, in the calculate_distances function, the DTW distance is normalised using the abs(sum()) of the query time series:

dist <- dtw(test, ref, window.type=sakoeChibaWindow, window.size=warping_limit)$distance / abs(sum(test))

However, I can only replicate the RelativeDistance values returned in BestMatches if the dtw distance is divided by the abs(sum()) of the reference timeseries; naively this makes sense as the reference ts is common amongst all possible matches.

I don't understand how this can be the case given the code, though, as the ref and test variables are defined correctly: the mkts variable has all columns in the expected positions, i.e. query at index 1 and ref at index 2:

mkts <- create_market_vectors(data, ThisMarket, ThatMarket)
test <- mkts[[1]]
ref <- mkts[[2]]

For example, in mkts.csv.zip (https://github.com/klarsen1/MarketMatching/files/4477377/mkts.csv.zip), column 1 is my query ts and column 2 is my ref (checked and double-checked!). The BestMatches dataframe returned by best_matches() lists the RelativeDistance as 0.6599712, but this value is equal to:

dtw(mkts[[1]], mkts[[2]], window.type=sakoeChibaWindow, window.size=1)$distance / abs(sum(mkts[[2]]))

as opposed to:

dtw(mkts[[1]], mkts[[2]], window.type=sakoeChibaWindow, window.size=1)$distance / abs(sum(mkts[[1]]))

Apologies if I'm missing something obvious.

klarsen1 commented 4 years ago

It's been many years since I wrote this, but here's a question: Did you leave markets_to_be_matched=NULL? And if yes, have you checked both rows in the data frame for the distance between the two markets?

If I recall correctly, when you match all markets against all markets, best_matches shows both the distance from market A to market B and the distance from market B to market A, in separate rows. The idea is that it finds both sides of the diagonal of the distance matrix, so you can look at each market and see the ranks of its matches. The normalization you alluded to actually does nothing from a ranking perspective. It's purely window dressing to make the distance more interpretable, i.e., a distance of 20 means nothing on its own, but 20/sum(x) can be interpreted better.
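
A minimal base-R sketch of why the normalization cannot change the ranking: for a fixed test market, every candidate distance is divided by the same positive constant. The distances, market names, and test series below are made-up illustrative values, not MarketMatching output.

```r
# Hypothetical raw DTW distances from one test market to three candidates.
raw_dist <- c(B = 20, C = 35, D = 12)

# Hypothetical test series; abs(sum(test)) is the shared normalization constant.
test <- c(10, 12, 9, 11)
norm_const <- abs(sum(test))

# Relative distance in the style of calculate_distances(): raw / abs(sum(test)).
rel_dist <- raw_dist / norm_const

# Dividing by one positive constant preserves the order of the matches.
identical(rank(raw_dist), rank(rel_dist))
```

The ranks only stay comparable within one test market; across different test markets the constants differ, so the relative distances are not on a common scale.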

So I wonder if you see 0.6599712 in the other row where this distance is computed between the two markets.
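
A rough sketch of that check, assuming the dtw package is installed and using made-up series rather than the data in mkts.csv.zip. With equal-length series and dtw's default symmetric step pattern, I would expect the raw distance to be the same in both directions, so the two rows should differ only in which series supplies the normalization constant.

```r
library(dtw)  # assumes the dtw package is installed

# Made-up series standing in for two markets; not the data from the issue.
a <- c(10, 12, 9, 11, 13, 12)
b <- c(20, 19, 23, 22, 21, 25)

# Raw DTW distance in each direction, with the window settings used in the issue.
d_ab <- dtw(a, b, window.type = sakoeChibaWindow, window.size = 1)$distance
d_ba <- dtw(b, a, window.type = sakoeChibaWindow, window.size = 1)$distance

# Each row is normalized by the abs(sum()) of whichever series is "test".
rel_ab <- d_ab / abs(sum(a))  # a is "test", b is "ref"
rel_ba <- d_ba / abs(sum(b))  # b is "test", a is "ref"

c(raw_ab = d_ab, raw_ba = d_ba, rel_ab = rel_ab, rel_ba = rel_ba)
```

If the value you see matches the second form, you may be looking at the row where your query market is listed as the reference.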

K

Jay-Down commented 4 years ago

Thanks for getting back to me. I did see that the repo hadn't been updated in a while and wasn't expecting such a speedy response!

Leaving markets_to_be_matched=NULL does give the conjugate distance - the other side of the distance matrix - but the value seems to be the inverse of what you'd expect based on the code.

I wasn't sure of the significance of the normalisation, but given that it's just to make the distances more manageable this is not an issue.

Thanks again.

klarsen1 commented 4 years ago

Yes, still very strange....

Kim

klarsen1 commented 4 years ago

Hey --

If you re-download from GitHub rather than CRAN, you'll see that I added the raw distance and the normalization factor that is used to the best_matches data.frame.

Take a look and see if you still find the same issue.

K

klarsen1 commented 4 years ago

Also, I checked it out and it looks like the code works.

Here Seattle is "test" and the normalization constant is 11306:

[image: image.png]

which is also what I see in the data:

[image: image.png]
