Closed: EddFigueiredo closed this issue 10 months ago
Have you checked the matching variable in the post period?
Can I see the code you use to call the package?
K
On Tue, Oct 25, 2022 at 9:50 AM EddFigueiredo @.***> wrote:
Hello, I'm also having issues processing my dataset. It's really weird, because the same dataset works fine when I process a different event.
The whole dataset runs from 22/07/01 to 22/09/27. The intervention ran from 22/08/15 to 22/09/11, which leaves 16 days for the post period. The date column is parsed as a date, so that's not the issue.
So far this has happened only with this particular analysis. I have run other analyses on the same dataset with different combinations of test/control markets and events.
Here follows the summary of the data present from the best_matches method:
    id_var             date_var            match_var
 Length:10340       Min.   :2022-07-01   Min.   :  1.000
 Class :character   1st Qu.:2022-07-21   1st Qu.:  1.000
 Mode  :character   Median :2022-08-12   Median :  2.000
                    Mean   :2022-08-11   Mean   :  5.399
                    3rd Qu.:2022-09-03   3rd Qu.:  5.000
                    Max.   :2022-09-27   Max.   :145.000
I also checked the dataset itself, to see if there are missing dates or something, that's also not the case.
Ok in summary, I have many test markets, and many events to process. I basically loop through the markets, and then loop through the events.
Inside this loop, this is how I'm calling the best_matches method:
best_matches <- best_matches(
  data = marketData,
  id_variable = market_column,
  date_variable = "date",
  matching_variable = event,
  markets_to_be_matched = current_market,
  parallel = TRUE,
  warping_limit = 1,
  dtw_emphasis = 0,
  matches = 10,
  start_match_period = comparison_date,
  end_match_period = end_match_period
)
Here's what each variable holds:
market_column = "market"
event = "order_count_winsorized_10thu_0thl"
current_market = "Los Angeles"
comparison_date = "2022-07-01"
end_match_period = "2022-08-14"
And here is the summary of marketData:
      date               market           order_count_winsorized_10thu_0thl
 Min.   :2022-07-01   Length:10340       Min.   : 1.00
 1st Qu.:2022-07-21   Class :character   1st Qu.: 1.00
 Median :2022-08-12   Mode  :character   Median : 2.00
 Mean   :2022-08-11                      Mean   : 4.09
 3rd Qu.:2022-09-03                      3rd Qu.: 5.00
 Max.   :2022-09-27                      Max.   :14.00
marketData head:
date market order_count_winsorized_10thu_0thl
1 2022-07-01 Albany Schenectady Troy 6
2 2022-07-01 Albuquerque Santa Fe 3
3 2022-07-01 Atlanta 14
4 2022-07-01 Austin 14
5 2022-07-01 Bakersfield 1
6 2022-07-01 Baltimore 4
7 2022-07-01 Bangor 2
8 2022-07-01 Baton Rouge 2
9 2022-07-01 Birmingham Anniston Tuscaloosa 2
10 2022-07-01 Boise 3
11 2022-07-01 Boston Manchester 14
12 2022-07-01 Buffalo 2
13 2022-07-01 Burlington Plattsburgh 1
14 2022-07-01 Butte Bozeman 2
15 2022-07-01 Cedar Rapids Waterloo Dubuque 3
16 2022-07-01 Champaign Springfield Decatur 2
17 2022-07-01 Charleston 6
18 2022-07-01 Charleston Huntington 1
19 2022-07-01 Charlotte 12
20 2022-07-01 Charlottesville 2
21 2022-07-01 Chattanooga 1
22 2022-07-01 Cincinnati 6
23 2022-07-01 Cleveland Akron Canton 5
24 2022-07-01 Colorado Springs Pueblo 1
25 2022-07-01 Columbia 7
26 2022-07-01 Columbia Jefferson City 2
27 2022-07-01 Columbus GA 3
28 2022-07-01 Columbus, OH 8
29 2022-07-01 Corpus Christi 1
30 2022-07-01 Dallas Ft Worth 14
31 2022-07-01 Davenport Rock Island Moline 1
32 2022-07-01 Dayton 1
33 2022-07-01 Denver 14
34 2022-07-01 Des Moines Ames 5
35 2022-07-01 Detroit 14
36 2022-07-01 El Paso 1
Then I pass the output of best_matches to inference:
inference_results <- MarketMatching::inference(
  matched_markets = best_matches,
  test_market = c(current_market),
  control_matches = 10
)
This is where the error occurs.
The thing is, best_matches is not finding control market correlations for my test market:
summary(best_matches$BestMatches)
    market          BestControl        RelativeDistance   Correlation
 Length:0           Length:0           Mode:logical       Min.   : NA
 Class :character   Class :character                      1st Qu.: NA
 Mode  :character   Mode  :character                      Median : NA
                                                          Mean   :NaN
                                                          3rd Qu.: NA
                                                          Max.   : NA
     Length        SUMTEST       SUMCNTL       RAWDIST     Correlation_of_logs
 Min.   : NA   Min.   : NA   Min.   : NA   Min.   : NA   Min.   : NA
 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA   1st Qu.: NA   1st Qu.: NA
 Median : NA   Median : NA   Median : NA   Median : NA   Median : NA
 Mean   :NaN   Mean   :NaN   Mean   :NaN   Mean   :NaN   Mean   :NaN
 3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA
 Max.   : NA   Max.   : NA   Max.   : NA   Max.   : NA   Max.   : NA
 MatchingStartDate   MatchingEndDate       rank         NORMDIST
 Min.   :NA          Min.   :NA        Min.   : NA   Min.   : NA
 1st Qu.:NA          1st Qu.:NA        1st Qu.: NA   1st Qu.: NA
 Median :NA          Median :NA        Median : NA   Median : NA
 Mean   :NaN         Mean   :NaN       Mean   :NaN   Mean   :NaN
 3rd Qu.:NA          3rd Qu.:NA        3rd Qu.: NA   3rd Qu.: NA
 Max.   :NA          Max.   :NA        Max.   : NA   Max.   : NA
Data is there though:
summary(best_matches$Data)
    id_var             date_var            match_var
 Length:10340       Min.   :2022-07-01   Min.   : 1.00
 Class :character   1st Qu.:2022-07-21   1st Qu.: 1.00
 Mode  :character   Median :2022-08-12   Median : 2.00
                    Mean   :2022-08-11   Mean   : 4.09
                    3rd Qu.:2022-09-03   3rd Qu.: 5.00
                    Max.   :2022-09-27   Max.   :14.00
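One sanity check that might help narrow this down is to look at each market's series inside the matching window only. This is a rough sketch in base R, not something taken from the package: check_matching_window is a hypothetical helper, and the toy data frame is made up for illustration. The idea is that a capped (winsorized) series that stays flat over the whole window has zero variance, so its correlation with any other market is undefined. Whether that is the cause here is only a guess.

```r
# Hypothetical helper (NOT part of MarketMatching): summarise each market's
# series inside the matching window, using base R only.
check_matching_window <- function(data, id_col, date_col, value_col,
                                  start_date, end_date) {
  dates <- as.Date(data[[date_col]])
  in_window <- dates >= as.Date(start_date) & dates <= as.Date(end_date)
  window <- data[in_window, , drop = FALSE]

  # One numeric vector per market, restricted to the matching window.
  groups <- split(window[[value_col]], window[[id_col]])

  data.frame(
    market    = names(groups),
    n_days    = vapply(groups, length, integer(1)),
    n_missing = vapply(groups, function(x) sum(is.na(x)), integer(1)),
    std_dev   = vapply(groups, function(x) stats::sd(x, na.rm = TRUE), numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up toy data: market "A" is flat at the cap, market "B" varies.
toy <- data.frame(
  date   = rep(seq(as.Date("2022-07-01"), by = "day", length.out = 5), times = 2),
  market = rep(c("A", "B"), each = 5),
  value  = c(14, 14, 14, 14, 14, 1, 2, 3, 2, 1)
)

diagnostics <- check_matching_window(
  toy, "market", "date", "value", "2022-07-01", "2022-07-05"
)
print(diagnostics)
```

With the real data, the call would presumably use marketData, market_column, "date", event, comparison_date and end_match_period from the post above. A test market that shows std_dev equal to 0, or far fewer days than the others, would be the first thing to look at.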
Try the new version. devtools::install_github("klarsen1/MarketMatching")