klarsen1 / MarketMatching

ERROR: no valid data in the post period #26

Closed EddFigueiredo closed 10 months ago

EddFigueiredo commented 2 years ago

Hello, I'm also having issues processing my dataset. It's strange, because the same dataset works fine when I process a different event.

The whole dataset spans 2022-07-01 to 2022-09-27. The intervention ran from 2022-08-15 to 2022-09-11, which leaves 16 days for the post period. The date column is parsed as a date, so that's not the issue.

So far, this has only happened with this particular analysis. I have run other analyses on the same dataset with different combinations of test/control markets and events.

Here follows the summary of the data present from the best_matches method:

id_var             date_var            match_var      
 Length:10340       Min.   :2022-07-01   Min.   :  1.000  
 Class :character   1st Qu.:2022-07-21   1st Qu.:  1.000  
 Mode  :character   Median :2022-08-12   Median :  2.000  
                    Mean   :2022-08-11   Mean   :  5.399  
                    3rd Qu.:2022-09-03   3rd Qu.:  5.000  
                    Max.   :2022-09-27   Max.   :145.000 

I also checked the dataset itself, to see if there are missing dates or something, that's also not the case.

klarsen1 commented 2 years ago

Have you checked the matching variable in the post period?

Can I see the code you use to call the package?

K


EddFigueiredo commented 2 years ago

OK, in summary: I have many test markets and many events to process. I basically loop through the markets, and then loop through the events.

Inside this loop, this is how I'm calling the best_matches method:

best_matches <- best_matches(
  data = marketData,
  id_variable = market_column,
  date_variable = "date",
  matching_variable = event,
  markets_to_be_matched = current_market,
  parallel = TRUE,
  warping_limit = 1,
  dtw_emphasis = 0,
  matches = 10,
  start_match_period = comparison_date,
  end_match_period = end_match_period
)

Here's what each variable holds:

 market_column = "market"
 event = "order_count_winsorized_10thu_0thl"
 current_market = "Los Angeles"
 comparison_date = "2022-07-01"
 end_match_period = "2022-08-14"

And here is the summary of marketData:

date               market          order_count_winsorized_10thu_0thl
 Min.   :2022-07-01   Length:10340       Min.   : 1.00                    
 1st Qu.:2022-07-21   Class :character   1st Qu.: 1.00                    
 Median :2022-08-12   Mode  :character   Median : 2.00                    
 Mean   :2022-08-11                      Mean   : 4.09                    
 3rd Qu.:2022-09-03                      3rd Qu.: 5.00                    
 Max.   :2022-09-27                      Max.   :14.00

marketData head:

         date                         market order_count_winsorized_10thu_0thl
1  2022-07-01        Albany Schenectady Troy                                 6
2  2022-07-01           Albuquerque Santa Fe                                 3
3  2022-07-01                        Atlanta                                14
4  2022-07-01                         Austin                                14
5  2022-07-01                    Bakersfield                                 1
6  2022-07-01                      Baltimore                                 4
7  2022-07-01                         Bangor                                 2
8  2022-07-01                    Baton Rouge                                 2
9  2022-07-01 Birmingham Anniston Tuscaloosa                                 2
10 2022-07-01                          Boise                                 3
11 2022-07-01              Boston Manchester                                14
12 2022-07-01                        Buffalo                                 2
13 2022-07-01         Burlington Plattsburgh                                 1
14 2022-07-01                  Butte Bozeman                                 2
15 2022-07-01  Cedar Rapids Waterloo Dubuque                                 3
16 2022-07-01  Champaign Springfield Decatur                                 2
17 2022-07-01                     Charleston                                 6
18 2022-07-01          Charleston Huntington                                 1
19 2022-07-01                      Charlotte                                12
20 2022-07-01                Charlottesville                                 2
21 2022-07-01                    Chattanooga                                 1
22 2022-07-01                     Cincinnati                                 6
23 2022-07-01         Cleveland Akron Canton                                 5
24 2022-07-01        Colorado Springs Pueblo                                 1
25 2022-07-01                       Columbia                                 7
26 2022-07-01        Columbia Jefferson City                                 2
27 2022-07-01                    Columbus GA                                 3
28 2022-07-01                   Columbus, OH                                 8
29 2022-07-01                 Corpus Christi                                 1
30 2022-07-01                Dallas Ft Worth                                14
31 2022-07-01   Davenport Rock Island Moline                                 1
32 2022-07-01                         Dayton                                 1
33 2022-07-01                         Denver                                14
34 2022-07-01                Des Moines Ames                                 5
35 2022-07-01                        Detroit                                14
36 2022-07-01                        El Paso                                 1

Then I pass the output of best_matches to inference:

inference_results <- MarketMatching::inference(
  matched_markets = best_matches,
  test_market = c(current_market),
  control_matches = 10
)

This is where the error occurs.

The thing is, best_matches is not finding any control markets for my test market; BestMatches comes back empty:

summary(best_matches$BestMatches)
    market          BestControl        RelativeDistance  Correlation 
 Length:0           Length:0           Mode:logical     Min.   : NA  
 Class :character   Class :character                    1st Qu.: NA  
 Mode  :character   Mode  :character                    Median : NA  
                                                        Mean   :NaN  
                                                        3rd Qu.: NA  
                                                        Max.   : NA  
     Length       SUMTEST       SUMCNTL       RAWDIST    Correlation_of_logs
 Min.   : NA   Min.   : NA   Min.   : NA   Min.   : NA   Min.   : NA        
 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA   1st Qu.: NA   1st Qu.: NA        
 Median : NA   Median : NA   Median : NA   Median : NA   Median : NA        
 Mean   :NaN   Mean   :NaN   Mean   :NaN   Mean   :NaN   Mean   :NaN        
 3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA        
 Max.   : NA   Max.   : NA   Max.   : NA   Max.   : NA   Max.   : NA        
 MatchingStartDate MatchingEndDate      rank        NORMDIST  
 Min.   :NA        Min.   :NA      Min.   : NA   Min.   : NA  
 1st Qu.:NA        1st Qu.:NA      1st Qu.: NA   1st Qu.: NA  
 Median :NA        Median :NA      Median : NA   Median : NA  
 Mean   :NaN       Mean   :NaN     Mean   :NaN   Mean   :NaN  
 3rd Qu.:NA        3rd Qu.:NA      3rd Qu.: NA   3rd Qu.: NA  
 Max.   :NA        Max.   :NA      Max.   : NA   Max.   : NA 

Data is there though:

summary(best_matches$Data)
    id_var             date_var            match_var    
 Length:10340       Min.   :2022-07-01   Min.   : 1.00  
 Class :character   1st Qu.:2022-07-21   1st Qu.: 1.00  
 Mode  :character   Median :2022-08-12   Median : 2.00  
                    Mean   :2022-08-11   Mean   : 4.09  
                    3rd Qu.:2022-09-03   3rd Qu.: 5.00  
                    Max.   :2022-09-27   Max.   :14.00
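
One way to follow up on the question about the matching variable in the post period is to count, for the test market alone, how many rows fall inside the matching window and after it, and whether the values vary at all. The sketch below is illustrative base R: `check_market_periods` and the toy data frame are hypothetical and not part of MarketMatching. The idea that a winsorized series could be constant for a large market is only a guess that this check can confirm or rule out; the thread does not establish it as the cause.

```r
# Hypothetical helper (not part of MarketMatching): summarise one market's
# rows inside the matching window and after it.
check_market_periods <- function(data, id_col, date_col, value_col,
                                 market, start_match, end_match) {
  rows <- data[which(data[[id_col]] == market), , drop = FALSE]
  dates <- as.Date(rows[[date_col]])
  values <- rows[[value_col]]

  in_match <- dates >= as.Date(start_match) & dates <= as.Date(end_match)
  in_post <- dates > as.Date(end_match)

  data.frame(
    period = c("matching", "post"),
    n_rows = c(sum(in_match), sum(in_post)),
    n_missing = c(sum(is.na(values[in_match])), sum(is.na(values[in_post]))),
    # n_distinct == 1 means the series is constant in that period
    n_distinct = c(length(unique(values[in_match])),
                   length(unique(values[in_post])))
  )
}

# Toy data standing in for marketData (values are made up for illustration)
days <- seq(as.Date("2022-07-01"), as.Date("2022-09-27"), by = "day")
toy <- data.frame(
  date = rep(days, times = 2),
  market = rep(c("Los Angeles", "Austin"), each = length(days)),
  order_count = c(rep(14, length(days)),
                  rep(c(3, 5, 7), length.out = length(days)))
)

report <- check_market_periods(
  toy, id_col = "market", date_col = "date", value_col = "order_count",
  market = "Los Angeles",
  start_match = "2022-07-01", end_match = "2022-08-14"
)
print(report)
```

If `n_rows` is zero in either period, or `n_distinct` is 1, that narrows down where to look before calling `inference` again.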

klarsen1 commented 10 months ago

Try the new version. devtools::install_github("klarsen1/MarketMatching")