rj678 / pycausalmatch

Causal Impact of an intervention integrated with control group selection
MIT License

Generate 5 matches without specifying the markets_to_be_matched. #7

Open gabriele-franco opened 2 years ago

gabriele-franco commented 2 years ago

Hello, I'm trying to use this awesome library the same way I use MarketMatch in R, with the following call:

matches = rmm.best_matches(data=df, id_variable=location, date_variable=date,
                           markets_to_be_matched=["Lazio"], matching_variable=conversions,
                           parallel=False, warping_limit=1, dtw_emphasis=1,
                           matches=regions_matched,
                           start_match_period="2021-01-01", end_match_period="2022-07-01",
                           )

This script allows me to find the markets with the highest correlation with "Lazio".

Can I ask this function to find 5 matches automatically, without specifying markets_to_be_matched, as we do with best_matches in MarketMatch?

mm <- best_matches(data=weather,
                   id_variable="Area",
                   date_variable="Date",
                   matching_variable="Mean_TemperatureF",
                   parallel=FALSE,
                   warping_limit=1, # warping limit=1
                   dtw_emphasis=1, # rely only on dtw for pre-screening
                   matches=5, # request 5 matches
                   start_match_period="2014-01-01",
                   end_match_period="2014-10-01")

I want to create a script that automatically allows me to cluster a test and control group in my regions to simplify how I run Geo Experiments across my country.

rj678 commented 2 years ago

thank you for your kind comments @gabriele-franco - that hasn't been implemented, and unfortunately I'm unable to add functionality to the library at the moment. thank you again and good luck!

rj678 commented 2 years ago

Hi @gabriele-franco - sorry for the delay in getting back on this. Turns out this should already work - can you try this:

mm_only_cph = rmm.best_matches(data=weather_df, id_variable='Area', date_variable='Date',
                                matching_variable='Mean_TemperatureF', parallel=True,
                                warping_limit=1, dtw_emphasis=1,  matches=5,
                                start_match_period='2014-01-01', end_match_period='2014-10-01'
                                )

since markets_to_be_matched has not been specified, you should get 5 matches for every "Area"
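
To make "matches for every market" concrete, here is a minimal sketch in plain pandas. It is not pycausalmatch's implementation: it ranks by Pearson correlation only and skips dynamic time warping entirely. The function name top_correlated_markets, its output columns, and the toy data are made up for illustration; only the parameter names mirror the call above.

```python
import pandas as pd


def top_correlated_markets(data, id_variable, date_variable, matching_variable, matches=5):
    """For every market, rank all other markets by Pearson correlation.

    Illustration only: correlation-based ranking with no dynamic time
    warping, so this is NOT what pycausalmatch does internally.
    """
    # Pivot the long table into one column per market, indexed by date.
    wide = data.pivot(index=date_variable, columns=id_variable, values=matching_variable)
    corr = wide.corr()

    rows = []
    for market in corr.columns:
        # Drop the market itself, then keep the `matches` highest correlations.
        best = corr[market].drop(market).nlargest(matches)
        for rank, (other, value) in enumerate(best.items(), start=1):
            rows.append({"market": market, "match": other, "rank": rank, "correlation": value})
    return pd.DataFrame(rows)


# Tiny made-up dataset: three markets observed on four dates.
toy = pd.DataFrame({
    "Area": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "Date": pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04"] * 3),
    "Mean_TemperatureF": [1.0, 2.0, 3.0, 4.0,   # A rises
                          2.0, 4.0, 6.0, 8.0,   # B rises in step with A
                          4.0, 3.0, 2.0, 1.0],  # C falls
})

# No market is singled out: every Area gets its own ranked list of matches.
result = top_correlated_markets(toy, "Area", "Date", "Mean_TemperatureF", matches=1)
print(result)
```

Each market appears in the output with its own ranked matches, which is the same shape of answer you get from best_matches when markets_to_be_matched is left out.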

I'm just getting back to developing this library again, and hope to have all of the functionality in the R package implemented soon. thank you

aazz7777 commented 6 months ago

quick question! can parallel processing also be done?

rj678 commented 6 months ago

not yet - if you set parallel=True, it will show the following message:

Parallel execution has not been added yet - executing sequentially