rj678 / pycausalmatch

Causal Impact of an intervention integrated with control group selection
MIT License

KeyError: 'Skip' #4

Closed: sushanth-d closed this issue 3 years ago

sushanth-d commented 3 years ago

Hey Rishi,

First of all, thank you for creating this module. I appreciate it very much.

I'm running into a KeyError while using it, though:

Full logs:


KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897             try:
-> 2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Skip'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<timed exec> in <module>

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/pycausalmatch/main.py in best_matches(data, markets_to_be_matched, id_variable, date_variable, matching_variable, parallel, warping_limit, start_match_period, end_match_period, matches, dtw_emphasis, suggest_market_splits, split_bins)
    351             for i in range(len(markets_to_be_matched)):
    352                 dist_op = R_MarketMatching.calculate_distances(markets_to_be_matched, ip_df, id_variable,
--> 353                                               i, warping_limit, matches, dtw_emphasis
    354                                               )
    355 

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/pycausalmatch/main.py in calculate_distances(markets_to_be_matched, ip_df, id_variable, i, warping_limit, matches, dtw_emphasis)
    227 
    228         # Filter down to only the top matches
--> 229         distances_df = distances_df[distances_df['Skip'] == False]
    230         distances_df['dist_rank'] = distances_df['RelativeDistance'].rank(method='first', ascending=True)
    231         distances_df['corr_rank'] = distances_df['Correlation'].rank(method='first', ascending=False)

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2904             if self.columns.nlevels > 1:
   2905                 return self._getitem_multilevel(key)
-> 2906             indexer = self.columns.get_loc(key)
   2907             if is_integer(indexer):
   2908                 indexer = [indexer]

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:
-> 2900                 raise KeyError(key) from err
   2901 
   2902         if tolerance is not None:

KeyError: 'Skip'

Any idea why?
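The failing line in the traceback filters `distances_df` on a `Skip` column, and pandas raises `KeyError` whenever a column label does not exist. One plausible way for that to happen, assumed here rather than confirmed from pycausalmatch's source, is building the frame from an empty list of per-market rows: `pd.DataFrame([])` has no columns at all. `build_distances` and `top_matches` are hypothetical stand-ins, not library functions.

```python
import pandas as pd

def build_distances(rows):
    """Hypothetical builder: one dict per candidate control market."""
    return pd.DataFrame(rows)

def top_matches(distances_df):
    # Same filter expression as the failing line in main.py
    return distances_df[distances_df["Skip"] == False]

ok = build_distances([
    {"Skip": False, "RelativeDistance": 0.2, "Correlation": 0.9},
    {"Skip": True, "RelativeDistance": 0.1, "Correlation": 0.5},
])
print(len(top_matches(ok)))  # 1

# Assumption: bad input leaves no rows, so the frame has no columns at all
empty = build_distances([])
try:
    top_matches(empty)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'Skip'
```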

rj678 commented 3 years ago

Hi @sushanth-d, thanks for using this library and for opening this issue. Sorry about the error.

Could you share a sample of the dataset you're using so I can reproduce the error?
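A small, anonymized sample is usually enough to reproduce this kind of error. A minimal sketch of how one might cut one, assuming long-format data with one row per market per date; `make_repro_sample` and the column names `market`, `date`, and `sales` are hypothetical:

```python
import pandas as pd

def make_repro_sample(df, id_col, date_col, value_col, n_markets=5, n_days=60):
    """Hypothetical helper: keep a few markets and a short date window, masking market names."""
    keep = sorted(df[id_col].unique())[:n_markets]
    sample = df[df[id_col].isin(keep)].copy()
    sample[date_col] = pd.to_datetime(sample[date_col])
    start = sample[date_col].min()
    # Keep only the first n_days calendar days
    sample = sample[sample[date_col] < start + pd.Timedelta(days=n_days)]
    # Replace real market names with neutral labels
    mapping = {m: f"market_{i}" for i, m in enumerate(keep)}
    sample[id_col] = sample[id_col].map(mapping)
    cols = [id_col, date_col, value_col]
    return sample[cols].sort_values([id_col, date_col]).reset_index(drop=True)

# Synthetic stand-in for real data: 6 markets x 90 days
dates = pd.date_range("2020-01-01", periods=90, freq="D")
full = pd.DataFrame({
    "market": [m for m in ["NYC", "LA", "SF", "CHI", "BOS", "SEA"] for _ in dates],
    "date": list(dates) * 6,
    "sales": range(90 * 6),
})
sample = make_repro_sample(full, "market", "date", "sales")
print(sample.shape)  # (300, 3)
csv_text = sample.to_csv(index=False)  # text that could be attached to the issue
```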

Are you able to run the examples in the notebooks folder?

I hope to find time to address the remaining TODOs, including adding some tests.

sushanth-d commented 3 years ago

Hi @rishi1016, this error was caused by corrupt data. Fixing the data resolved it. Thanks!
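The thread does not say what was corrupt in the data. A sketch of generic pre-flight checks one might run before calling `best_matches`, reusing parameter names from its signature in the traceback (`id_variable`, `date_variable`, `matching_variable`, `markets_to_be_matched`, `start_match_period`, `end_match_period`). `check_market_data` is hypothetical, and which of these conditions actually leads to the missing `Skip` column is an assumption, not something confirmed here.

```python
import pandas as pd

def check_market_data(data, markets_to_be_matched, id_variable, date_variable,
                      matching_variable, start_match_period, end_match_period):
    """Hypothetical pre-flight check: return a list of problems found in the input."""
    problems = []
    for col in (id_variable, date_variable, matching_variable):
        if col not in data.columns:
            problems.append(f"missing column: {col}")
    if problems:
        return problems

    dates = pd.to_datetime(data[date_variable], errors="coerce")
    if dates.isna().any():
        problems.append(f"{int(dates.isna().sum())} unparseable or missing dates")
    values = pd.to_numeric(data[matching_variable], errors="coerce")
    if values.isna().any():
        problems.append(f"{int(values.isna().sum())} non-numeric or missing values in {matching_variable}")
    if data[id_variable].isna().any():
        problems.append(f"missing values in {id_variable}")

    dupes = int(data.duplicated(subset=[id_variable, date_variable]).sum())
    if dupes:
        problems.append(f"{dupes} duplicate ({id_variable}, {date_variable}) rows")

    absent = set(markets_to_be_matched) - set(data[id_variable].dropna())
    if absent:
        problems.append(f"markets not in data: {sorted(absent)}")

    # Every market should cover the same number of dates in the match window
    in_window = (dates >= pd.Timestamp(start_match_period)) & (dates <= pd.Timestamp(end_match_period))
    counts = data[in_window].groupby(id_variable)[date_variable].nunique()
    if counts.nunique() > 1:
        short = counts[counts < counts.max()].index.tolist()
        problems.append(f"markets with gaps in the match period: {sorted(short)}")
    return problems

dates = pd.date_range("2021-01-01", periods=10, freq="D")
clean = pd.DataFrame({
    "market": ["A"] * 10 + ["B"] * 10,
    "date": list(dates) * 2,
    "sales": [float(i) for i in range(20)],
})
print(check_market_data(clean, ["A"], "market", "date", "sales", "2021-01-01", "2021-01-10"))  # []

bad = clean.drop(index=3).astype({"sales": object})  # market A loses one day
bad.loc[5, "sales"] = "n/a"  # a stray string in the numeric column
for p in check_market_data(bad, ["A", "C"], "market", "date", "sales", "2021-01-01", "2021-01-10"):
    print(p)
```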