synthesized-io / community

https://synthesized.io

Children metas don't match the given indices/aliases/annotations [BUG] #5

Open aplotnikov2020 opened 1 year ago

aplotnikov2020 commented 1 year ago

Describe the bug

MetaExtractor.extract raises the ValueError below when id_index is given a list of columns (a multi-column index):

Traceback (most recent call last):
  File "single_model_all_groups.py", line 19, in <module>
    df_meta = MetaExtractor.extract(df, id_index=["country", "platform"], time_index="day")
  File "synthesized/metadata/factory.py", line 333, in synthesized.metadata.factory.MetaExtractor.extract
  File "synthesized/metadata/factory.py", line 165, in synthesized.metadata.factory.MetaFactory.__call__
  File "synthesized/metadata/factory.py", line 204, in synthesized.metadata.factory.MetaFactory.create_meta
  File "synthesized/metadata/factory.py", line 260, in synthesized.metadata.factory.MetaFactory._from_df
  File "synthesized/metadata/data_frame_meta.py", line 46, in synthesized.metadata.data_frame_meta.DataFrameMeta.__init__
ValueError: Children metas (['day', 'country', 'platform', 'y']) don't match the given indices/aliases/annotations ([])

To Reproduce

Steps to reproduce the behavior:

import pandas as pd
from datetime import datetime

from synthesized import MetaExtractor

data = [
    {'day': datetime(2022, 10, 1), 'country': 'NL', 'platform': 'android', 'y': 1.0},
    {'day': datetime(2022, 10, 2), 'country': 'NL', 'platform': 'android', 'y': 2.0},
    {'day': datetime(2022, 10, 3), 'country': 'NL', 'platform': 'android', 'y': 3.0},
    {'day': datetime(2022, 10, 4), 'country': 'NL', 'platform': 'android', 'y': 2.5},
    {'day': datetime(2022, 10, 5), 'country': 'NL', 'platform': 'android', 'y': 2.1},
    {'day': datetime(2022, 10, 6), 'country': 'NL', 'platform': 'android', 'y': 2.2},
    {'day': datetime(2022, 10, 1), 'country': 'ES', 'platform': 'ios', 'y': 10.0},
    {'day': datetime(2022, 10, 2), 'country': 'ES', 'platform': 'ios', 'y': 20.0},
    {'day': datetime(2022, 10, 3), 'country': 'ES', 'platform': 'ios', 'y': 30.0},
]

df = pd.DataFrame.from_records(data)

df_meta = MetaExtractor.extract(df, id_index=["country", "platform"], time_index="day")
df_series = df_meta.make_time_series(df)


aplotnikov2020 commented 1 year ago

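For now, the workaround is to pivot the input DataFrame manually, so that each (country, platform) pair becomes its own column, and to call MetaExtractor.extract without id_index or time_index: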

import pandas as pd
from datetime import datetime

from synthesized import MetaExtractor, HighDimSynthesizer
from synthesized.model import DataFrameModel

data = [
    {'day': datetime(2022, 10, 1), 'country': 'NL', 'platform': 'android', 'y': 1.0},
    {'day': datetime(2022, 10, 2), 'country': 'NL', 'platform': 'android', 'y': 2.0},
    {'day': datetime(2022, 10, 3), 'country': 'NL', 'platform': 'android', 'y': 3.0},
    {'day': datetime(2022, 10, 4), 'country': 'NL', 'platform': 'android', 'y': 2.5},
    {'day': datetime(2022, 10, 5), 'country': 'NL', 'platform': 'android', 'y': 2.1},
    {'day': datetime(2022, 10, 6), 'country': 'NL', 'platform': 'android', 'y': 2.2},
    {'day': datetime(2022, 10, 1), 'country': 'ES', 'platform': 'ios', 'y': 10.0},
    {'day': datetime(2022, 10, 2), 'country': 'ES', 'platform': 'ios', 'y': 20.0},
    {'day': datetime(2022, 10, 3), 'country': 'ES', 'platform': 'ios', 'y': 30.0},
]

df = pd.DataFrame.from_records(data)

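# Reshape to wide format: one row per day, one column per (country, platform) pair.
pivoted_df = df.pivot(index='day', columns=['country', 'platform'], values='y')

# Flatten the MultiIndex column labels, e.g. ('NL', 'android') -> 'NL_android'.
pivoted_df.columns = ['_'.join(str(s).strip() for s in col if s) for col in pivoted_df.columns]
# Turn the 'day' index back into a regular column.
pivoted_df.reset_index(inplace=True)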

df_meta = MetaExtractor.extract(pivoted_df)
DataFrameModel(df_meta).fit(pivoted_df)

synth = HighDimSynthesizer(df_meta)
synth.learn(df_train=pivoted_df)

df_synth = synth.synthesize(num_rows=len(pivoted_df))
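
The pivot step of this workaround can be pulled out into a small helper that depends only on pandas, which makes it easier to check the reshaped frame before handing it to MetaExtractor. This is a minimal sketch, not part of the synthesized API: the function name pivot_to_wide and its parameters are hypothetical, and the sample data is a shortened version of the records above.

```python
from datetime import datetime

import pandas as pd


def pivot_to_wide(df, time_col, id_cols, value_col, sep="_"):
    """Hypothetical helper: one row per time value, one column per id combination."""
    keys = [time_col, *id_cols]
    # DataFrame.pivot raises ValueError when a (time, ids) combination repeats,
    # so check up front and report how many rows are affected.
    dupes = df.duplicated(subset=keys)
    if dupes.any():
        raise ValueError(f"{int(dupes.sum())} duplicate row(s) for key columns {keys}")

    wide = df.pivot(index=time_col, columns=id_cols, values=value_col)
    # Flatten column labels: ('NL', 'android') -> 'NL_android'.
    # With a single id column the labels are plain values, not tuples.
    wide.columns = [
        sep.join(map(str, col)) if isinstance(col, tuple) else str(col)
        for col in wide.columns
    ]
    # Turn the time index back into a regular column.
    return wide.reset_index()


data = [
    {"day": datetime(2022, 10, 1), "country": "NL", "platform": "android", "y": 1.0},
    {"day": datetime(2022, 10, 2), "country": "NL", "platform": "android", "y": 2.0},
    {"day": datetime(2022, 10, 3), "country": "NL", "platform": "android", "y": 3.0},
    {"day": datetime(2022, 10, 1), "country": "ES", "platform": "ios", "y": 10.0},
    {"day": datetime(2022, 10, 2), "country": "ES", "platform": "ios", "y": 20.0},
]
df = pd.DataFrame.from_records(data)

wide = pivot_to_wide(df, time_col="day", id_cols=["country", "platform"], value_col="y")
print(wide)
```

Groups that have no row for a given day end up as NaN in the wide frame (here ES_ios on 2022-10-03), so series of different lengths produce missing values that have to be handled before or during modelling.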