Closed fura95 closed 1 year ago
Are there any examples of how to get a parquet file from Hadoop?
I also used DeltaSource and had the same issue:
v_aggapp_card_source = DeltaSource(
    name="v_aggapp_card_source",
    path="/data/data_parquet/v_aggapp_card_source.parquet",
    timestamp_field="event_timestamp",
)
@fura95 - I have to add an implementation of get_table_column_names_and_types, which is used by feast apply for schema inference when the schema is not specified for a feature view. You can specify the schema as a workaround.
@qooba I specified the schema for the feature views, but that didn't fix the problem:
# Feature Views
v_aggapp_card_fv = FeatureView(
    name="v_aggapp_card",
    entities=[v_aggapp_card_entity],
    ttl=timedelta(weeks=52),
    schema=[
        Field(name="cnt_mcc_br5_cat4_6", dtype=Int64),
    ],
    source=v_aggapp_card_source,
    tags={"test_tag": "cards"},
)
v_aggapp_credit_fv = FeatureView(
    name="v_aggapp_credit",
    entities=[v_aggapp_credit_entity],
    ttl=timedelta(weeks=52),
    schema=[
        Field(name="loan_age_mortg_min", dtype=Int64),
        Field(name="delinq_share_30p_ext_lifo", dtype=Int64),
        Field(name="length_ext", dtype=Int64),
        Field(name="max_util_card_act", dtype=Int64),
        Field(name="pmt_delays_1_29_24m_sum_mnth_lifo", dtype=Int64),
    ],
    source=v_aggapp_credit_source,
    tags={"test_tag": "credits"},
)
@fura95 - I will try to reproduce. Can you send me the feast version which you use?
@qooba
feast==0.22.4
Try creating a FeatureView without describing the Entity as a Field in the schema:
my_entity = Entity(name="entity_id", description="entity id",)
mystats_view_parquet = FeatureView(
    name="my_statistics_parquet",
    entities=[my_entity],
    ttl=timedelta(seconds=3600*24*20),
    schema=[
        # Field(name="entity_id", dtype=Float32),
        Field(name="p0", dtype=Float32),
        Field(name="p1", dtype=Float32),
        Field(name="p2", dtype=Float32),
        Field(name="p3", dtype=Float32),
        Field(name="p4", dtype=Float32),
        Field(name="p5", dtype=Float32),
        Field(name="p6", dtype=Float32),
        Field(name="p7", dtype=Float32),
        Field(name="p8", dtype=Float32),
        Field(name="p9", dtype=Float32),
        Field(name="y", dtype=Float32),
    ],
    online=True,
    source=my_stats_parquet,
    tags={},
)
@fura95 - thanks a lot for this hint :) now I'm able to reproduce. It seems that I have to implement get_table_column_names_and_types
for all sources. A temporary workaround is to specify the whole schema (including the entities).
Hi, I'm using ParquetSource to get the file from a Hadoop cluster.
When I execute feast apply or use apply_total(repo_config, repo, True), I get the following error:
My feature_store.yaml: