ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0
5.33k stars, 599 forks

feat(postgres): more flexible map type implementation #10484

Open cpcloud opened 1 week ago

cpcloud commented 1 week ago

Discussed in https://github.com/ibis-project/ibis/discussions/10483

Originally posted by **augcollet** November 13, 2024

Hello, I need your help to resolve a specific problem...

From the following data with the PostgreSQL backend:

```python
import ibis
from ibis import _
import os

con = ibis.postgres.connect(
    user=os.getenv('POSTGRES_USER'),
    password=os.getenv('POSTGRES_PASSWORD'),
    host="postgres",
    port=os.getenv('POSTGRES_PORT'),
    database=os.getenv('POSTGRES_DB'),
)
ibis.set_backend(con)

t = ibis.memtable({
    'client_id': [0, 1, 0, 2, 3, 0, 1, 2, 3],
    'product': ['a', 'b', 'a', 'a', 'b', 'c', 'a', 'a', 'b'],
    'amount': [1.2, 2.5, 4.2, 12.7, 1.2, 3.8, 1.4, 3.8, 3],
})
```

![image](https://github.com/user-attachments/assets/cc33dafa-32b7-448f-a3b3-35a6f5263e4b)

I'm trying to perform the following calculation:

![image](https://github.com/user-attachments/assets/998ead7b-a35d-409e-b1dc-804e7121aa8f)

I tried the following approach:

```python
data = (
    t.group_by(['client_id', 'product'])
    .agg(
        sum_amount=_['amount'].sum()
    )
    .group_by(['client_id'])
    .agg(
        products_and_sum_amounts=ibis.map(
            _['product'].collect(),
            _['sum_amount'].collect()
        )
    )
)
data.execute()
```

I get the following error:

![image](https://github.com/user-attachments/assets/03b3b9d2-2ccf-4563-888b-e542bdfb97b6)

It seems that ibis uses hstore to store data from a `.map`, which is incompatible with numeric values. I have to cast the values to a string before using `.collect` to get a result.

![image](https://github.com/user-attachments/assets/216e4848-a970-4a1e-b9fe-cd04ff5828b0)

How can I get around this? For example, how can I build a JSON object instead of a MapValue?

(My goal is to use the resulting pandas dataset with a DictVectorizer in scikit-learn: https://scikit-learn.org/1.5/modules/generated/sklearn.feature_extraction.DictVectorizer.html)

Thank you in advance for your support!
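
Until the backend supports non-string map values, one possible workaround is to skip the map on the database side and build the per-client dictionaries in Python from plain rows. The sketch below is only an illustration under that assumption: `rows_to_feature_dicts` is a hypothetical helper, not part of ibis, and the rows are hard-coded from the sample data above so the snippet runs without a database.

```python
from collections import defaultdict


def rows_to_feature_dicts(rows):
    """Fold (client_id, product, amount) rows into {client_id: {product: total}}.

    Hypothetical helper for illustration; it is not an ibis API.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for client_id, product, amount in rows:
        # Sum amounts per (client, product) pair.
        totals[client_id][product] += amount
    # Convert to plain dicts, ordered by client_id for stable output.
    return {cid: dict(per_product) for cid, per_product in sorted(totals.items())}


# Sample data copied from the issue. With a live connection, the rows could
# instead come from the executed table, e.g. (assumption, columns in this order):
#   rows = t.execute().itertuples(index=False, name=None)
client_ids = [0, 1, 0, 2, 3, 0, 1, 2, 3]
products = ['a', 'b', 'a', 'a', 'b', 'c', 'a', 'a', 'b']
amounts = [1.2, 2.5, 4.2, 12.7, 1.2, 3.8, 1.4, 3.8, 3]

rows = list(zip(client_ids, products, amounts))
features = rows_to_feature_dicts(rows)

# One dict per client, in client_id order.
feature_dicts = list(features.values())
```

Since `DictVectorizer` accepts a list of feature-name-to-value mappings, `feature_dicts` should be usable as input to `DictVectorizer().fit_transform(...)`. The trade-off is that the second aggregation step runs on the client rather than in PostgreSQL.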