sfu-db / connector-x

Fastest library to load data from DB to DataFrames in Rust and Python
https://sfu-db.github.io/connector-x
MIT License

Int dtypes are unnecessarily big #688

Open Happily-Coding opened 1 month ago

Happily-Coding commented 1 month ago

What language are you using?

Python

What version are you using?

0.3.3

What database are you using?

MySQL

What dataframe are you using?

Polars

Can you describe your bug?

ConnectorX converts smaller database integer types to int64, which needlessly increases memory usage. See the MySQL mappings or the MSSQL mappings.
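To give a sense of scale, here is a rough back-of-the-envelope sketch of the overhead. It assumes signed columns and plain fixed-width storage with no null bitmap or compression; the table of "narrowest" widths and the function name `int64_overhead_bytes` are illustrative, not part of ConnectorX.

```python
# Bytes per value that int64 always uses.
INT64_BYTES = 8

# Bytes per value for the narrowest common signed dataframe integer type
# that can hold each MySQL integer type (assumption: signed columns only).
NARROWEST_BYTES = {
    "TINYINT": 1,    # fits int8
    "SMALLINT": 2,   # fits int16
    "MEDIUMINT": 4,  # 3 bytes in MySQL, needs int32 in a dataframe
    "INT": 4,        # fits int32
    "BIGINT": 8,     # needs int64, so no overhead
}


def int64_overhead_bytes(mysql_type: str, n_rows: int) -> int:
    """Extra bytes used by storing n_rows values of mysql_type as int64."""
    return (INT64_BYTES - NARROWEST_BYTES[mysql_type]) * n_rows


if __name__ == "__main__":
    for mysql_type in NARROWEST_BYTES:
        print(mysql_type, int64_overhead_bytes(mysql_type, 10_000_000))
```

Under these assumptions, a TINYINT column takes eight times the space it needs once it is loaded as int64.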

What are the steps to reproduce the behavior?

Use ConnectorX to query a MySQL table that has a column with an integer type smaller than BIGINT. Every integer column in the resulting DataFrame will be int64, as specified by the incorrect mappings referenced above.

For example, with Polars, which respects the dtypes provided by ConnectorX:

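import polars as pl

# The connection settings and `query` are placeholders defined elsewhere.
mysql_uri = f'{mysql_db_type}://{mysql_username}:{mysql_password}@{mysql_host}:{mysql_port}/{mysql_database_name}'
test_dataset = pl.read_database_uri(query, mysql_uri)
print(test_dataset.schema)  # integer columns are reported as Int64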

What is the error?

There is no error message, just incorrect dtypes.