snowflakedb / snowpark-python

Snowflake Snowpark Python API
Apache License 2.0

SNOW-902662: DF.to_pandas_batches() batch size parameter #1027

Open ghost opened 10 months ago

ghost commented 10 months ago

Current behaviour

DataFrame.to_pandas_batches() returns an iterator of pandas DataFrames, each with a seemingly arbitrary ("random") number of rows.

Desired behaviour

I would like to_pandas_batches() to accept a parameter that fixes the number of rows in each pandas DataFrame it generates.

How would this improve snowflake-snowpark-python?

This would let users control the size of the chunks they process and ensure that their processes are not overloaded by whatever number of rows the Snowflake back end chooses for each batch.
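
For illustration only, the fixed-size behaviour requested here could be approximated on the client side by re-chunking whatever the iterator yields. The sketch below is not part of Snowpark: `rebatch` and its `batch_size` argument are hypothetical names, and it assumes only that the input is an iterable of pandas DataFrames, which is what to_pandas_batches() returns. It does not change how many rows the back end sends per batch, so it controls the size of what downstream code sees, not the size of each fetch.

```python
from typing import Iterable, Iterator

import pandas as pd


def rebatch(frames: Iterable[pd.DataFrame], batch_size: int) -> Iterator[pd.DataFrame]:
    """Re-chunk an iterable of DataFrames into frames of exactly batch_size rows.

    Hypothetical helper, not a Snowpark API. Only the last frame may be shorter.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")

    buffer = []    # pieces collected so far for the batch being assembled
    buffered = 0   # total number of rows currently held in buffer

    for frame in frames:
        start = 0
        n = len(frame)
        while start < n:
            # Take only as many rows as are still missing from the current batch.
            take = min(batch_size - buffered, n - start)
            buffer.append(frame.iloc[start:start + take])
            buffered += take
            start += take
            if buffered == batch_size:
                yield pd.concat(buffer, ignore_index=True)
                buffer, buffered = [], 0

    # Emit any leftover rows as a final, shorter batch.
    if buffer:
        yield pd.concat(buffer, ignore_index=True)
```

With a Snowpark DataFrame `df`, this would be used as `for pdf in rebatch(df.to_pandas_batches(), 10_000): ...`. Peak memory is bounded by roughly one incoming batch plus one outgoing batch, so an unexpectedly large batch from the back end would still have to fit in memory.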

RahulDubey391 commented 7 months ago

Hi @MarcoFreitas0, I would like to take a look at this issue!

stong1108 commented 3 months ago

I am also interested in this feature.