jmcarpenter2 / swifter

A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
MIT License
2.54k stars 101 forks

Progress Bar doesn't seem to be working #143

Closed: santiarcar closed this issue 2 years ago

santiarcar commented 4 years ago

Hi!

I've been using swifter for a while as I'm working on an ETL process where I need to handle huge dataframes.

I was used to seeing the progress bar when I used swifter.apply(), but it hasn't appeared for a while. I'm sharing code through a repository, but that shouldn't be a problem, should it?

Maybe the progress bar has been deprecated in later swifter versions?

I'm using swifter just like this with the latest version (1.0.7):

df = df.swifter.apply(lambda row: custom_function(row), axis=1)

jmcarpenter2 commented 3 years ago

Hi @santiarcar,

The progress bar is not implemented for axis=1 applies on dataframes containing strings. These applies run through Modin, and there is not yet an easy way to hook the progress bar into Modin dataframes.

If you want to force swifter to use Dask in these instances, you can do 'df.swifter.allow_dask_on_strings().apply(...)'

Using Dask will enable the progress bar, but it will also be somewhat slower than the default Modin apply.

I hope that helps! Jason
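A minimal sketch of the workaround described above, for readers who want to try it. The helper name `apply_rows`, the toy `custom_function`, and the sample dataframe are hypothetical and only for illustration. The sketch assumes that `progress_bar()` and `allow_dask_on_strings()` can be chained on the `.swifter` accessor, and it falls back to a plain pandas apply when swifter is not installed.

```python
import pandas as pd


def custom_function(row):
    # Hypothetical row-wise function standing in for the one in the question.
    return f"{row['name']}-{row['n']}"


def apply_rows(df, func, use_dask_on_strings=True):
    """Row-wise apply through swifter, opting into Dask for string columns.

    Falls back to a plain pandas apply if swifter is not importable.
    """
    try:
        import swifter  # noqa: F401  (importing registers the .swifter accessor)
    except ImportError:
        return df.apply(func, axis=1)

    accessor = df.swifter.progress_bar(True)
    if use_dask_on_strings:
        # Assumption: as described in the comment above, this makes swifter
        # consider Dask for axis=1 applies on dataframes containing strings.
        accessor = accessor.allow_dask_on_strings(True)
    return accessor.apply(func, axis=1)


if __name__ == "__main__":
    frame = pd.DataFrame({"name": ["ann", "bob"], "n": [1, 2]})
    print(apply_rows(frame, custom_function))
```

Whether a progress bar actually appears may still depend on the swifter version and on which backend swifter picks for the given dataframe.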

edridgedsouza commented 3 years ago

May be of interest: https://github.com/modin-project/modin/pull/1589

This seems to make sense, but oddly, I see the progress bar for df's with < 100k rows and not for df's with > 100k rows. Could this indicate that the engine isn't using Dask for smaller df's and is instead just applying a vanilla tqdm progress_apply()?

jmcarpenter2 commented 3 years ago

Hi @edridgedsouza, yes, that is the case. For small df's it falls back to pandas. The following link shows where swifter runs the apply on a sample of the dataframe to decide whether to use Modin or pandas: https://github.com/jmcarpenter2/swifter/blob/master/swifter/swifter.py#L357
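The sampling idea described above can be sketched roughly as follows. This is not swifter's actual code: the function name `choose_backend`, the sample size, the time threshold, and the backend labels are all assumptions chosen for illustration, and plain Python lists stand in for dataframe rows.

```python
import time


def choose_backend(rows, func, sample_size=1000, threshold_seconds=1.0):
    """Time func on a small sample and extrapolate to the full input.

    Returns "pandas" when the estimated total runtime is below the
    threshold, otherwise "parallel". All names and defaults here are
    hypothetical.
    """
    sample = rows[:sample_size]
    if not sample:
        return "pandas"

    start = time.perf_counter()
    for row in sample:
        func(row)
    elapsed = time.perf_counter() - start

    # Scale the per-row cost of the sample up to the full input size.
    estimated_total = elapsed / len(sample) * len(rows)
    return "pandas" if estimated_total < threshold_seconds else "parallel"


if __name__ == "__main__":
    small = list(range(100))
    print(choose_backend(small, lambda x: x * 2))
```

Under a heuristic like this, a cheap function on a small input stays on the simple path, which would be consistent with the tqdm-style progress bar showing up only for the smaller dataframes.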

jmcarpenter2 commented 2 years ago

Modin has been removed from the axis=1 apply path, so you should see progress bars for axis=1 string applies again :)