perfectly-preserved-pie / larentals

An interactive map of for-sale & rental property listings in Los Angeles County, updated weekly.
https://wheretolive.la

Slow dataframe operations - use Dask instead of Pandas, dtype optimizations #210

Closed by perfectly-preserved-pie 5 months ago

perfectly-preserved-pie commented 5 months ago

So Pandas isn't necessarily slow, but there has always been a noticeable lag when playing with the checkboxes, radio buttons, and sliders, especially the Pets radio button and the rental/list price slider. I think it's getting even worse now that I've reached 4000+ rows in my dataframes.

I'm wondering if Dask might be a better option here. Especially because I have a fat ass 20c/40t CPU server that is literally doing nothing 99% of the time. I'm in a position where I could throw CPU horsepower at a problem until it's fixed.
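For reference, a minimal sketch of what swapping in Dask could look like, assuming the listings already live in a pandas DataFrame and the filtering is a simple boolean mask on a `Pets` column (both placeholder names here, not the app's actual code):

```python
import dask.dataframe as dd
import pandas as pd

# Hypothetical stand-in for the existing listings dataframe.
df = pd.DataFrame({
    "Pets": ["Yes", "No", "Yes", "Unknown"],
    "Price": [2500, 1800, 3200, 2100],
})

# Spread the rows across the otherwise-idle cores. A handful of partitions
# is plenty for ~4k rows; more only pays off on much larger data.
ddf = dd.from_pandas(df, npartitions=8)

# Same kind of filter the Dash callbacks do today, but lazy:
# nothing actually runs until .compute() is called.
pets_allowed = ddf[ddf["Pets"] == "Yes"]
print(pets_allowed.compute())
```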

I could also optimize the dtypes I'm using. For example, Bedrooms and Bathrooms don't need a full Int64 dtype - they could just as well use the int8 dtype, which tops out at 127; it's unlikely a house is going to have more than 127 bedrooms or bathrooms.
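A rough sketch of that downcast (column names are the ones mentioned above; the sample values are just illustrative):

```python
import pandas as pd

# Hypothetical slice of the listings dataframe.
df = pd.DataFrame({
    "Bedrooms": [2, 3, 1, 4],
    "Bathrooms": [1, 2, 1, 3],
    "Price": [2500, 1800, 3200, 2100],
})

# Bedrooms/Bathrooms fit comfortably in int8 (-128..127).
# Use the nullable "Int8" dtype instead if the columns can contain NaN.
df["Bedrooms"] = df["Bedrooms"].astype("int8")
df["Bathrooms"] = df["Bathrooms"].astype("int8")

# Let pandas pick the smallest integer dtype that holds the prices.
df["Price"] = pd.to_numeric(df["Price"], downcast="integer")

print(df.dtypes)
print(df.memory_usage(deep=True))
```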

perfectly-preserved-pie commented 5 months ago

Ugh, honestly with just 4k rows I don't think this could even be considered a "big" dataset where optimizations like this would actually matter.

The filters work almost instantly on localhost. The delay is probably coming from the fact that I gotta send a ~20MB or whatever JSON across the internet from the production website.

My efforts would probably be better utilized reducing that JSON payload size, not switching to Dask or whatever.
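One sketch of that direction: only serialize the columns the map and filters actually use, downcast them, and trim float precision before sending, instead of shipping every column at full width. Column names below are placeholders, not the app's real schema:

```python
import pandas as pd

# Hypothetical full listings dataframe with columns the client never needs.
df = pd.DataFrame({
    "Latitude": [34.05, 34.10],
    "Longitude": [-118.24, -118.30],
    "Price": [2500, 1800],
    "Bedrooms": [2, 3],
    "FullDescription": ["long text ..." * 50, "more long text ..." * 50],
})

# Only ship what the client-side map/filters actually render.
MAP_COLUMNS = ["Latitude", "Longitude", "Price", "Bedrooms"]

slim = df[MAP_COLUMNS].copy()
slim["Price"] = pd.to_numeric(slim["Price"], downcast="integer")
slim["Bedrooms"] = slim["Bedrooms"].astype("int8")

# orient="records" keeps the payload flat; double_precision trims float noise
# in the coordinates without visibly moving the markers.
payload = slim.to_json(orient="records", double_precision=5)
print(len(payload), "chars vs", len(df.to_json(orient="records")))
```

Pairing that with gzip on the server side (e.g. Flask-Compress, which plays nicely with a Dash/Flask app) would shrink the transfer even further.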