vincev / dply-rs

A data manipulation tool for parquet, csv, and json data.
Apache License 2.0
42 stars 3 forks

macOS 14.5 performance issue #65

Closed maxim-uvarov closed 5 months ago

maxim-uvarov commented 5 months ago

Hi! I was testing the polars plugin for nushell and found that it became super slow after upgrading to macOS 14.5. I ran the same benchmarks with dply and found that it is also far too slow. Another person reproduced the slowness on macOS 14.5, and when I tested the polars nushell plugin under macOS 14.2 it ran quickly there.

Dear @vincev, I'm wondering if you have a macOS machine nearby to test on, and whether you have any thoughts on the problem below.


I downloaded and unpacked this dataset.

I executed this command in the terminal:

time dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()'
shape: (21, 3)
┌──────┬────────┬──────────┐
│ year ┆ count  ┆ sum      │
│ ---  ┆ ---    ┆ ---      │
│ i64  ┆ u32    ┆ i64      │
╞══════╪════════╪══════════╡
│ 2000 ┆ 240730 ┆ 9107793  │
...
│ 2020 ┆ 273265 ┆ 13517976 │
└──────┴────────┴──────────┘
dply -c   0.50s user 14.57s system 297% cpu 5.073 total

So it took about 5 seconds, while my intuition says it should run in roughly 0.5s.

I also found that the timings are really unstable: sometimes much lower, sometimes much higher.

Without sum = sum(geo_count), though, the timings look normal:

> time dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n()) | show()'

│ 2020 ┆ 273265 │
└──────┴────────┘
dply -c   0.43s user 0.08s system 467% cpu 0.110 total

P.S. I also ran the Python script below, and it has no problem whatsoever; it is very fast as usual:

import polars as pl

# Eager Polars: read the whole CSV, then group by year and sum geo_count
df = pl.read_csv("Data7602DescendingYearOrder.csv")
res = df.groupby("year").agg(pl.col("geo_count").sum())
print(res)
maxim-uvarov commented 5 months ago

Before asking here, I checked the issues in the Polars repository, and it seems nobody has reported such problems even 5 days after the update was released.

vincev commented 5 months ago

Hi @maxim-uvarov, I have released a new version. The build on my box, using your dataset, takes 150ms to run your query:

time dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()' 
shape: (21, 3)
┌──────┬────────┬──────────┐
│ year ┆ count  ┆ sum      │
│ ---  ┆ ---    ┆ ---      │
│ i64  ┆ u32    ┆ i64      │
╞══════╪════════╪══════════╡
│ 2000 ┆ 240730 ┆ 9107793  │
│ 2001 ┆ 239301 ┆ 9034416  │
│ 2002 ┆ 240243 ┆ 9128109  │
│ 2003 ┆ 243470 ┆ 9458073  │
│ 2004 ┆ 252073 ┆ 10273596 │
│ …    ┆ …      ┆ …        │
│ 2016 ┆ 266958 ┆ 12551199 │
│ 2017 ┆ 268989 ┆ 12894723 │
│ 2018 ┆ 270072 ┆ 13031973 │
│ 2019 ┆ 272229 ┆ 13297923 │
│ 2020 ┆ 273265 ┆ 13517976 │
└──────┴────────┴──────────┘
dply -c   0.62s user 0.09s system 471% cpu 0.151 total

Give it a try and let me know how it goes.

maxim-uvarov commented 5 months ago

@vincev Thank you for such a swift release!

I installed the new version, but I haven't noticed any changes. I'm wondering what your performance was like before the new release. Did you experience the same slowness?


> time dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()'
shape: (21, 3)
┌──────┬────────┬──────────┐
│ year ┆ count  ┆ sum      │
...
│ 2020 ┆ 273265 ┆ 13517976 │
└──────┴────────┴──────────┘
dply -c   0.48s user 25.10s system 365% cpu 7.001 total

> dply -V
dply 0.3.1
vincev commented 5 months ago

@maxim-uvarov I have tried version 0.3.0, which used Polars 0.38.3, and the performance is about the same:

$ sw_vers
ProductName:            macOS
ProductVersion:         14.5
BuildVersion:           23F79
$ dply --version
dply 0.3.0
$ time  dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()'          
shape: (21, 3)
┌──────┬────────┬──────────┐
│ year ┆ count  ┆ sum      │
│ ---  ┆ ---    ┆ ---      │
│ i64  ┆ u32    ┆ i64      │
╞══════╪════════╪══════════╡
│ 2000 ┆ 240730 ┆ 9107793  │
│ 2001 ┆ 239301 ┆ 9034416  │
│ 2002 ┆ 240243 ┆ 9128109  │
│ 2003 ┆ 243470 ┆ 9458073  │
│ 2004 ┆ 252073 ┆ 10273596 │
│ …    ┆ …      ┆ …        │
│ 2016 ┆ 266958 ┆ 12551199 │
│ 2017 ┆ 268989 ┆ 12894723 │
│ 2018 ┆ 270072 ┆ 13031973 │
│ 2019 ┆ 272229 ┆ 13297923 │
│ 2020 ┆ 273265 ┆ 13517976 │
└──────┴────────┴──────────┘
dply -c   0.65s user 0.09s system 511% cpu 0.144 total

so it looks like the drop in performance you have observed is not related to Polars.

One thing I have noticed from your timings is that the system time is very high; could it be that you are low on memory?

Maybe you can try using /usr/bin/time -l, which gives more info; this is what I get:

/usr/bin/time -l dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()'
shape: (21, 3)
┌──────┬────────┬──────────┐
│ year ┆ count  ┆ sum      │
│ ---  ┆ ---    ┆ ---      │
│ i64  ┆ u32    ┆ i64      │
╞══════╪════════╪══════════╡
│ 2000 ┆ 240730 ┆ 9107793  │
│ 2001 ┆ 239301 ┆ 9034416  │
│ 2002 ┆ 240243 ┆ 9128109  │
│ 2003 ┆ 243470 ┆ 9458073  │
│ 2004 ┆ 252073 ┆ 10273596 │
│ …    ┆ …      ┆ …        │
│ 2016 ┆ 266958 ┆ 12551199 │
│ 2017 ┆ 268989 ┆ 12894723 │
│ 2018 ┆ 270072 ┆ 13031973 │
│ 2019 ┆ 272229 ┆ 13297923 │
│ 2020 ┆ 273265 ┆ 13517976 │
└──────┴────────┴──────────┘
        0.14 real         0.62 user         0.08 sys
           473284608  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               39063  page reclaims
                  29  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                 750  involuntary context switches
          6084745642  instructions retired
          1553002484  cycles elapsed
           384780992  peak memory footprint
maxim-uvarov commented 5 months ago

@vincev, yep, it seems like something is interfering with dply on my system. I have 32GB of RAM and have never had problems because of it.

I have 18888 involuntary context switches versus your 750.

I'll try to debug it today. I will keep you informed about my findings.

@vincev Thank you so much for your attention!

/usr/bin/time -l dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()'
shape: (21, 3)
┌──────┬────────┬──────────┐
│ year ┆ count  ┆ sum      │
│ ---  ┆ ---    ┆ ---      │
│ i64  ┆ u32    ┆ i64      │
╞══════╪════════╪══════════╡
│ 2000 ┆ 240730 ┆ 9107793  │
│ 2001 ┆ 239301 ┆ 9034416  │
│ 2002 ┆ 240243 ┆ 9128109  │
│ 2003 ┆ 243470 ┆ 9458073  │
│ 2004 ┆ 252073 ┆ 10273596 │
│ …    ┆ …      ┆ …        │
│ 2016 ┆ 266958 ┆ 12551199 │
│ 2017 ┆ 268989 ┆ 12894723 │
│ 2018 ┆ 270072 ┆ 13031973 │
│ 2019 ┆ 272229 ┆ 13297923 │
│ 2020 ┆ 273265 ┆ 13517976 │
└──────┴────────┴──────────┘
       27.91 real         0.43 user       168.87 sys
           523173888  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               39232  page reclaims
                 515  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
               18888  involuntary context switches
        108996644731  instructions retired
        514845525269  cycles elapsed
           370100992  peak memory footprint
maxim-uvarov commented 5 months ago

I installed a VM with macOS 14.5 on a Mac with macOS 14.5. Inside the virtual machine, dply runs very fast. So I can conclude that the problem is not with 14.5, but with my host operating system.

Also, I disabled my login applications and restarted, but no luck. I started macOS in safe mode, and again, no luck.

vincev commented 5 months ago

This is very strange. One thing you could try, to see if it is disk related, is to convert the data to parquet and see if there is any improvement:

dply -c 'csv("./Data7602DescendingYearOrder.csv") | parquet("data.parquet")'

Or you could run a profiler like Instruments to see what is taking all the time, but that is a lot of work.

maxim-uvarov commented 5 months ago

@vincev 👏👏👏 indeed! converting to parquet helped! How did you know that??

> 1..100 | each {timeit {dply -c 'parquet("./data.parquet") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()' | complete | null}} | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬──────────────────╮
│ max    │ 73ms 401µs 917ns │
│ min    │ 61ms 454µs 209ns │
│ median │ 65ms 562µs 166ns │
╰────────┴──────────────────╯
> 1..10 | each {timeit {dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()' | complete | null}} | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬─────────────────────────╮
│ max    │ 23sec 630ms 989µs 542ns │
│ min    │ 1sec 57ms 889µs 667ns   │
│ median │ 3sec 658ms 593µs 437ns  │
╰────────┴─────────────────────────╯
maxim-uvarov commented 5 months ago

So the bug might be in the CSV reader, right? I have been working with 250MB CSV files using dataframes in nushell for a long time without any issues. If they had become slow, I would have noticed.

So I'm guessing that something might have changed in the Polars CSV reader.

@vincev, what do you think? Or maybe you have other ideas?

fdncred commented 5 months ago

These fluctuations with the CSV file are a bit weird. Here are two tests with parquet and two with CSV on my MBP M1 Max, 32GB RAM, 1TB SSD.

Parquet

❯ 1..100 | each {timeit {polars open data.parquet | polars group-by year | polars agg (polars col geo_count | polars sum) | polars collect| null}} |   | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬──────────────────╮
│ max    │ 43ms 129µs 709ns │
│ min    │ 20ms 727µs 166ns │
│ median │ 21ms 661µs 562ns │
╰────────┴──────────────────╯
❯ 1..100 | each {timeit {polars open data.parquet | polars group-by year | polars agg (polars col geo_count | polars sum) | polars collect| null}} |   | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬──────────────────╮
│ max    │ 37ms 550µs 542ns │
│ min    │ 20ms 334µs 292ns │
│ median │ 21ms 293µs 896ns │
╰────────┴──────────────────╯

CSV

❯ 1..10 | each {timeit {dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()' | complete | null}} | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬────────────────────────╮
│ max    │ 21sec 75ms 635µs 958ns │
│ min    │ 538ms 485µs 291ns      │
│ median │ 5sec 547ms 601µs 520ns │
╰────────┴────────────────────────╯
❯ 1..10 | each {timeit {dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()' | complete | null}} | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬──────────────────────────────╮
│ max    │ 1min 16sec 684ms 528µs 250ns │
│ min    │ 553ms 545µs 500ns            │
│ median │ 1sec 784ms 562µs 375ns       │
╰────────┴──────────────────────────────╯
maxim-uvarov commented 5 months ago

I have a 4TB hard drive with lots of small text files on it. Maybe some indexing process that started after the upgrade and hasn't finished yet is misbehaving.

@fdncred does your setup deviate from an average user's in terms of hard drive size or the number of files, especially text files?

fdncred commented 5 months ago

not really. I don't keep a lot of small text files around on purpose, other than Rust code in repos.

vincev commented 5 months ago

converting to parquet helped! How did you know that??

@maxim-uvarov I didn't really know; it was just a guess. When the system time is very high, the process may be doing a lot of IO (networking/disk), so trying a more efficient file format may help; the parquet file is 9x smaller, with the data encoded in binary.

Regarding the Polars CSV reader, I have tried to run the tests you ran above on my box, and I get consistent timings:

> 1..100 | each {timeit {dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()' | complete | null}} | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬───────────────────╮
│ max    │ 153ms 753µs 250ns │
│ min    │ 112ms 824µs 459ns │
│ median │ 117ms 658µs 729ns │
╰────────┴───────────────────╯

And when you ran it on the VM you got good results too, so it looks like the high variation in timings may be due to some external factor, but it is hard to say without spending time profiling.

@fdncred the polars plugin looks very nice.

fdncred commented 5 months ago

@fdncred the polars plugin looks very nice.

Thanks. We're working on it, but we're seeing some inconsistencies; that's when we decided to try dply, since it's somewhat related. We wanted to see whether it showed the same behavior, specifically with CSV files. I'm still not sure what's going on specifically with macOS.

These timings are from an old desktop system on Windows and seem somewhat reasonable with both parquet and CSV.

dply with csv

❯ 1..100 | each {timeit {dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()' | complete | null}} | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬───────────────────╮
│ max    │ 388ms 375µs 700ns │
│ min    │ 264ms 181µs 400ns │
│ median │ 286ms 849µs 200ns │
╰────────┴───────────────────╯

dply with parquet

❯ 1..100 | each {timeit {dply -c 'parquet("./data.parquet") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()' | complete | null}} | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬───────────────────╮
│ max    │ 315ms 194µs 500ns │
│ min    │ 175ms 592µs 100ns │
│ median │ 193ms 747µs 300ns │
╰────────┴───────────────────╯

nushell with csv

❯ 1..100 | each {timeit {polars open Data7602DescendingYearOrder.csv | polars group-by year | polars agg (polars col geo_count | polars sum) | polars collect| null}} |   | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬───────────────────╮
│ max    │ 221ms 192µs 800ns │
│ min    │ 147ms 270µs 200ns │
│ median │ 163ms 65µs 750ns  │
╰────────┴───────────────────╯

nushell with parquet (the max and min are pretty far apart here for some reason)

❯ 1..100 | each {timeit {polars open data.parquet | polars group-by year | polars agg (polars col geo_count | polars sum) | polars collect| null}} |   | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬───────────────────╮
│ max    │ 491ms 423µs 200ns │
│ min    │ 75ms 194µs        │
│ median │ 91ms 373µs 200ns  │
╰────────┴───────────────────╯
maxim-uvarov commented 5 months ago

@vincev Also, I've definitely noticed that my battery drains while the Mac is idle, so perhaps there is a lot of IO happening. And I really believe it is related to the 14.5 update.

vincev commented 5 months ago

@maxim-uvarov another test you can do is to try duckdb, which uses a completely different code base and language, and see if you notice the same variance:

1..100 | each {timeit {duckdb -c "select year, count(*) n, sum(geo_count) s from './Data7602DescendingYearOrder.csv' group by year order by 1" | complete | null}} | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬───────────────────╮
│ max    │ 266ms 566µs 625ns │
│ min    │ 132ms 364µs 958ns │
│ median │ 138ms 10µs 667ns  │
╰────────┴───────────────────╯

Maybe you can use Activity Monitor to see if it shows processes doing lots of IO following the upgrade.

maxim-uvarov commented 5 months ago

@vincev Many thanks for keeping us in mind!

No problem with duckdb, just as there was no problem with Python Polars. Everything runs very smoothly.

> 1..100 | each {timeit {duckdb -c "select year, count(*) n, sum(geo_count) s from './Data7602DescendingYearOrder.csv' group by year order by 1" | complete | null}} | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
∙
╭────────┬───────────────────╮
│ max    │ 119ms 656µs 708ns │
│ min    │ 108ms 837µs 41ns  │
│ median │ 113ms 817µs 229ns │
╰────────┴───────────────────╯
maxim-uvarov commented 5 months ago

I'm going to learn how to use the Instruments Mac profiler for my case. But before that, I want to complete some other computational tasks that are occupying my mind.

maxim-uvarov commented 5 months ago

polars-cli is also slow with CSV. P.S. I renamed the polars binary to polars-cli, as that name is already used by the nushell plugin.

> timeit {polars-cli -c "select year, count(*) n, sum(geo_count) s from read_csv('Data7602DescendingYearOrder.csv') group by year;"}
┌──────┬────────┬──────────┐
│ year ┆ n      ┆ s        │
│ ---  ┆ ---    ┆ ---      │
│ i64  ┆ u32    ┆ i64      │
╞══════╪════════╪══════════╡
│ 2007 ┆ 259065 ┆ 11348772 │
│ 2005 ┆ 255917 ┆ 10724928 │
│ 2010 ┆ 259891 ┆ 11513931 │
│ 2011 ┆ 260366 ┆ 11523552 │
│ 2015 ┆ 265129 ┆ 12305010 │
│ …    ┆ …      ┆ …        │
│ 2014 ┆ 262630 ┆ 12004401 │
│ 2012 ┆ 260584 ┆ 11510928 │
│ 2003 ┆ 243470 ┆ 9458073  │
│ 2013 ┆ 260208 ┆ 11588178 │
│ 2001 ┆ 239301 ┆ 9034416  │
└──────┴────────┴──────────┘
13sec 69ms 280µs 83ns
vincev commented 5 months ago

@maxim-uvarov that's interesting. The Polars team just released version 0.40, and I have updated dply's dependencies on main. I am not sure if it solves your problem; maybe you can give it a try. If it doesn't, the best thing is probably to open an issue with the Polars team showing the difference between duckdb and polars-cli.

maxim-uvarov commented 5 months ago

@vincev I updated to dply 0.3.2, yet it didn't help :( Thanks for the suggestion. I will create an issue in the polars-cli repository. But before that, I want to try Instruments to gather more details. It seems like a handy skill and an interesting activity.

I just found this: https://github.com/cmyr/cargo-instruments?tab=readme-ov-file

maxim-uvarov commented 5 months ago

I ran this

sudo flamegraph ...[-o my_flamegraph2.svg] -- dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()'

and got this:

(flamegraph screenshot: my_fl1)

vincev commented 5 months ago

I ran it on my box; I rebuilt dply using:

[profile.release]
debug = true

I got the following:

(flamegraph screenshot: my_flamegraph2)

It looks like in your case it is spending much more time doing memory work, but I am not sure what is causing that.

maxim-uvarov commented 5 months ago

I also built it with

[profile.release]
debug = true

here is the picture.

(flamegraph screenshot: dply-csv-with-debug-true)

The timing variance is quite big:

> 1..10 | each {timeit {dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()' | complete | null}} | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬────────────────────────╮
│ max    │ 8sec 971ms 429µs 833ns │
│ min    │ 153ms 956µs 42ns       │
│ median │ 3sec 9ms 338µs 500ns   │
╰────────┴────────────────────────╯
vincev commented 5 months ago

It looks like it is spending a lot of time doing memory work. I don't really know what is causing that; maybe ask on the Polars discussions, as someone may have experienced the same problem. It may be some macOS configuration setting that changed after the upgrade on your box.

maxim-uvarov commented 5 months ago

Yesterday I collected a 'sys' template trace and loaded it into Instruments. The file is 120 MB and full of data. Though it's really difficult to understand where to look for insights, I'm not giving up: I'm chatting with GPT and comparing the profiles against those from loading Parquet.

Also, I contacted Apple support but was rather disappointed by their answers. I sent the issue via Feedback Assistant, but I don't expect that they will investigate it properly.

Today I'm going to ask in the Polars GitHub repo or in their chat.

devyn commented 5 months ago

I just got my hands on a used min-spec Mac Mini M1 (8 GB RAM, 256 GB SSD). It's still running Sonoma 14.4 and I haven't upgraded it.

I will hold off on upgrading in case there is any more repro to be done. But so far it seems like 14.4 probably doesn't have this problem:

dply with CSV:

~/polars-slow-repro> 1..10 | each {timeit {dply -c 'csv("./Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()' | complete | null}} | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬───────────────────╮
│ max    │ 141ms 195µs 208ns │
│ min    │ 110ms 584µs 334ns │
│ median │ 113ms 445µs 791ns │
╰────────┴───────────────────╯

dply with parquet:

~/polars-slow-repro> 1..100 | each {timeit {dply -c 'parquet("./data.parquet") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()' | complete | null}} | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬───────────────────╮
│ max    │ 113ms 848µs 542ns │
│ min    │ 67ms 554µs 584ns  │
│ median │ 71ms 265µs 938ns  │
╰────────┴───────────────────╯

Nushell with CSV:

~/polars-slow-repro> 1..100 | each {timeit {polars open Data7602DescendingYearOrder.csv | polars group-by year | polars agg (polars col geo_count | polars sum) | polars collect| null}} |   | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬──────────────────╮
│ max    │ 80ms 650µs 500ns │
│ min    │ 58ms 476µs 458ns │
│ median │ 61ms 510µs 500ns │
╰────────┴──────────────────╯

Nushell with Parquet:

~/polars-slow-repro> 1..100 | each {timeit {polars open data.parquet | polars group-by year | polars agg (polars col geo_count | polars sum) | polars collect| null}} |   | {max: ($in | math max), min: ($in | math min), median: ($in | math median)}
╭────────┬──────────────────╮
│ max    │ 48ms 886µs 584ns │
│ min    │ 24ms 792µs 459ns │
│ median │ 26ms 442µs 521ns │
╰────────┴──────────────────╯
devyn commented 5 months ago

@vincev Question for you: are you on Apple Silicon or Intel?

My pet hypothesis at the moment is that this has something to do with GoFetch vulnerability mitigations, but I'm not sure. The 'involuntary context switches' kind of lines up with that idea too.

maxim-uvarov commented 5 months ago

Devyn and I discussed that it might be useful to use Xcode Instruments to capture profiles of a system before upgrading to the version that might cause the problem.

Here is a quick instruction to get started:

To get started, you need to install Xcode (it can be done via the App Store, but the download was very slow for me):

https://download.developer.apple.com/Developer_Tools/Xcode_15.4/Xcode_15.4.xip

In Xcode, choose:

(screenshot)

After installation, if it refuses to work, you can try these commands (I found them in my history from when I was troubleshooting):

/usr/bin/xcodebuild -version
xcrun xctrace

Then install Cargo Instruments:

https://github.com/cmyr/cargo-instruments

In your clone directory of dply, add this to Cargo.toml:

[profile.release]
debug = true

After that, from that folder, you can execute commands like:

cargo instruments -t 'Activity Monitor' -- dply -c 'csv("/Users/user/git/nushell.github.io/book/test/Data7602DescendingYearOrder.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()'
> cargo instruments -l
Xcode Instruments templates:

built-in            abbrev
--------------------------
Activity Monitor
Allocations         (alloc)
Animation Hitches
App Launch
Audio System Trace
CPU Counters
CPU Profiler
Core ML
Data Persistence
File Activity       (io)
Game Memory
Game Performance
Leaks
Logging
Metal System Trace
Network
RealityKit Trace
SceneKit
Swift Concurrency
SwiftUI
System Trace        (sys)
Tailspin
Time Profiler       (time)
Zombies

I found those templates interesting:

Activity Monitor
io
sys
maxim-uvarov commented 5 months ago

I can share my profiles. The largest one is over 100 MB. Apple warns that they might contain sensitive information and shouldn't be shared publicly, but I have nothing to hide. So I can easily share them with the participants of this conversation if they are interested.

maxim-uvarov commented 5 months ago

I converted all the data to JSON. Reading from a JSON file five times bigger than CSV takes 100 times less time.

> open Data7602DescendingYearOrder.csv | each {to json -r} | str join (char nl) | save data.json -f

> 1..50 | each {timeit {dply -c 'json("./data.json") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()' | complete | null}} | {median: ($in | math median), min: ($in | math min), max: ($in | math max), stddev: ($in | into int | into float | math stddev | into int | $'($in).ns' | into duration --unit sec)}
╭────────┬───────────────────╮
│ median │ 662ms 454µs 521ns │
│ min    │ 540ms 567µs 916ns │
│ max    │ 841ms 515µs 291ns │
│ stddev │ 69ms 853µs 152ns  │
╰────────┴───────────────────╯

> 1..10 | each {timeit {dply -c 'csv("./data.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()' | complete | null}} | {median: ($in | math median), min: ($in | math min), max: ($in | math max), stddev: ($in | into int | into float | math stddev | into int | $'($in).ns' | into duration --unit sec)}
╭────────┬─────────────────────────╮
│ median │ 7sec 317ms 54µs 541ns   │
│ min    │ 226ms 195µs 708ns       │
│ max    │ 21sec 938ms 538µs 750ns │
│ stddev │ 5sec 845ms 369µs 36ns   │
╰────────┴─────────────────────────╯
maxim-uvarov commented 5 months ago

Though! If I remove the sum calculation, everything flies, even with CSV!

> 1..10 | each {timeit {dply -c 'csv("./data.csv") | group_by(year) | summarize(count = n()) | show()' | complete | null}} | {median: ($in | math median), min: ($in | math min), max: ($in | math max), stddev: ($in | into int | into float | math stddev | into int | $'($in).ns' | into duration --unit sec)}
╭────────┬──────────────────╮
│ median │ 91ms 958µs 833ns │
│ min    │ 89ms 849µs 917ns │
│ max    │ 97ms 538µs 292ns │
│ stddev │ 2ms 284µs 986ns  │
╰────────┴──────────────────╯

I guess I can compare profiles with and without the sum operation.

fdncred commented 5 months ago

With sum

❯ 1..50 | each {timeit {dply -c 'csv("./data.csv") | group_by(year) | summarize(count = n(), sum = sum(geo_count)) | show()' | complete | null}} | {median: ($in | math median), min: ($in | math min), max: ($in | math max), stddev: ($in | into int | into float | math stddev | into int | $'($in).ns' | into duration --unit sec)}
╭────────┬────────────────────────╮
│ median │ 1sec 337ms 902µs 979ns │
│ min    │ 220ms 778µs 209ns      │
│ max    │ 17sec 860ms 1µs 584ns  │
│ stddev │ 2sec 662ms 457µs 129ns │
╰────────┴────────────────────────╯

Without sum

❯ 1..50 | each {timeit {dply -c 'csv("./data.csv") | group_by(year) | summarize(count = n()) | show()' | complete | null}} | {median: ($in | math median), min: ($in | math min), max: ($in | math max), stddev: ($in | into int | into float | math stddev | into int | $'($in).ns' | into duration --unit sec)}
╭────────┬───────────────────────╮
│ median │ 94ms 813µs 416ns      │
│ min    │ 89ms 422µs 42ns       │
│ max    │ 1sec 12ms 978µs 375ns │
│ stddev │ 174ms 418µs 404ns     │
╰────────┴───────────────────────╯
maxim-uvarov commented 5 months ago

polars-cli doesn't like sum either:

> 1..10 | each {timeit {polars-cli -c "select year, count(*) n from read_csv('Data7602DescendingYearOrder.csv') group by year;" | complete | null}} | {median: ($in | math median), min: ($in | math min), max: ($in | math max), stddev: ($in | into int | into float | math stddev | into int | $'($in).ns' | into duration --unit sec)}
╭────────┬──────────────────╮
│ median │ 45ms 243µs 354ns │
│ min    │ 43ms 239µs 875ns │
│ max    │ 49ms 414µs 167ns │
│ stddev │ 2ms 12µs 745ns   │
╰────────┴──────────────────╯

> 1..10 | each {timeit {polars-cli -c "select year, count(*) n, sum(geo_count) s from read_csv('Data7602DescendingYearOrder.csv') group by year;" | complete | null}} | {median: ($in | math median), min: ($in | math min), max: ($in | math max), stddev: ($in | into int | into float | math stddev | into int | $'($in).ns' | into duration --unit sec)}
╭────────┬─────────────────────────╮
│ median │ 8sec 258ms 594µs 312ns  │
│ min    │ 3sec 202ms 705µs 958ns  │
│ max    │ 20sec 641ms 701µs 583ns │
│ stddev │ 5sec 21ms 84µs 752ns    │
╰────────┴─────────────────────────╯
maxim-uvarov commented 5 months ago

I inspected the geo_count column visually. I converted the file to CSV once again in case it had artifacts.

open data.csv | save data1.csv

Still no luck.

maxim-uvarov commented 5 months ago

For the sake of synchronization: these are screenshots from the nushell Discord discussion with @devyn.

(four screenshots)
vincev commented 5 months ago

@vincev Question for you: are you on Apple Silicon or Intel?

@devyn I am running on a 16GB Apple M1.

devyn commented 5 months ago

Thanks, I guess that potentially eliminates that. Hmm...

maxim-uvarov commented 5 months ago

@vincev (or any other Rust-savvy participants): the great Ritchie from Polars advised trying a few things to improve performance here.

Namely:

I see they run on the default allocator. Setting jemalloc on macos might resolve it. Further if performance is important they should consider activating simd, performant and maybe compiling binaries on a nightly compiler with avx features activated.

If you ever have time for this issue again, would you please help me incorporate the proposed settings into dply so I can test them?

P.S. I'm sorry for taking up your attention. Our Polars master is on vacation, so I can't reach him with this request.

maxim-uvarov commented 5 months ago

Link for reference: https://github.com/pola-rs/polars/blob/019dfe87b28afa51a74fc87ebbccb354fd81b53a/.github/workflows/release-python.yml

maxim-uvarov commented 5 months ago

Well, I think I can do that myself. At least I'm trying now.

vincev commented 5 months ago

@maxim-uvarov okay, let me know how it goes. In the meantime I can add jemalloc and activate some of the other features, but you'll have to test it, as I cannot reproduce the issue on my box.
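
For reference, wiring jemalloc into a Rust binary is usually a small change along the lines of the sketch below; this uses the tikv-jemallocator crate and is only an illustration of the pattern, not necessarily the exact change I will push:

// Cargo.toml (assumed):
// [dependencies]
// tikv-jemallocator = "0.5"

use tikv_jemallocator::Jemalloc;

// Route all heap allocations through jemalloc instead of the macOS
// system allocator, which the flamegraphs above show dominating the CSV run.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // ...rest of the dply entry point stays unchanged...
}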

vincev commented 5 months ago

Just pushed a branch with jemalloc in case it helps https://github.com/vincev/dply-rs/tree/perf

maxim-uvarov commented 5 months ago

@vincev thank you so much! Helped a lot! And yes, it solved the problem!!!

> 1..100 | each {timeit {dply -c 'csv("./data.csv") | summarize(count = n(), sum = sum(geo_count)) | show()' | complete | null}} | {median: ($in | math median), min: ($in | math min), max: ($in | math max), stddev: ($in | into int | into float | math stddev | into int | $'($in).ns' | into duration --unit sec)}
╭────────┬──────────────────╮
│ median │ 75ms 471µs 521ns │
│ min    │ 71ms 384µs 250ns │
│ max    │ 85ms 949µs 625ns │
│ stddev │ 2ms 635µs 56ns   │
╰────────┴──────────────────╯
maxim-uvarov commented 5 months ago

@vincev @devyn @fdncred, especially @vincev Thank you so much for your time, help and participation! I'm super grateful!!!

vincev commented 5 months ago

@maxim-uvarov you are welcome, I'll do a release with jemalloc then.