xo / usql

Universal command-line interface for SQL databases

High memory usage #371

Open · kozikowskik opened this issue 2 years ago

kozikowskik commented 2 years ago

I ran a simple query: `select * from <table_name> where date_column >= '<date>';`. The result is 1,854,211 rows, and after I got the first page of results the usql process had consumed 6 GB of memory.

[screenshot: usql memory usage after running the query]

nineinchnick commented 2 years ago

I can confirm that running this query on PostgreSQL 13 uses about 1 GB of memory, whether or not a pager is used, and also when running in batch mode:

./usql 'postgres://postgres:pw@localhost:49153?sslmode=disable' -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;"
nineinchnick commented 2 years ago

[screenshot: pprof heap profile of usql during the query]
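For reference, a heap profile like the one above can be captured from a Go program with the standard runtime/pprof package and inspected with go tool pprof. A minimal sketch; the writeHeapProfile helper and any hook point into usql are assumptions, not existing usql code:

```go
package profiling

import (
	"os"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile dumps the current heap allocations to path, for
// inspection with `go tool pprof <binary> <path>`.
// Hypothetical helper; usql does not expose this hook.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	runtime.GC() // update allocation statistics before profiling
	return pprof.WriteHeapProfile(f)
}
```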

nineinchnick commented 2 years ago

The allocation itself should not be that big of an issue, but I'd expect the memory usage to drop once the query is complete and the results have been processed.

nineinchnick commented 2 years ago

Huh, I somehow thought `TableEncoder.count` had a non-zero default, but it's actually not used at all. Maybe we'd want to set it to something like 100k?
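For context, the idea behind a non-zero count is roughly: buffer at most count rows, compute the column widths for that batch, flush it, and repeat. Here is a minimal sketch of the technique using plain database/sql and text/tabwriter; the renderBatched helper is hypothetical, not tblfmt's actual API:

```go
package render

import (
	"database/sql"
	"fmt"
	"os"
	"text/tabwriter"
)

// renderBatched prints a result set in batches of batchSize rows, so
// at most batchSize rows are buffered at a time. Column widths are
// computed per batch, which is why they can differ between batches.
func renderBatched(rows *sql.Rows, batchSize int) error {
	cols, err := rows.Columns()
	if err != nil {
		return err
	}
	for {
		// tabwriter buffers one batch and aligns its columns on flush.
		w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
		for i, c := range cols {
			if i > 0 {
				fmt.Fprint(w, "\t")
			}
			fmt.Fprint(w, c)
		}
		fmt.Fprintln(w)
		n := 0
		for n < batchSize && rows.Next() {
			vals := make([]any, len(cols))
			ptrs := make([]any, len(cols))
			for i := range vals {
				ptrs[i] = &vals[i]
			}
			if err := rows.Scan(ptrs...); err != nil {
				return err
			}
			for i, v := range vals {
				if i > 0 {
					fmt.Fprint(w, "\t")
				}
				fmt.Fprintf(w, "%v", v)
			}
			fmt.Fprintln(w)
			n++
		}
		if err := w.Flush(); err != nil {
			return err
		}
		if n < batchSize { // result set exhausted
			break
		}
	}
	return rows.Err()
}
```

Memory then scales with batchSize rather than with the size of the result set, at the cost of column widths that can shift between batches.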

nineinchnick commented 2 years ago

Also, it looks like I had no idea how Go's GC is supposed to work: calling runtime.GC() explicitly after executing the query drops the memory usage from just under 1 GB to 3 MB.
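For anyone reproducing this, here is a minimal standalone sketch of that effect, using only the standard library; the slice of rows below merely simulates a fully buffered result set:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// heapMB reports the current live heap allocation in megabytes.
func heapMB() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc / (1 << 20)
}

func main() {
	// Stand-in for a fully buffered result set: ~1.85M small rows.
	rows := make([][]string, 1_854_211)
	for i := range rows {
		rows[i] = []string{"aaaaaaaaaaaaaa", "bbbbbbbbbbbbbb"}
	}
	fmt.Printf("heap after buffering: %d MB\n", heapMB())

	rows = nil // drop the only reference so the heap can be reclaimed

	runtime.GC()         // force an immediate collection
	debug.FreeOSMemory() // and return freed pages to the OS
	fmt.Printf("heap after runtime.GC: %d MB\n", heapMB())
}
```

Note that debug.FreeOSMemory() goes a step further than runtime.GC(): it also attempts to return freed pages to the operating system, which is what shows up as resident set size in tools like time.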

kenshaw commented 2 years ago

Using GNU time (not the shell's built-in time):

$ alias time="$(which time) -f '\t%E real,\t%U user,\t%S sys,\t%K amem,\t%M mmem'"
$ time psql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" postgres://postgres:P4ssw0rd@localhost/ > /dev/null
    0:02.62 real,   2.07 user,  0.10 sys,   0 amem, 231272 mmem
$ time usql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" postgres://postgres:P4ssw0rd@localhost/ > /dev/null
    0:03.37 real,   6.34 user,  0.25 sys,   0 amem, 1068812 mmem

It would appear that usql uses roughly 5x the memory (%M is the maximum resident set size in KB, so ~1 GB vs. ~230 MB) and roughly 3x the user CPU time. I'll try to see if there are ways to reduce this memory usage, but I find it unlikely that there's much to gain.

nineinchnick commented 2 years ago

See this proof of concept, where I enable batching with 100k-row batches: https://github.com/xo/usql/compare/master...nineinchnick:usql:print-batch

% alias time="gtime -f '\t%E real,\t%U user,\t%S sys,\t%K amem,\t%M mmem'"
% time ./usql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" 'postgres://postgres:pw@localhost/?sslmode=disable' > /dev/null
    0:04.83 real,   4.76 user,  0.23 sys,   0 amem, 121280 mmem
% time psql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" 'postgres://postgres:pw@localhost/?sslmode=disable' > /dev/null  
    0:04.48 real,   2.75 user,  0.12 sys,   0 amem, 233536 mmem
% time usql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" 'postgres://postgres:pw@localhost/?sslmode=disable' > /dev/null  
    0:04.74 real,   5.07 user,  0.30 sys,   0 amem, 1219200 mmem
% usql --version
usql 0.12.13
% psql --version
psql (PostgreSQL) 14.5 (Homebrew)

The drawback is that the column widths (and thus the overall table width) can change every 100k rows. I wouldn't enable this by default, but maybe we could expose it as an option? We could also call runtime.GC() after the query finishes, to reduce memory usage when a user simply forgot to add a LIMIT to the query and doesn't actually need that many results :-)

Also, I could rebase https://github.com/xo/tblfmt/pull/18