Open kozikowskik opened 2 years ago
I can confirm that running this query on PostgreSQL 13 uses about 1 GB of memory, whether or not a pager is used, and also in batch mode:
./usql 'postgres://postgres:pw@localhost:49153?sslmode=disable' -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;"
The allocation itself should not be that big of an issue, but I'd expect the memory usage to drop once the query is complete and the results have been processed.
Huh, I somehow thought TableEncoder.count had a non-zero default, but it's actually not used at all. Maybe we'd want to set it to something like 100k?
Also, it looks like I have no idea how Go's GC is supposed to work. Calling runtime.GC() explicitly after executing the query drops the memory usage from under 1 GB to 3 MB.
Using GNU time (not the shell's built-in time):
$ alias time="$(which time) -f '\t%E real,\t%U user,\t%S sys,\t%K amem,\t%M mmem'"
$ time psql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" postgres://postgres:P4ssw0rd@localhost/ > /dev/null
0:02.62 real, 2.07 user, 0.10 sys, 0 amem, 231272 mmem
$ time usql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" postgres://postgres:P4ssw0rd@localhost/ > /dev/null
0:03.37 real, 6.34 user, 0.25 sys, 0 amem, 1068812 mmem
It would appear that usql uses roughly 5x the memory and about 3x the CPU time. I'll see if there are ways to reduce the memory usage, but I suspect there aren't easy ones.
See this proof-of-concept, where I enable batching (100k): https://github.com/xo/usql/compare/master...nineinchnick:usql:print-batch
% alias time="gtime -f '\t%E real,\t%U user,\t%S sys,\t%K amem,\t%M mmem'"
% time ./usql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" 'postgres://postgres:pw@localhost/?sslmode=disable' > /dev/null
0:04.83 real, 4.76 user, 0.23 sys, 0 amem, 121280 mmem
% time psql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" 'postgres://postgres:pw@localhost/?sslmode=disable' > /dev/null
0:04.48 real, 2.75 user, 0.12 sys, 0 amem, 233536 mmem
% time usql -c "select 'aaaaaaaaaaaaaa' as a, 'bbbbbbbbbbbbbb' as b, i from generate_series(1,1854211) i;" 'postgres://postgres:pw@localhost/?sslmode=disable' > /dev/null
0:04.74 real, 5.07 user, 0.30 sys, 0 amem, 1219200 mmem
% usql --version
usql 0.12.13
% psql --version
psql (PostgreSQL) 14.5 (Homebrew)
The drawback is that the width of the table columns can change every 100k rows. I wouldn't enable this by default, but maybe we could make it configurable via an option? We could also call runtime.GC() after executing the query to reduce memory usage, for the case where a user just forgot to add a LIMIT to the query and doesn't actually need that many results :-)
I could also rebase https://github.com/xo/tblfmt/pull/18
I ran a simple query:
select * from <table_name> where date_column >= '<date>';
The result of this query was 1854211 rows; after I got the first page of results, the usql process consumed 6 GB of memory.