phiresky / sqlite-zstd

Transparent dictionary-based row-level compression for SQLite
GNU Lesser General Public License v3.0
1.43k stars 49 forks source link

Compressed DB twice the size of original #28

Closed mandolyte closed 1 year ago

mandolyte commented 1 year ago

This is the first time I have used this extension... perhaps I am using it wrong? Below find two things. The first is a script that I run. The second is the output of the script. The output shows the size before compression at the beginning and the size after compression.

Here is the script:

#!/bin/sh

cp ../convert-zip-to-db/translation_words.db ./tw.db
ls -al tw.db
sqlite3 tw.db <<EoF
CREATE TABLE tw (
  name text primary key,
  keyterm text,
  keyword text,
  data text
);
INSERT INTO tw 
SELECT name, keyterm, keyword, data 
FROM translation_words;
DROP TABLE translation_words;
.schema
select zstd_enable_transparent('{"table": "tw", "column": "data", "compression_level": 19, "dict_chooser": "''a''"}');
select zstd_incremental_maintenance(null, 1);
.tables
select count(*) from tw;
select keyword,keyterm from tw where keyword = 'winepress';
.quit
EoF
ls -al tw.db

Here is the output:

-rw-r--r-- 1 cecil cecil 1921024 Apr  1 21:28 tw.db
[2023-04-02T01:28:05Z INFO  sqlite_zstd::create_extension] [sqlite-zstd] initialized
CREATE TABLE tw (
  name text primary key,
  keyterm text,
  keyword text,
  data text
);
[2023-04-02T01:28:05Z WARN  sqlite_zstd::transparent] Warning: It is recommended to set `pragma journal_mode=WAL;`
[2023-04-02T01:28:05Z WARN  sqlite_zstd::transparent] Warning: It is recommended to set `pragma auto_vacuum=full;`
[2023-04-02T01:28:05Z WARN  sqlite_zstd::transparent] Warning: It is recommended to set `pragma busy_timeout=2000;` or higher
Done!
[2023-04-02T01:28:05Z WARN  sqlite_zstd::transparent] Warning: It is recommended to set `pragma journal_mode=WAL;`
[2023-04-02T01:28:05Z WARN  sqlite_zstd::transparent] Warning: It is recommended to set `pragma auto_vacuum=full;`
[2023-04-02T01:28:05Z WARN  sqlite_zstd::transparent] Warning: It is recommended to set `pragma busy_timeout=2000;` or higher
[2023-04-02T01:28:05Z INFO  sqlite_zstd::transparent] tw.data: Total 935 rows (1.46MB) to potentially compress (split in 1 groups).
[2023-04-02T01:28:05Z INFO  sqlite_zstd::transparent] Compressed 935 rows with dict_choice=a (dict_id=1). Total size of entries before: 1.46MB, afterwards: 434.39kB, (average: before=1.56kB, after=464B)
[2023-04-02T01:28:05Z INFO  sqlite_zstd::transparent] Handled 935 / 935 rows  (1.46MB / 1.46MB)
[2023-04-02T01:28:05Z INFO  sqlite_zstd::transparent] All maintenance work completed!
0
_tw_zstd       _zstd_configs  _zstd_dicts    tw           
935
winepress|other
-rw-r--r-- 1 cecil cecil 3878912 Apr  1 21:28 tw.db
phiresky commented 1 year ago

You also need to call VACUUM in order for the size of the sqlite3 file to actually decrease. I've also added this as a note to the readme.