heavyai / heavydb

HeavyDB (formerly OmniSciDB)
https://heavy.ai
Apache License 2.0
2.95k stars 448 forks source link

\ste command woe #723

Open gunlinan opened 2 years ago

gunlinan commented 2 years ago

We tried omnisql \ste command in order to query historical data for comparison purpose. Then we found our recent data seemed gone. Any idea how we can recover the recent data?

Knowing omniscidb retains multiple versions of data, we're not asking for MVCC or any isolation level but just sometimes need to travel back and forth to different versions of data retained in predefined intervals.

cdessanti commented 2 years ago

In the latest releases, we shortened the default of retained epochs to 2; so if you want to want to retain more epochs when you create a table in the database

you can know how many epochs are available for a table using the command

show table details [tyable_name] to know which is the max_epoch for the table and which is the min_epoch you can revert data on.

instead of using \ste in omnisql you can use the SQL command alter table [table name] set epoch={a value less or equal to the min_epoch}.

to increase the number of epochs available for a particular table you can define at table creation:

create table test_epocs(id integer) with (max_rollback_epochs=10);

be careful, because setting a value too high could produce more metadata for the table, which could slow down access to it.

gunlinan commented 2 years ago

@cdessanti I'm sorry that I didn't communicate the issue well. I don't have problem on how to set the correct epoch with data of which I want to query. The number of epochs is neither the issue.

The issue is, for example, my table has 10 epochs (0 to 9) currently. After I set table epoch to 5 to inspect old data, I couldn't set the epoch back to 9; the table data of epoch from 6 to 9 appeared lost and irrecoverable. Is this behavior (data loss) expected on changing table epoch?

cdessanti commented 2 years ago

Hi,

I'm sorry if I misinterpreted your question. I don't think you can travel back and forth, the epoch is intended for rollback operations (in fact the parameter associated with this is called max_rollback_epocs). Also when you set the table to an epoch and you do some changes on that table the "recent epoch" numbers are overwritten

omnisql> create table test_epocs(id integer) with (max_rollback_epochs=10);
omnisql> insert into test_epocs values(1);
omnisql> insert into test_epocs values(2);
omnisql> insert into test_epocs values(3);
omnisql> \gte test_epocs
table epoch is 3
omnisql> alter table test_epocs set epoch=2;
omnisql> \gte test_epocs
table epoch is 2
omnisql> select * from test_epoc;
SQL Error: From line 2, column 6 to line 2, column 16: Object 'test_epoc' not found
omnisql> select * from test_epocs;
id
1
2
2 rows returned.
Execution time: 36 ms, Total time: 37 ms
omnisql> insert into test_epocs values(4);
omnisql> \gte test_epocs
table epoch is 3

so I think this is the intended behavior, while it's not documented feature, that should be used carefully when you have to recover an improper operation, or for whatever reason, there is a corruption into the table.

Candido