twindb / undrop-for-innodb

TwinDB data recovery toolkit for MySQL/InnoDB
https://twindb.com
GNU General Public License v2.0

Innodb Page Compression #21

Open artfiedler opened 3 years ago

artfiedler commented 3 years ago

It seems one of the .ibd files I'm looking to recover (I altered the table thinking I was doing a "create like", so some columns got dropped) has a mix of compressed and uncompressed pages. I'm able to extract a number of rows that I can visibly see in the .ibd file, which I believe predate the point about a year ago when I turned page compression on. However, everything since then is not being pulled out, and I believe it's because of this page compression: in the 36 MB file I see a lot of compressed-looking data (zlib, I believe), yet the extracted data comes to only about 5.5 MB.

Does this tool support decompressing pages? Will it?

artfiedler commented 3 years ago

Here is some information on the page compression, https://mariadb.com/kb/en/innodb-page-compression/

akuzminsky commented 3 years ago

stream_parser cannot handle compressed pages. It doesn't understand that format, and the page size is different (less than 16k). You may want to look at https://bazaar.launchpad.net/~akuzminsky/percona-data-recovery-tool-for-innodb/decompress/view/head:/page_parser.c . It's an experimental branch. AFAIR, if page_parser sees a compressed page it will uncompress it and save it as a separate file.
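In rough terms, the idea is to inflate the compressed payload back into a full 16 KiB page with zlib and write it out as its own file. A minimal sketch only (not the actual page_parser code); locating the payload and its length in the FIL header is assumed to be handled by the caller:

```c
/* Sketch: inflate one zlib-compressed page payload into a full 16 KiB
 * page and save it to its own file. src/src_len are assumed to already
 * point at the compressed payload found in the .ibd stream. */
#include <stdio.h>
#include <zlib.h>

#define UNIV_PAGE_SIZE 16384

int save_uncompressed_page(const unsigned char *src, unsigned long src_len,
                           const char *out_path)
{
    unsigned char page[UNIV_PAGE_SIZE];
    uLongf dst_len = sizeof(page);

    /* zlib returns Z_OK only if the whole payload inflated cleanly */
    if (uncompress(page, &dst_len, src, src_len) != Z_OK)
        return -1;

    FILE *f = fopen(out_path, "wb");
    if (f == NULL)
        return -1;

    size_t written = fwrite(page, 1, dst_len, f);
    fclose(f);
    return written == dst_len ? 0 : -1;
}
```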

artfiedler commented 3 years ago

Well, I modified (hacked with an axe) your stream_parser and it now seems to support MariaDB's InnoDB page compression. Previously, out of a 36 MB file, 372 pages were uncompressed and yielded about 5.5 MB of extracted data. With this page compression support added, I'm able to extract another 1345 pages, resulting in 27 MB of extracted data. There also appear to be some other "mysql"-compressed pages, which are skipped until I find some information on that format.
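The kind of check involved is roughly the following. This is only a sketch; the constants reflect my reading of the InnoDB/MariaDB FIL header layout and should be verified against fil0fil.h before relying on them:

```c
/* Sketch: classify a page by its FIL header type before deciding how to
 * decompress it. Offsets are in bytes from the start of the page; the
 * FIL header fields are stored big-endian. Constant values are assumed,
 * not taken from the attached patch. */
#include <stdint.h>

#define FIL_PAGE_TYPE            24      /* 2-byte page type field                   */
#define FIL_PAGE_COMPRESSED      14      /* MySQL transparent compression (assumed)  */
#define FIL_PAGE_PAGE_COMPRESSED 34354   /* MariaDB page_compressed (assumed)        */

static uint16_t fil_page_type(const unsigned char *page)
{
    return (uint16_t)((page[FIL_PAGE_TYPE] << 8) | page[FIL_PAGE_TYPE + 1]);
}

static int is_mariadb_page_compressed(const unsigned char *page)
{
    return fil_page_type(page) == FIL_PAGE_PAGE_COMPRESSED;
}

static int is_mysql_page_compressed(const unsigned char *page)
{
    return fil_page_type(page) == FIL_PAGE_COMPRESSED;
}
```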

However, I then ran into an issue where c_parser errored at sql_parser.y line 149 (that turned out to be a wrong field name for the primary key in my table create script). Now I'm getting a segmentation fault, which I think means it's hitting pages that only have 3 fields versus the 9 fields of the original table (the ALTER TABLE dropped some columns). Hopefully I'll be able to whack this with a branch and either get c_parser to output 2 different schemas, or skip pages that don't match the current schema and just run it twice.

artfiedler commented 3 years ago

Score! I was able to extract the data I needed and only lost 0.001% (a handful of rows), but those won't be worth my time recovering... they are probably still there, just in that MySQL compression format instead of the MariaDB page compression format.

A few problems I ran into generating the data with c_parser:

I'll remove the debugging junk I threw in and attach the updated files here... you may want to organize the code a little differently; I was all about getting it done as fast as possible.

artfiedler commented 3 years ago

Attached are the modified files; they need zlib. It should be easy to add support for other compression algorithms: just add the references and a call to the library's decompress function (a rough sketch of what that could look like is at the end of this comment). I rarely write C/C++, so a data type here or there may need fixing, but it doesn't seem to matter since it works!

After I removed my forced debugging for c_parser, it no longer segfaults on the debug print... anyway, see attached; merge if you would like. modified.zip
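Here is the kind of dispatch I mean for plugging in another decompressor. This is illustrative only: the enum, the function, and the LZ4 dependency are examples of mine, not part of modified.zip:

```c
/* Sketch: dispatch to the right library's decompress call per algorithm.
 * Returns the number of bytes written to dst, or -1 on failure. */
#include <zlib.h>
#include <lz4.h>

enum page_comp_alg { ALG_ZLIB, ALG_LZ4 };

long decompress_page(enum page_comp_alg alg,
                     const unsigned char *src, unsigned long src_len,
                     unsigned char *dst, unsigned long dst_cap)
{
    switch (alg) {
    case ALG_ZLIB: {
        uLongf out = dst_cap;
        return uncompress(dst, &out, src, src_len) == Z_OK ? (long)out : -1;
    }
    case ALG_LZ4: {
        int out = LZ4_decompress_safe((const char *)src, (char *)dst,
                                      (int)src_len, (int)dst_cap);
        return out >= 0 ? out : -1;
    }
    }
    return -1;
}
```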

akuzminsky commented 3 years ago

Thank you for your contribution!

bmakan commented 3 years ago

@artfiedler I'm trying to compile your modified code, but it's failing with fatal error: zlib/zlib.h: No such file or directory even after I installed the zlib-devel library (CentOS). Do I need to do something else to compile this besides running make?

Edit: Sorry for bothering you. I managed to do it; I had to replace the quoted include with the angle-bracket include (see the snippet below).
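For anyone hitting the same build error, the change amounts to swapping the include so the compiler finds the system zlib header installed by zlib-devel:

```c
/* Build fix on CentOS with zlib-devel installed: use the system header */
#include <zlib.h>               /* was: #include "zlib/zlib.h" */
```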

Turns out it didn't help my case. The parsed data is still missing InnoDB pages, and even the parsed rows have weird values for some columns (usually the first few are fine). The log always shows a lot of ignored pages:

Stream contained 0 blob, 5 innodb, 0 mysql compressed, 0 mariadb compressed and 1567 ignored page read attempts

I suppose my data is corrupted beyond recovery.

artfiedler commented 3 years ago

I believe I copied zlib.h into the libs folder


xinxinfly commented 2 years ago

Hey buddy, I tried the modified files, but they don't work on my compressed table.