Thanks for the tool. Question

I really appreciate what you have done here. I ran stream_parser on a single-table IBD file, and it produced 3 index page files. I then ran c_parser on each one. Only the first had valid data; in the second and third the field data was completely garbled. I used the -6 option, and tried both the -U and -D options.

Can you explain why sometimes the output is garbled?

Could you update the README or wiki documentation to explain the difference between the c_parser -U and -D options? The latter results in more corrupted output.

In my runs, I see a lot of errors like:

-- #####CannotOpen_./0000000368541696.page;
-- print_field_value_with_external(): open(): No such file or directory

Here is the table structure. Perhaps this provides insights?

CREATE TABLE `xxxx` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `book_datetime` datetime DEFAULT NULL,
  `start_datetime` datetime DEFAULT NULL,
  `end_datetime` datetime DEFAULT NULL,
  `notes` mediumtext COLLATE utf8mb4_german2_ci,
  `hash` mediumtext COLLATE utf8mb4_german2_ci,
  `is_unavailable` tinyint(4) DEFAULT '0',
  `id_users_provider` int(11) DEFAULT NULL,
  `id_users_customer` int(11) DEFAULT NULL,
  `id_services` int(11) DEFAULT NULL,
  `id_google_calendar` mediumtext COLLATE utf8mb4_german2_ci,
  `appointment_type` varchar(40) COLLATE utf8mb4_german2_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `id_users_customer` (`id_users_customer`),
  KEY `id_services` (`id_services`),
  KEY `id_users_provider` (`id_users_provider`)
) ENGINE=InnoDB AUTO_INCREMENT=8869 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_german2_ci;

(constraints removed)

Could you update the README or wiki documentation to explain the difference between the c_parser -U and -D options? The latter results in more corrupted output.

Each InnoDB record has an is_deleted flag in its header. The tool can be instructed to look only for records where the flag is set (-D option), only for records where the flag is unset (-U option), or to ignore the flag, which is the default behavior.
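To make the three modes concrete, here is a minimal sketch of the selection logic. It is not c_parser's code: the function name `keep_record`, the `mode` argument, and the 0x20 bit position of the flag are assumptions made for illustration.

```python
# Assumed bit of the is_deleted flag inside the record header's info byte.
DELETED_FLAG = 0x20


def keep_record(info_byte, mode=None):
    """Decide whether a record is reported, given its header info byte.

    mode "D": only records with the flag set (like -D)
    mode "U": only records with the flag unset (like -U)
    mode None: the flag is ignored (the default behavior)
    """
    deleted = bool(info_byte & DELETED_FLAG)
    if mode == "D":
        return deleted
    if mode == "U":
        return not deleted
    return True
```

For example, a record whose info byte is 0x20 would be reported with "D" or with no mode, but skipped with "U".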

If the -D option is not specified, c_parser tries to validate the page. Each record has a pointer to the next one. c_parser follows the pointers, and if it can travel from the infimum record to the supremum record, the page is considered non-corrupt. If the -D option is specified, c_parser assumes the page is corrupt and scans it byte by byte. When c_parser scans a page byte by byte, many false matches are possible (that's why you see many garbage records with -D). To prevent the false matches, use field value constraints (filters).
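The pointer walk described above can be sketched roughly as follows. This is a toy model, not c_parser's implementation: the offsets 99 and 112 for infimum and supremum, the 2-byte relative next-record pointer stored just before each record, and the helpers `link` and `demo_page` are assumptions chosen for illustration.

```python
import struct

PAGE_SIZE = 16384
INFIMUM = 99    # assumed offset of the infimum record
SUPREMUM = 112  # assumed offset of the supremum record


def next_offset(page, origin):
    # Assumption: the 2 bytes before a record hold a signed pointer
    # to the next record, relative to the current record.
    (rel,) = struct.unpack_from(">h", page, origin - 2)
    return (origin + rel) % PAGE_SIZE


def walk_page(page, max_records=PAGE_SIZE // 5):
    """Follow next-record pointers from infimum to supremum.

    Returns the list of record offsets if supremum is reached,
    or None if the chain is broken (page treated as corrupt).
    """
    offsets = []
    pos = next_offset(page, INFIMUM)
    for _ in range(max_records):  # bound the walk so a cycle cannot loop forever
        if pos == SUPREMUM:
            return offsets
        if pos < SUPREMUM or pos >= PAGE_SIZE:
            return None
        offsets.append(pos)
        pos = next_offset(page, pos)
    return None


def link(page, origin, target):
    # Hypothetical helper: store the relative pointer origin -> target.
    struct.pack_into(">h", page, origin - 2, target - origin)


def demo_page():
    # Hypothetical page: infimum -> 200 -> 300 -> supremum.
    page = bytearray(PAGE_SIZE)
    link(page, INFIMUM, 200)
    link(page, 200, 300)
    link(page, 300, SUPREMUM)
    return page
```

On the demo page the walk reaches the supremum, so the page would count as non-corrupt. On an all-zero page the first pointer leads nowhere and the walk returns None, which is the situation where a byte-by-byte scan, with its false matches, is the only remaining option.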

twindb / undrop-for-innodb

Thanks for the tool. Question #25