vv9k / mobi-rs

A crate to work with MOBI format ebooks
MIT License
38 stars 10 forks source link

Use HeaderLength when parsing mobi header. #6

Closed philippeitis closed 4 years ago

philippeitis commented 4 years ago

Its possible for the mobi header to be a length other than 232, and have padding bytes of 0xFFFF_FFFF at the end. The previous assumption that this wasn't possible caused a bug where parsing the exth data would read these padding bytes 0xFFFF_FFFF - and then the program would try to read 4GB of data from the mobi.

This PR changes ExtHeader::parse to take the header_len and ExtHeader::get_records to take an offset to find the EXTH header. A few other stylistic changes are included (which should hopefully make things easier to work with).

I have also simplified lz77.rs to avoid using Cursor - which allows us to return a Vec<u8> instead of an io::Error<Vec<u8>> and prevents unexpected errors.

This resolves #5, and at least for me, makes all of my mobi files open correctly.

As an additional note, since this crate is provided as a library, it would be nice if parsing the headers returned errors for unexpected / problematic values, since this would help make identifying malformed headers a lot easier (and would make this library a lot safer to use in other applications). For instance, testing that the record lengths are correct (no longer than the file length and at least 8 elements long), or that the EXTH identifier is always EXTH would have been helpful when debugging this specific problem, and I imagine it would make future issues a lot easier to resolve.

vv9k commented 4 years ago

So with this PR I was getting index out of bounds when I tried this sample code with the book you sent me or with any book i tried to open.

  use mobi::Mobi;
  use std::env;
  fn main() -> Result<(), std::io::Error> {
      let m = Mobi::from_path(env::args().skip(1).take(1).next().expect("Missing argument"))?;
      Ok(())
  }
❯ RUST_BACKTRACE=1 ./target/release/mobi ~/dev/rust/mobi-rs/example.mobi
thread 'main' panicked at 'index out of bounds: the len is 30 but the index is 33', src/lz77.rs:30:58
stack backtrace:
   0: rust_begin_unwind
             at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/panicking.rs:475
   1: core::panicking::panic_fmt
             at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/core/src/panicking.rs:85
   2: core::panicking::panic_bounds_check
             at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/core/src/panicking.rs:62
   3: mobi::lz77::decompress_lz77
   4: mobi::record::Record::parse_records
   5: mobi::Mobi::from_reader
   6: mobi::Mobi::from_path
   7: mobix::main

What fixed it is checking if the offset is within the bounds of text not the data length. I will display it below.

vv9k commented 4 years ago

Thanks for all the work :) I'll try to release a fixed version soon. I want to include some more changes.