j68k / verifydump

A tool for verifying that .chd/.rvz disc images match Redump Datfiles
MIT License
51 stars 7 forks source link

Read .chd files internally to avoid chdman dependency and need for temporary files #8

Open j68k opened 2 years ago

j68k commented 2 years ago

There are at least a couple of libraries for reading .chd files now, which would hopefully make this fairly easy.

i30817 commented 2 years ago

If you ever create a cross platform library to do this, if you'd publish it on pypi, i'd start using it right away to start renaming chds in my rhdndat project (the rhdndat-rn part in particular).

I'd love if there was a way to write new chds with a project like this too, because it would allow to create delta child chds from romhacking style xdelta patches without a whole complicated mess of temporary files and cutting bins into tracks just to create yet another image to delta encode with the original chd.

MAME is sometimes so disappointing, they create the only cd image format that has delta encoding, then they do absolutely nothing to make it easy to use with the current state of the world (which for romhacks in particular is xdelta patches on the binary track). They're even considering 'reworking' chd to a 'more accurate' format - to be clear, this probably would require redumps of absolutely everything, and i can think of no finer way to kill the format.

chyyran commented 1 year ago

Thought I would drop by from https://github.com/rtissera/libchdr/issues/79 since I think it's unlikely libchdr will ever provide Python bindings.

chd-rs-py provides Python bindings to chd-rs which is all in pure Rust (without the undocumented fast_zlib feature which switches to using zlib-ng for Deflate decompression) and so will work on Windows, Linux and macOS.

I used maturin to publish it hastily to pypi so I'm not sure if the current version on pypi is compiled for Linux. Python bindings isn't something I use so I probably won't go further than this personally but I'll be happy to accept PRs for anything you need as long as chd-rs exposes it, which should pretty much be everything.

j68k commented 1 year ago

Thank you very much for the work and for letting me know. I'll definitely check that out next time I'm working on this program. Very much appreciated!

i30817 commented 1 year ago

Doesn't have a linux wheel yet, i think it needs it specified in cargo.toml for the pypa then build the wheels somehow.

As in this article: https://blog.yossarian.net/2020/08/02/Writing-and-publishing-a-python-module-in-rust

[package.metadata.maturin]
classifier = [
  "Programming Language :: Rust",
  "Operating System :: POSIX :: Linux",
]

same for the rest, windows, macos, bsd etc.

I'm also curious if the metadata is enough to split/know the extents of the tracks in memory.

chyyran commented 1 year ago

0.1.2 should have a wheel for linux + sdist if you have a Rust compiler installed

i30817 commented 1 year ago

Thanks. Can this support reading delta chds?

I suppose the chd_open call either would have to take a list (of ancestors) or internally it would have to search the filesystem itself.

It's rare the emulator that supports the delta chd format, but i'd like that to change, preferably by scanning/enabling a option to scan the current directory.

chyyran commented 1 year ago

chd-rs supports opening a parent but because of lifetime issues (unlike Python you can't have multiple references to the inner stream) you can't really pass in a Chd object directly in from Python easily.

It's an API problem that should be easy to solve but I don't have time to do much more than this. Feel free to explore it and send a PR. You basically need to hack in an impl Clone for Chd.

i30817 commented 1 year ago

I see, python doesn't have the equivalent from 'passing ownership' so to prevent errors, the constructor should call clone to prevent python objects being reused after being passed to the rust constructor.

i30817 commented 1 year ago

Oh hey, i checked the code and this appears to already exist?

#[pyfunction]
fn chd_open(path: String, parent: Option<String>) -> PyResult<Chd> {
    let file = File::open(path)?;
    let parent = parent
        .map(|p| File::open(p).map(BufReader::new))
        .map_or(Ok(None), |v| v.map(Some))?
        .map(|n| ChdFile::open(n, None))
        .map_or(Ok(None), |v| v.map(|v| Some(Box::new(v))))
        .map_err(ChdPyError)?;

    let chd = ChdFile::open(BufReader::new(file), parent).map_err(ChdPyError)?;
    Ok(Chd {
        inner: chd,
        cmp_vec: Vec::new(),
    })
}

Were you mentioning the 'parent of a parent' situation, not just one level of indirection of reads?

I mean, yes, it's a bit awkward that you have to instantiate ChdFiles just to figure out if the file is a delta chd and to find the parent delta (from the checksum of the parent, iirc) before finally making the final result and discarding the rest, but it should work no?

chyyran commented 1 year ago

Yeah, I already considered single-parent which is pretty easy to solve.

Theoretically any level of indirection is easy as long as the lifetimes of the parent CHDs are fully resolved within chd_open but you'll just need to come up with an API that does that.

It would be more ergonomic to not have to do that ad-hoc. Something like Chd::open_child(&mut self, String) would probably work and you just build the tree bottom up. nvm, forgot that opening a child takes ownership of self

chyyran commented 1 year ago

I mean, yes, it's a bit awkward that you have to instantiate ChdFiles just to figure out if the file is a delta chd and to find the parent delta (from the checksum of the parent, iirc) before finally making the final result and discarding the rest, but it should work no?

Yes, that would work but you can't open the ChdFile directly because it needs the parent to already be valid so the ChdFile is always valid. Instead use ChdHeader::try_read_header to get the header and check the parent SHA if you want to do it that way.

chyyran commented 1 year ago

As far as I know though, support for multiple parents is iffy even with chdman so I don't think you're going to run into that situation a lot

i30817 commented 1 year ago

I suspect it will always be awkward unless a alternate constructor gives a option to scan the filesystem/current dir (at least, maybe even check parent dir, for a obvious tree directory structure for hacks, or sibling directories, or all of these, without recursion naturally), because in most usecases (not mame i suppose, which has its software lists), you don't know which file is the parent.

Although, people tend to say it's bad form for libraries to do file io, for obvious reasons.

I'm interested in delta encoding as a 'softpatch for isos'. I'm interested in (eventually) creating a utility that takes a chd and a xdelta, and makes a child delta chd with the binary track patched, so all the iso hacks in romhacking.net finally have a softpatch format.

Also patching chd or updating patches of chds is a nightmare, you have to uncompress, divide into tracks again, patch and then recompress. Not good usability. Like this it would be a single command and the original game would still be available and to 'update' just means deleting the children and patching the parent again.

i30817 commented 1 year ago

The header definition in chd doesn't have access to the sha1 of the parent, or doesn't show it. It only shows 'has_parent' not the parent checksum.

This makes it kind of impossible to find the parents by scanning

edit: oh, it's public. I shouldn't complain without building at least once...

i30817 commented 1 year ago

What about exposing the try_read_header?

I wonder if something like this is enough:

#[pyclass]
struct Header {
    inner: ChdHeader,
}

#[pyfunction]
fn try_read_headers(paths: Vec<Option<String>>) -> Vec<PyResult<Header>> {
    let r = Vec::new()
    for path in paths {
        let h = path
            .map(|p| File::open(p).map(BufReader::new))
            .map_or(Ok(None), |v| v.map(Some))
            .map(|n| ChdHeader::try_read_header(n))
            .map_or(Ok(None), |v| v.map(|v| Some(Box::new(v))))
            .map_err(ChdPyError);

        r.push(
            Ok(Header {
                inner: h
            }))
    }
    r
}

/// A Python module implemented in Rust. The name of this function must match
/// the `lib.name` setting in the `Cargo.toml`, else Python will not be able to
/// import the module.
#[pymodule]
fn chd(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(chd_open, m)?)?;
    m.add_function(wrap_pyfunction!(try_read_headers, m)?)?;
    Ok(())
}

ie: exposing two functions for python code to 'try' to read the headers of all chd files it asks, then if a file it wants to load has a parent, do:

r = try_read_headers([file])[0]
if r not None and r.has_parent():
    l = try_read_headers([....every other file, found with some application specific strategy...])
    for h in l:
         if h.sha1 == r.parent_sha1:
              #create file

#etc

I don't know if i should be testing for 'None' or the exception too. I'm not too sure how this is handled, any exception should interrupt or just be ignored. Not sure if it's needed to painstackingly expose all needed function/properties to python or #[pyclass] takes care of that.

chyyran commented 1 year ago

Any error that bubbles up will automatically be turned into an Exception after crossing the Python FFI barrier. You don't need to test for None unless Option<T> is returned in Rust, otherwise it's guaranteed to either throw an Exception (in the case of Err), or return something. You will need #[pyclass] to expose the functions you need from Header, but there's not that many.

i30817 commented 1 year ago

and the public properties?

chyyran commented 1 year ago

Yeah you'll need to expose the public properties as methods in #[pyclass] since to CPython it's just a builtin.

chyyran commented 1 year ago

HeaderV* implement Clone, so you can possibly use #[pyo3(get, set)] instead if you only care about CHD v5.

let header = ...;
let header: HeaderV5 = match header {
  V5Header(h) => h,
  _ => unimplemented!(); // will crash python, but you can return `Err` instead which will throw an exception
}
i30817 commented 1 year ago

So something like this?

#[pyclass]
struct Header {
    inner: ChdHeader,
}

#[pymethods]
#[pyo3(get)]
impl Header {
    pub fn is_compressed(&self) -> bool
    {
        self.inner.is_compressed()
    }

    pub fn meta_offset(&self) -> Option<u64>
    {
        self.inner.meta_offset()
    }

    pub fn flags(&self) -> Option<u32>
    {
        self.inner.flags()
    }

    pub fn hunk_count(&self) -> u32
    {
        self.inner.hunk_count()
    }

    pub fn hunk_size(&self) -> u32
    {
        self.inner.hunk_size()
    }

    pub fn logical_bytes(&self) -> u64
    {
        self.inner.logical_bytes()
    }

    pub fn unit_bytes(&self) -> u32
    {
        self.inner.unit_bytes()
    }

    pub fn unit_count(&self) -> u64
    {
        self.inner.unit_count()
    }

    pub fn has_parent(&self) -> bool
    {
        self.inner.has_parent()
    }

    pub fn len(&self) -> u32 {
        self.inner.len()
    }
}

#[pyfunction]
fn try_read_headers(paths: Vec<String>) -> PyResult<Vec<Header>> {
    let r = Vec::new()
    for path in paths {
        let f = File::open(path)
            .map_err(ChdPyError)?;
        let h = ChdHeader::try_read_header(BufReader::new(f))
            .map_err(ChdPyError)?;
        let header: HeaderV5 = match h {
            V5Header(h) => h,
            _ => ChdPyError?;
        }
        r.push(
            Header {
                inner: header
            })
    }
    r
}

/// A Python module implemented in Rust. The name of this function must match
/// the `lib.name` setting in the `Cargo.toml`, else Python will not be able to
/// import the module.
#[pymodule]
fn chd(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(chd_open, m)?)?;
    m.add_function(wrap_pyfunction!(try_read_headers, m)?)?;
    Ok(())
}

Mmm python doesn't have a (obligatory) way to make sure the argument is not None. I have to use option in the arguments of try_read_headers don't I?

There is a typing module in recent python with 'optional', but since python doesn't assume that a non-optional value can't be None, i do have to use Option right?

edit: mmm, #[pymethods] and #[pyo3(get)] at the same time look like they have possibility to duplicate methods.

chyyran commented 1 year ago

Instead of storing CHDHeader as inner just store HeaderV5 which should have everything you need. I think PyO3 should handle checking if None gets passed into the method.

I would set up a Rust toolchain and fiddle with it until it compiles. Just follow the guide at https://pyo3.rs/v0.16.4/

Docs for chd-rs are at https://docs.rs/chd/latest/chd/index.html

i30817 commented 1 year ago

Finally. Got something that builds.

#[pyclass]
struct Header {
    inner: ChdHeader,
}

#[pymethods]
impl Header {
    pub fn is_compressed(&self) -> bool
    {
        self.inner.is_compressed()
    }

    pub fn meta_offset(&self) -> Option<u64>
    {
        self.inner.meta_offset()
    }

    pub fn flags(&self) -> Option<u32>
    {
        self.inner.flags()
    }

    pub fn hunk_count(&self) -> u32
    {
        self.inner.hunk_count()
    }

    pub fn hunk_size(&self) -> u32
    {
        self.inner.hunk_size()
    }

    pub fn logical_bytes(&self) -> u64
    {
        self.inner.logical_bytes()
    }

    pub fn unit_bytes(&self) -> u32
    {
        self.inner.unit_bytes()
    }

    pub fn unit_count(&self) -> u64
    {
        self.inner.unit_count()
    }

    pub fn has_parent(&self) -> bool
    {
        self.inner.has_parent()
    }

    pub fn len(&self) -> u32 {
        self.inner.len()
    }

    pub fn get_sha1(&self) -> &[u8] {
        let h = match &self.inner {
            V5Header(h) => h,
            _ => unimplemented!(),
        };
        &h.sha1
    }

    pub fn get_parent_sha1(&self) -> &[u8] {
        let h = match &self.inner {
            V5Header(h) => h,
            _ => unimplemented!(),
        };
        &h.parent_sha1
    }

    pub fn get_raw_sha1(&self) -> &[u8] {
        let h = match &self.inner {
            V5Header(h) => h,
            _ => unimplemented!(),
        };
        &h.raw_sha1
    }
}

#[pyfunction]
fn try_read_headers(paths: Vec<String>) -> PyResult<Vec<Header>> {
    let mut v = Vec::new();
    for path in paths {
        let f = File::open(path)?;
        let mut r = BufReader::new(f);
        let h = ChdHeader::try_read_header(&mut r)
            .map_err(ChdPyError)?;
        match h {
            V5Header(_) => Ok(()),
            _ => Err(ChdPyError(chd::ChdError::UnsupportedVersion)),
        }?;
        v.push(
            Header {
                inner: h
            });
    }
    Ok(v)
}

/// A Python module implemented in Rust. The name of this function must match
/// the `lib.name` setting in the `Cargo.toml`, else Python will not be able to
/// import the module.
#[pymodule]
fn chd(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(chd_open, m)?)?;
    m.add_function(wrap_pyfunction!(try_read_headers, m)?)?;
    Ok(())
}

do you want a PR @chyyran?

chyyran commented 1 year ago

Sure I'd be happy to take PRs

i30817 commented 1 year ago

After the pr was done this is what i have to build the chd @j68k , hope you can find a way to 'draw the rest of the owl' because i'm drawing blanks into turning the fourcc tags into actual track sha1 sums. I know we have to calculate the extents of the tracks into bytes elapsed from the tags...

code is untested so don't be surprised if it has some dumb bugs.

Spoiler warning ``` from chd import chd_read_header, chd_open def chd_tracks_sha1(chdfile: Path, possible_parents: List[Path]): ''' Try to extract a list of track sha1 sum strings from a chd This method supports cd type chds If passed a non-cd chd, or a delta chd without the ancestors in the list of possible parents, returns None ''' parents_headers = [ chd_read_header(str(parent)) for parent in possible_parents ] headersmap = { str(path.resolve(strict=True)) : header for path, header in zip(possible_parents, parents_headers) } chdfile = str(chdfile.resolve(strict=True)) headersmap[chdfile] = chd_read_header(chdfile) chd = bottom_up_chd_build(chdfile, headersmap) if not chd: return None tags = [ m.tag().to_bytes(4, byteorder='big') for m in chd.metadata() ] def is_cdrom(tag): return tag == b'CHCD' or tag == b'CHTR' or tag == b'CHT2' or tag == b'CHGT' or tag == b'CHGD' if not any(map(is_cdrom, tags)): return None #do sha1sum of tracks here, draw the rest of the owl def bottom_up_chd_build(chdfile, headers): header = headers[chdfile] del headers[chdfile] if header.has_parent(): try: parentfile, _ = next(filter(lambda kv: kv[1].sha1() == header.parent_sha1(), headers.items())) except StopIteration: return None parentchd = bottom_up_chd_build(parentfile, headers) if parentchd: return chd_open(chdfile, parentchd) else: return None else: return chd_open(chdfile) ```
j68k commented 1 year ago

Thank you very much for getting that done. And that's definitely helpful to have an outline of how to use the .chd library.

Just to set expectations appropriately, I'm afraid I'm painfully busy with another project for the rest of this month at least, so it will be a while before I can do substantial work on verifydump. So, sorry about the wait, but thank you again!

chyyran commented 1 year ago

https://github.com/Manorhos/imageparse/blob/master/src/chd.rs this might be of interest, but this is using chd-rs in Rust directly and not with Python

i30817 commented 1 year ago

That's complex. I guess i'm going to bug that person to add the delta-chd constructor so that a eventual python binding of that can work, because i have no confidence in myself on translating that code.

edit: then again, that project is a abstraction over both chd and cue/bin, so i'm not sure they'd add a chd specific factory method.

chyyran commented 1 year ago

That code just uses chd-rs and as per SnowflakePowered/chd-rs#2 there are no plans until MAME adds support for it.

i30817 commented 1 year ago

I was writing about children chd, I just call them 'delta chds'. Xdelta subchild support is a feature apart.

The code in that image project is using a slightly older version of chd-rs without the two arguments chd_open.

i30817 commented 1 year ago

BTW i found a github action that builds multiplatform wheels of rust maturin projects: https://github.com/messense/maturin-action

Might be useful if we make the binding to that library.