Open j68k opened 2 years ago
If you ever create a cross platform library to do this, if you'd publish it on pypi, i'd start using it right away to start renaming chds in my rhdndat project (the rhdndat-rn part in particular).
I'd love if there was a way to write new chds with a project like this too, because it would allow to create delta child chds from romhacking style xdelta patches without a whole complicated mess of temporary files and cutting bins into tracks just to create yet another image to delta encode with the original chd.
MAME is sometimes so disappointing, they create the only cd image format that has delta encoding, then they do absolutely nothing to make it easy to use with the current state of the world (which for romhacks in particular is xdelta patches on the binary track). They're even considering 'reworking' chd to a 'more accurate' format - to be clear, this probably would require redumps of absolutely everything, and i can think of no finer way to kill the format.
Thought I would drop by from https://github.com/rtissera/libchdr/issues/79 since I think it's unlikely libchdr will ever provide Python bindings.
chd-rs-py provides Python bindings to chd-rs which is all in pure Rust (without the undocumented fast_zlib
feature which switches to using zlib-ng for Deflate decompression) and so will work on Windows, Linux and macOS.
I used maturin to publish it hastily to pypi so I'm not sure if the current version on pypi is compiled for Linux. Python bindings isn't something I use so I probably won't go further than this personally but I'll be happy to accept PRs for anything you need as long as chd-rs exposes it, which should pretty much be everything.
Thank you very much for the work and for letting me know. I'll definitely check that out next time I'm working on this program. Very much appreciated!
Doesn't have a linux wheel yet, i think it needs it specified in cargo.toml for the pypa then build the wheels somehow.
As in this article: https://blog.yossarian.net/2020/08/02/Writing-and-publishing-a-python-module-in-rust
[package.metadata.maturin]
classifier = [
"Programming Language :: Rust",
"Operating System :: POSIX :: Linux",
]
same for the rest, windows, macos, bsd etc.
I'm also curious if the metadata is enough to split/know the extents of the tracks in memory.
0.1.2 should have a wheel for linux + sdist if you have a Rust compiler installed
Thanks. Can this support reading delta chds?
I suppose the chd_open call either would have to take a list (of ancestors) or internally it would have to search the filesystem itself.
It's rare the emulator that supports the delta chd format, but i'd like that to change, preferably by scanning/enabling a option to scan the current directory.
chd-rs supports opening a parent but because of lifetime issues (unlike Python you can't have multiple references to the inner stream) you can't really pass in a Chd
object directly in from Python easily.
It's an API problem that should be easy to solve but I don't have time to do much more than this. Feel free to explore it and send a PR. You basically need to hack in an impl Clone
for Chd
.
I see, python doesn't have the equivalent from 'passing ownership' so to prevent errors, the constructor should call clone to prevent python objects being reused after being passed to the rust constructor.
Oh hey, i checked the code and this appears to already exist?
#[pyfunction]
fn chd_open(path: String, parent: Option<String>) -> PyResult<Chd> {
let file = File::open(path)?;
let parent = parent
.map(|p| File::open(p).map(BufReader::new))
.map_or(Ok(None), |v| v.map(Some))?
.map(|n| ChdFile::open(n, None))
.map_or(Ok(None), |v| v.map(|v| Some(Box::new(v))))
.map_err(ChdPyError)?;
let chd = ChdFile::open(BufReader::new(file), parent).map_err(ChdPyError)?;
Ok(Chd {
inner: chd,
cmp_vec: Vec::new(),
})
}
Were you mentioning the 'parent of a parent' situation, not just one level of indirection of reads?
I mean, yes, it's a bit awkward that you have to instantiate ChdFiles just to figure out if the file is a delta chd and to find the parent delta (from the checksum of the parent, iirc) before finally making the final result and discarding the rest, but it should work no?
Yeah, I already considered single-parent which is pretty easy to solve.
Theoretically any level of indirection is easy as long as the lifetimes of the parent CHDs are fully resolved within chd_open
but you'll just need to come up with an API that does that.
It would be more ergonomic to not have to do that ad-hoc. Something like nvm, forgot that opening a child takes ownership of Chd::open_child(&mut self, String)
would probably work and you just build the tree bottom up.self
I mean, yes, it's a bit awkward that you have to instantiate ChdFiles just to figure out if the file is a delta chd and to find the parent delta (from the checksum of the parent, iirc) before finally making the final result and discarding the rest, but it should work no?
Yes, that would work but you can't open the ChdFile
directly because it needs the parent to already be valid so the ChdFile
is always valid. Instead use ChdHeader::try_read_header
to get the header and check the parent SHA if you want to do it that way.
As far as I know though, support for multiple parents is iffy even with chdman
so I don't think you're going to run into that situation a lot
I suspect it will always be awkward unless a alternate constructor gives a option to scan the filesystem/current dir (at least, maybe even check parent dir, for a obvious tree directory structure for hacks, or sibling directories, or all of these, without recursion naturally), because in most usecases (not mame i suppose, which has its software lists), you don't know which file is the parent.
Although, people tend to say it's bad form for libraries to do file io, for obvious reasons.
I'm interested in delta encoding as a 'softpatch for isos'. I'm interested in (eventually) creating a utility that takes a chd and a xdelta, and makes a child delta chd with the binary track patched, so all the iso hacks in romhacking.net finally have a softpatch format.
Also patching chd or updating patches of chds is a nightmare, you have to uncompress, divide into tracks again, patch and then recompress. Not good usability. Like this it would be a single command and the original game would still be available and to 'update' just means deleting the children and patching the parent again.
The header definition in chd doesn't have access to the sha1 of the parent, or doesn't show it. It only shows 'has_parent' not the parent checksum.
This makes it kind of impossible to find the parents by scanning
edit: oh, it's public. I shouldn't complain without building at least once...
What about exposing the try_read_header?
I wonder if something like this is enough:
#[pyclass]
struct Header {
inner: ChdHeader,
}
#[pyfunction]
fn try_read_headers(paths: Vec<Option<String>>) -> Vec<PyResult<Header>> {
let r = Vec::new()
for path in paths {
let h = path
.map(|p| File::open(p).map(BufReader::new))
.map_or(Ok(None), |v| v.map(Some))
.map(|n| ChdHeader::try_read_header(n))
.map_or(Ok(None), |v| v.map(|v| Some(Box::new(v))))
.map_err(ChdPyError);
r.push(
Ok(Header {
inner: h
}))
}
r
}
/// A Python module implemented in Rust. The name of this function must match
/// the `lib.name` setting in the `Cargo.toml`, else Python will not be able to
/// import the module.
#[pymodule]
fn chd(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
m.add_function(wrap_pyfunction!(chd_open, m)?)?;
m.add_function(wrap_pyfunction!(try_read_headers, m)?)?;
Ok(())
}
ie: exposing two functions for python code to 'try' to read the headers of all chd files it asks, then if a file it wants to load has a parent, do:
r = try_read_headers([file])[0]
if r not None and r.has_parent():
l = try_read_headers([....every other file, found with some application specific strategy...])
for h in l:
if h.sha1 == r.parent_sha1:
#create file
#etc
I don't know if i should be testing for 'None' or the exception too. I'm not too sure how this is handled, any exception should interrupt or just be ignored. Not sure if it's needed to painstackingly expose all needed function/properties to python or #[pyclass]
takes care of that.
Any error that bubbles up will automatically be turned into an Exception after crossing the Python FFI barrier. You don't need to test for None
unless Option<T>
is returned in Rust, otherwise it's guaranteed to either throw an Exception (in the case of Err
), or return something. You will need #[pyclass]
to expose the functions you need from Header
, but there's not that many.
and the public properties?
Yeah you'll need to expose the public properties as methods in #[pyclass]
since to CPython it's just a builtin.
HeaderV*
implement Clone
, so you can possibly use #[pyo3(get, set)]
instead if you only care about CHD v5.
let header = ...;
let header: HeaderV5 = match header {
V5Header(h) => h,
_ => unimplemented!(); // will crash python, but you can return `Err` instead which will throw an exception
}
So something like this?
#[pyclass]
struct Header {
inner: ChdHeader,
}
#[pymethods]
#[pyo3(get)]
impl Header {
pub fn is_compressed(&self) -> bool
{
self.inner.is_compressed()
}
pub fn meta_offset(&self) -> Option<u64>
{
self.inner.meta_offset()
}
pub fn flags(&self) -> Option<u32>
{
self.inner.flags()
}
pub fn hunk_count(&self) -> u32
{
self.inner.hunk_count()
}
pub fn hunk_size(&self) -> u32
{
self.inner.hunk_size()
}
pub fn logical_bytes(&self) -> u64
{
self.inner.logical_bytes()
}
pub fn unit_bytes(&self) -> u32
{
self.inner.unit_bytes()
}
pub fn unit_count(&self) -> u64
{
self.inner.unit_count()
}
pub fn has_parent(&self) -> bool
{
self.inner.has_parent()
}
pub fn len(&self) -> u32 {
self.inner.len()
}
}
#[pyfunction]
fn try_read_headers(paths: Vec<String>) -> PyResult<Vec<Header>> {
let r = Vec::new()
for path in paths {
let f = File::open(path)
.map_err(ChdPyError)?;
let h = ChdHeader::try_read_header(BufReader::new(f))
.map_err(ChdPyError)?;
let header: HeaderV5 = match h {
V5Header(h) => h,
_ => ChdPyError?;
}
r.push(
Header {
inner: header
})
}
r
}
/// A Python module implemented in Rust. The name of this function must match
/// the `lib.name` setting in the `Cargo.toml`, else Python will not be able to
/// import the module.
#[pymodule]
fn chd(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
m.add_function(wrap_pyfunction!(chd_open, m)?)?;
m.add_function(wrap_pyfunction!(try_read_headers, m)?)?;
Ok(())
}
Mmm python doesn't have a (obligatory) way to make sure the argument is not None. I have to use option in the arguments of try_read_headers don't I?
There is a typing module in recent python with 'optional', but since python doesn't assume that a non-optional value can't be None, i do have to use Option right?
edit: mmm, #[pymethods]
and #[pyo3(get)]
at the same time look like they have possibility to duplicate methods.
Instead of storing CHDHeader as inner just store HeaderV5 which should have everything you need. I think PyO3 should handle checking if None gets passed into the method.
I would set up a Rust toolchain and fiddle with it until it compiles. Just follow the guide at https://pyo3.rs/v0.16.4/
Docs for chd-rs are at https://docs.rs/chd/latest/chd/index.html
Finally. Got something that builds.
#[pyclass]
struct Header {
inner: ChdHeader,
}
#[pymethods]
impl Header {
pub fn is_compressed(&self) -> bool
{
self.inner.is_compressed()
}
pub fn meta_offset(&self) -> Option<u64>
{
self.inner.meta_offset()
}
pub fn flags(&self) -> Option<u32>
{
self.inner.flags()
}
pub fn hunk_count(&self) -> u32
{
self.inner.hunk_count()
}
pub fn hunk_size(&self) -> u32
{
self.inner.hunk_size()
}
pub fn logical_bytes(&self) -> u64
{
self.inner.logical_bytes()
}
pub fn unit_bytes(&self) -> u32
{
self.inner.unit_bytes()
}
pub fn unit_count(&self) -> u64
{
self.inner.unit_count()
}
pub fn has_parent(&self) -> bool
{
self.inner.has_parent()
}
pub fn len(&self) -> u32 {
self.inner.len()
}
pub fn get_sha1(&self) -> &[u8] {
let h = match &self.inner {
V5Header(h) => h,
_ => unimplemented!(),
};
&h.sha1
}
pub fn get_parent_sha1(&self) -> &[u8] {
let h = match &self.inner {
V5Header(h) => h,
_ => unimplemented!(),
};
&h.parent_sha1
}
pub fn get_raw_sha1(&self) -> &[u8] {
let h = match &self.inner {
V5Header(h) => h,
_ => unimplemented!(),
};
&h.raw_sha1
}
}
#[pyfunction]
fn try_read_headers(paths: Vec<String>) -> PyResult<Vec<Header>> {
let mut v = Vec::new();
for path in paths {
let f = File::open(path)?;
let mut r = BufReader::new(f);
let h = ChdHeader::try_read_header(&mut r)
.map_err(ChdPyError)?;
match h {
V5Header(_) => Ok(()),
_ => Err(ChdPyError(chd::ChdError::UnsupportedVersion)),
}?;
v.push(
Header {
inner: h
});
}
Ok(v)
}
/// A Python module implemented in Rust. The name of this function must match
/// the `lib.name` setting in the `Cargo.toml`, else Python will not be able to
/// import the module.
#[pymodule]
fn chd(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
m.add_function(wrap_pyfunction!(chd_open, m)?)?;
m.add_function(wrap_pyfunction!(try_read_headers, m)?)?;
Ok(())
}
do you want a PR @chyyran?
Sure I'd be happy to take PRs
After the pr was done this is what i have to build the chd @j68k , hope you can find a way to 'draw the rest of the owl' because i'm drawing blanks into turning the fourcc tags into actual track sha1 sums. I know we have to calculate the extents of the tracks into bytes elapsed from the tags...
code is untested so don't be surprised if it has some dumb bugs.
Thank you very much for getting that done. And that's definitely helpful to have an outline of how to use the .chd library.
Just to set expectations appropriately, I'm afraid I'm painfully busy with another project for the rest of this month at least, so it will be a while before I can do substantial work on verifydump. So, sorry about the wait, but thank you again!
https://github.com/Manorhos/imageparse/blob/master/src/chd.rs this might be of interest, but this is using chd-rs
in Rust directly and not with Python
That's complex. I guess i'm going to bug that person to add the delta-chd constructor so that a eventual python binding of that can work, because i have no confidence in myself on translating that code.
edit: then again, that project is a abstraction over both chd and cue/bin, so i'm not sure they'd add a chd specific factory method.
That code just uses chd-rs and as per SnowflakePowered/chd-rs#2 there are no plans until MAME adds support for it.
I was writing about children chd, I just call them 'delta chds'. Xdelta subchild support is a feature apart.
The code in that image project is using a slightly older version of chd-rs without the two arguments chd_open.
BTW i found a github action that builds multiplatform wheels of rust maturin projects: https://github.com/messense/maturin-action
Might be useful if we make the binding to that library.
There are at least a couple of libraries for reading .chd files now, which would hopefully make this fairly easy.