tmontaigu / dbase-rs

Rust library to read & write dBase files.
MIT License
29 stars 30 forks source link

Have NamedValue.name be a Cow to allow for owned Strings #46

Open theCapypara opened 1 year ago

theCapypara commented 1 year ago

This changes the type of NamedValue's name field to be a Cow<str> instead of &str.

The reason I need this change is because in the project we are using this crate in, we have an alternative DBF implementation that reads/writes files in-place.

For de-serializing records we want to support the same mechanism already provided by this crate (since we use this crate and our own, which doesn't implement everything, at the same time),. But since field names are read and decoded just as the iterator that generates the NamedValues in our case, we need to store the owned value in NamedValue.

I tried using a generic argument for NamedValue to have this backwards compaible, but Rust complains about the lifetime not being bound if I set a default value for the type parameter (but not if I don't...?). This was my original idea, which would have been backwards compatible:

pub struct NamedValue<'a, T, S=&'a str> where S: AsRef<str> + 'a {
    /// Reference to the field name the value belongs to
    pub name: S,
    /// The value
    pub value: T,
}
tmontaigu commented 1 year ago

You can't have default types for generic structs, one possibility could have been something like this

pub struct NamedValue<T, S> {
    /// Reference to the field name the value belongs to
    pub name: S,
    /// The value
    pub value: T,
}

pub type NamedValueOwned<T> = NamedValue<T, String>;
pub type NamedValueRef<'a, T> = NamedValue<T, &'a str>;

I'm not sure I understood why you need this change 🤔

theCapypara commented 1 year ago

I basically made my own version of the ReadableRecord trait that works with any iterator instead of the partially private FieldIterator from this crate:

pub trait ReadableRecord: dbase::ReadableRecord {
    /// Read the record from the provided data.
    fn read_from_iter<'a, I>(fields: I) -> Result<Self, FieldIOError>
    where
        I: Iterator<Item = Result<NamedValue<'a, FieldValue>, FieldIOError>>,
        Self: Sized;
}

I did it that way because then I can (A) implement this for our own implementation and (B) easily implement it for anything that can implement dbase::ReadableRecord. That means I have a mechanism of reading records that works basically the same no matter if I use the dbase crate or our own implementation.

Because the iterator that yields the NamedValue for own implementation reads and decodes the strings from the DBF file as it's iterating, I need to store these owned Strings somewhere. At the moment NamedValue only works with references.

I guess I could just make my own NamedValue struct but then making sure it works with the dbase::ReadableRecord trait is messy again since I'd need to convert them.

I was also considering opening a PR to just change dbase::ReadableRecord like I did it above, but that would require bigger changes.

tmontaigu commented 1 year ago

To be honest I am a bit hesitant about this, its a small change but I liked using & str as the NamedValue returned from this crate is always going to be a &str.

I'll think more about this

Because the iterator that yields the NamedValue for own implementation reads and decodes the strings from the DBF file as it's iterating, I need to store these owned Strings somewhere.

Since the names of fields in the file are written once in the header, does that mean you read the names each time you read a record / field ?

I basically made my own version of the ReadableRecord trait that works with any iterator instead of the partially private FieldIterator from this crate:

I think my idea when making the FieldIterator is that it could allow anyone to implement ReadableRecord, so that's a problem, what is blocking ?

theCapypara commented 1 year ago

Since the names of fields in the file are written once in the header, does that mean you read the names each time you read a record / field ?

At the moment yes, and while I wrote that, the downsides of that also occurred to me. The thing is, since we are reading live from a file which is also written to be other processes, that could potentially alter the fields, there's not really a good way of caching the fields. That's also one reason of why we still use the dbase crate for reading some files, since that works much better for reading in a lot of records. Our solution is primarily used for occasionally accessing single records.

I think my idea when making the FieldIterator is that it could allow anyone to implement ReadableRecord, so that's a problem, what is blocking ?

That works fine, but since FieldIterator is specific to the dbase crate, it can't really be used for other implementations, which is fine I think, but that's why I made my own subtrait with a generic iterator.