hohav / peppi

Rust parser for Slippi SSBM replay files
MIT License
37 stars 9 forks source link

Improve shift_jis handling and convert unicode chars to their ascii equivalents #23

Closed NickCondron closed 1 year ago

NickCondron commented 1 year ago

See https://github.com/project-slippi/slippi-js/blob/master/src/utils/fullwidth.ts

Useful shift jis table: https://uic.io/en/charset/show/cp932/

I added the tilde case, which doesn't exist in slippi-js. I don't know if it's worth keeping.

This conversion is destructive, so we'd have to be more clever if we want to make it reversible. However, for now that is not necessary since we store the whole Game Start event as raw bytes for round-tripping.

Another option is to expose a melee_shift_jis type with a From<melee_shift_jis> for String for something like that.

hohav commented 1 year ago

This is good functionality, and I agree with adding it to Peppi. And while it's true that this won't currently impact round-tripping, I'm still uneasy about automatic, lossy conversions. Does slippi-js provide any way to access the original values?

NickCondron commented 1 year ago

Does slippi-js provide any way to access the original values?

No

I'm still uneasy about automatic, lossy conversions.

Gotcha. I see a potential approach:

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct MeleeShiftJis(pub(crate) String);

impl MeleeShiftJis {
    pub fn as_str(&self) -> &str {
        self.0.as_str()
    }
    pub fn to_fixed(&self) -> String {
        self.0.clone().chars().map(fix_char).collect::<String>()
    }
}

This requires the user (who owns a MeleeShiftJis struct) to make a decision on which way they want to access the inner value. Unfortunately it adds a bit of overhead, but I think that's unavoidable if we want to allow access to either form. I could probably be more clever and rewrite to_fixed to return a Cow<'_, str> to save time in the simple case that no fix is needed.

Question: Do we still need the serde derive feature and #[derive(Serialize)] everywhere? To my knowledge it's never really used. It seems vestigial like you weren't able to finish it due to unforeseen issues.

EDIT: I think this is the correct approach. I don't think it's worth bothering with a Cow when a simple solution works well. I've rebased/squashed the changes.

NickCondron commented 1 year ago

I adopted both of your naming suggestions, thanks.

NickCondron commented 1 year ago

Thank you for the suggestions! I think the code is a lot tighter now.

hohav commented 1 year ago

Really nice work, and sorry I let this sit for so long. Thank you!