hohav / peppi

Rust parser for Slippi SSBM replay files
MIT License

Support parsing only final frame #22

Open NickCondron opened 1 year ago

NickCondron commented 1 year ago

Currently we support skipping the frame data entirely. There are some use cases where we only need the final frame (e.g. determining the winner after an LRAS, i.e. an L+R+A+Start quit). It would be nice to support this use case without parsing every frame and then just picking the last one.

We might also want to support parsing only the first frame to handle the Sheik fix. In replays before 1.6.0, the game start event didn't correctly differentiate Zelda/Sheik, so you have to check the first frame to tell which one actually started the game.

hohav commented 1 year ago

Does the new placements info mostly cover this, as you see it? If so then I'm inclined to pass on implementing this, to avoid feature creep.

NickCondron commented 1 year ago

> Does the new placements info mostly cover this, as you see it? If so then I'm inclined to pass on implementing this, to avoid feature creep.

No, because the placements field is defined by how Melee determines the winner at the results screen, so it doesn't always match the definition of a win used in the competitive community.

For example, a timeout with equal stocks for both players will have a tie for first in the placements field even if the damage was different.

Also, a lot of players LRAS at the end of games (especially to skip long KOs off the top). Being able to inspect the final frame(s) would let you detect such cases and determine the win/loss more accurately.
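To make the difference from the placements field concrete, here is a rough sketch of the kind of rule a caller might apply to the final frame. It assumes the common convention "more stocks wins, then lower damage wins". `FinalState` and `competitive_winner` are hypothetical names for illustration, not part of peppi, and the values would have to be read from each player's last post-frame update.

```rust
use std::cmp::Ordering;

/// Hypothetical per-player summary taken from the final frame of a replay.
/// Not a peppi type; the fields are assumptions for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FinalState {
    pub port: u8,
    pub stocks: u8,
    pub damage: f32,
}

/// Returns the winning port under the assumed rule:
/// more stocks wins; on equal stocks, lower damage wins; otherwise no winner.
pub fn competitive_winner(a: FinalState, b: FinalState) -> Option<u8> {
    match a.stocks.cmp(&b.stocks) {
        Ordering::Greater => Some(a.port),
        Ordering::Less => Some(b.port),
        Ordering::Equal => match a.damage.partial_cmp(&b.damage) {
            Some(Ordering::Less) => Some(a.port),
            Some(Ordering::Greater) => Some(b.port),
            // Equal damage (or NaN): leave the game undecided.
            _ => None,
        },
    }
}
```

With this rule, a timeout at equal stocks where one player has less damage yields a winner, whereas the placements field would report a tie for first.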

NickCondron commented 1 year ago

I think changing peppi to be somewhat lazy would make this issue irrelevant and simplify parts of peppi. Currently peppi immediately parses each event and passes the event struct to the relevant Handlers trait function. An alternative lazy design would have peppi encounter an event code and pass a 'thunk' (https://wiki.haskell.org/Thunk) to the relevant Handlers function for that event. The struct that implements Handlers can then choose to evaluate the thunk (and receive the relevant event struct) or not, in which case the parser simply skips over that event.

This would let users decide at runtime which events to parse, avoiding unnecessary work. It would also remove the need for options like skip_frames and enable other use cases, such as skipping only item events or skipping all events after a certain condition is met (e.g. after the first stock is taken).
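A minimal sketch of what the thunk idea could look like, under heavy assumptions: `PostFrame`, `Thunk`, `LazyHandlers`, and `parse_lazy` are all made-up names, and the 5-byte record layout (big-endian frame index plus a stock count) is invented for illustration rather than taken from the Slippi format or peppi's actual Handlers trait.

```rust
/// Hypothetical event struct, standing in for peppi's real event types.
#[derive(Debug, PartialEq)]
pub struct PostFrame {
    pub frame: i32,
    pub stocks: u8,
}

/// A deferred parse: holds the raw bytes and decodes only when asked.
pub struct Thunk<'a> {
    raw: &'a [u8],
}

impl<'a> Thunk<'a> {
    pub fn new(raw: &'a [u8]) -> Self {
        Thunk { raw }
    }

    /// Decode the payload (invented layout: 4-byte big-endian frame, 1-byte stocks).
    pub fn eval(&self) -> Option<PostFrame> {
        let b = self.raw.get(0..4)?;
        let frame = i32::from_be_bytes([b[0], b[1], b[2], b[3]]);
        let stocks = *self.raw.get(4)?;
        Some(PostFrame { frame, stocks })
    }
}

/// Handlers receive thunks instead of already-parsed structs.
pub trait LazyHandlers {
    fn post_frame(&mut self, thunk: Thunk<'_>);
}

/// Walks fixed-size records; a thunk that is never evaluated costs only a slice.
pub fn parse_lazy<H: LazyHandlers>(buf: &[u8], handler: &mut H) {
    for raw in buf.chunks_exact(5) {
        handler.post_frame(Thunk::new(raw));
    }
}

/// Example handler: stop decoding once the first stock has been lost.
pub struct UntilFirstStockLost {
    pub starting_stocks: u8,
    pub done: bool,
    pub evaluated: usize,
}

impl LazyHandlers for UntilFirstStockLost {
    fn post_frame(&mut self, thunk: Thunk<'_>) {
        if self.done {
            return; // skip: the thunk is dropped without being parsed
        }
        if let Some(ev) = thunk.eval() {
            self.evaluated += 1;
            if ev.stocks < self.starting_stocks {
                self.done = true;
            }
        }
    }
}
```

The handler, not a parser option, decides when to stop paying for decoding, which is the property that would make something like skip_frames unnecessary.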

I haven't fully fleshed out this idea, but I'm curious what you think?

hohav commented 1 year ago

My instinct is that this would hurt the performance of object-based parsing enough to be a problem, but I'd be interested to see some performance numbers. py-slippi does something like this, but that's because Python is so slow at low-level bit fiddling that the overhead was worth it.

NickCondron commented 1 year ago

I'm working on a prototype lazy parsing system. We will see if it's faster or not haha. It's still a WIP, but the basic idea is to scan the replay once to build an 'outline', reading only the event codes and the frame event indexes.

pub struct GameInfo {
    pub start: game::Start,
    pub end: game::End,
    pub metadata: metadata::Metadata,
    pub metadata_raw: serde_json::Map<String, serde_json::Value>,
}

pub struct FrameOutline<'a> {
    pub index: i32,
    pub start: Option<&'a [u8]>,
    pub end: Option<&'a [u8]>,
    pub pre_leaders: [Option<&'a [u8]>; NUM_PORTS],
    pub pre_follower: [Option<&'a [u8]>; NUM_PORTS],
    pub post_leaders: [Option<&'a [u8]>; NUM_PORTS],
    pub post_follower: [Option<&'a [u8]>; NUM_PORTS],
    pub items: [Option<&'a [u8]>; 15],
}

pub struct GameOutline<'a> {
    pub info: GameInfo,
    pub gecko_codes: &'a [u8],
    pub frames: Vec<FrameOutline<'a>>,
}

This has a few advantages:

  1. Transformations to the game structure are super fast. For example, you can filter out rollback frames or ignore items without ever having to parse those events.
  2. You can quickly detect structural errors later in the replay file before spending time parsing the whole thing. In practice, if the replay file structure is sound, the replay is generally valid.
  3. You know the size of the replay, so you can efficiently allocate memory up front.
  4. For event-based parsing you only have to parse what you need, but making this ergonomic is a bit of a challenge.
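As an illustration of advantage (1), filtering rollback frames only needs the frame indexes, not the event contents. The sketch below uses a simplified, hypothetical `FrameRef` in place of `FrameOutline`, and assumes that when a frame index appears more than once, the last copy is the one to keep.

```rust
use std::collections::HashSet;

/// Simplified stand-in for an outline entry: a frame index plus unparsed bytes.
/// Hypothetical type for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FrameRef<'a> {
    pub index: i32,
    pub raw: &'a [u8],
}

/// Keeps only the last occurrence of each frame index, preserving order.
/// No event payload is decoded; only the indexes are inspected.
pub fn keep_last_per_index<'a>(frames: &[FrameRef<'a>]) -> Vec<FrameRef<'a>> {
    let mut seen = HashSet::new();
    let mut kept: Vec<FrameRef<'a>> = frames
        .iter()
        .rev() // walk backwards so the first hit per index is the latest copy
        .filter(|f| seen.insert(f.index))
        .copied()
        .collect();
    kept.reverse();
    kept
}
```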

Disadvantages:

  1. You can't validate anything you avoid parsing, so depending on your use case you could parse an unsound replay without detecting the error.
  2. It's slower to parse everything in two passes, but maybe this will be offset by (3) above.
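For reference, the first pass could be sketched roughly as follows. This is not the prototype's code: `scan_events` and `ScanError` are hypothetical names, and the sketch assumes a table mapping each event code to its payload size is already available (a Slippi replay carries such a table near the start of its raw stream; decoding it is omitted here). The scan records each event's code and byte range and reports structural errors without decoding any payload.

```rust
use std::collections::HashMap;

/// Structural problems that the scan can detect without parsing payloads.
#[derive(Debug, PartialEq)]
pub enum ScanError {
    UnknownCode { offset: usize, code: u8 },
    Truncated { offset: usize, code: u8 },
}

/// One pass over the raw bytes: returns (event code, unparsed payload) pairs.
/// `payload_sizes` is assumed to map every valid event code to its payload length.
pub fn scan_events<'a>(
    buf: &'a [u8],
    payload_sizes: &HashMap<u8, usize>,
) -> Result<Vec<(u8, &'a [u8])>, ScanError> {
    let mut events = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let code = buf[pos];
        let size = *payload_sizes
            .get(&code)
            .ok_or(ScanError::UnknownCode { offset: pos, code })?;
        let start = pos + 1;
        // `get` returns None if the payload would run past the end of the buffer.
        let payload = buf
            .get(start..start + size)
            .ok_or(ScanError::Truncated { offset: pos, code })?;
        events.push((code, payload));
        pos = start + size;
    }
    Ok(events)
}
```

Once the scan succeeds, the number of events is known, so the second pass can allocate its output up front and parse only the payloads the caller asks for.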