hohav / py-slippi

Python library for parsing SSBM replay files
MIT License
56 stars 25 forks source link

Issues with parsing file in-progress #26

Open heatherspacek opened 3 years ago

heatherspacek commented 3 years ago

Hi,

I'm trying apply parse() to a tail -f stream from a slippi file in the process of being written (i.e. a game in progress). My relevant python snippet reads as follows (Windows 10, Python 3.9, unix tail via cygwin):

p = subprocess.Popen(["cmd", "/c", "tail -c+1 -f %s" % fullpath], stdout=subprocess.PIPE)

parse(p.stdout, handlers)

It appears that the portion of "parse()" that measures how long the Raw element is, is misestimating because the file has just begun being written? I've attached a screenshot of the error thrown - the parser is looking for the metadata element (which, i believe, hasnt been written yet) and is looking very close to the beginning of the file (see the correspondence between the "bytes found" and the highlighted bytes in a hex editor).

Screenshot 2020-12-15 013054 Screenshot 2020-12-15 013127

Please also let me know if I've set it up wrong, or if there's a convenient workaround here! Thanks in advance.

hohav commented 3 years ago

Thanks for the report. Are you on latest master? And does this happen with all replays?

Parse error (3 @0x-1fca) is very weird. There should be a meaningful name in place of the 3, and 0x-1fca is a negative file position (it's formatted weirdly since the code assumes a positive value).

But if the parser is expecting to see the metadata block, that means it thinks it has reached the end of the raw block. That can happen either due to (1) reading the amount of bytes specified in the raw UBJSON element length (the four bytes after $U#l), or else (2) reading a Game End event.

For an in-progress replay, Slippi initially sets the raw length to zero (since its size is not known in advance). That disables py-slippi's check for condition (1), so I'd normally suspect a premature Game End event. But there's enough weirdness going on here that I can't say that with any confidence.

It's hard for me to test without a Windows box, but I'll investigate more when I have more time.

yael333 commented 3 years ago

I'm not involved by any means in any Slippi development but I heard that they made the writing to a replay file asynchronous meaning real time data might come in a different order. They are now developing the broadcast mode which enables reading real time data more cleanly but didn't look into that yet.

Maybe you should look into that :3

dawsonbooth commented 3 years ago

I thought it would help if I added that I get a similar error when parsing a file. Here's the traceback:

>>> from slippi import Game
>>> Game("D:\\OneDrive\\Gaming\\Melee\\Slippi\\Error\\Game_20200615T235839.slp")
Traceback (most recent call last):
  File "C:\Users\micro\AppData\Local\pypoetry\Cache\virtualenvs\treefrog-jCddXGjI-py3.7\lib\site-packages\slippi\parse.py", line 212, in _parse_try
    _parse(input, handlers)
  File "C:\Users\micro\AppData\Local\pypoetry\Cache\virtualenvs\treefrog-jCddXGjI-py3.7\lib\site-packages\slippi\parse.py", line 193, in _parse    
    expect_bytes(b'U\x08metadata', stream)
  File "C:\Users\micro\AppData\Local\pypoetry\Cache\virtualenvs\treefrog-jCddXGjI-py3.7\lib\site-packages\slippi\util.py", line 55, in expect_bytes
    raise Exception(f'expected {expected_bytes}, but got: {read_bytes}')
Exception: expected b'U\x08metadata', but got: b'6\x02\x00\x01\x002\x01\x86L\xc3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\micro\AppData\Local\pypoetry\Cache\virtualenvs\treefrog-jCddXGjI-py3.7\lib\site-packages\slippi\game.py", line 36, in __init__
    ParseEvent.METADATA_RAW: lambda x: setattr(self, 'metadata_raw', x)})
  File "C:\Users\micro\AppData\Local\pypoetry\Cache\virtualenvs\treefrog-jCddXGjI-py3.7\lib\site-packages\slippi\parse.py", line 241, in parse
    _parse_open(pathlib.Path(input), handlers)
  File "C:\Users\micro\AppData\Local\pypoetry\Cache\virtualenvs\treefrog-jCddXGjI-py3.7\lib\site-packages\slippi\parse.py", line 231, in _parse_open
    _parse_try(f, handlers)
  File "C:\Users\micro\AppData\Local\pypoetry\Cache\virtualenvs\treefrog-jCddXGjI-py3.7\lib\site-packages\slippi\parse.py", line 226, in _parse_try
    raise e
slippi.parse.ParseError: Parse error (D:\OneDrive\Gaming\Melee\Slippi\Error\Game_20200615T235839.slp @0x27): [Errno None] None: 'D:\\OneDrive\\Gaming\\Melee\\Slippi\\Error\\Game_20200615T235839.slp'

It's worth noting that this is a pretty old game (June 2020), but other games around the same time work. In case it helps, here is a .zip with an example of either case.

DustinAlandzes commented 3 years ago

was looking at how slippi-js did this:

https://github.com/project-slippi/slippi-js/blob/master/slp/realtimeTest.slp

here's a test called "should support realtime parsing" which uses slp/realtimeTest.slp https://github.com/project-slippi/slippi-js/blob/57378dac5564fcfb8d10b25a693a9d411ee24261/test/game.spec.ts#L114-L169

the tests in this file use finalizedFrame.slp and sheik_vs_ics_yoshis.slp (an older SLP file) https://github.com/project-slippi/slippi-js/blob/master/slp/finalizedFrame.slp https://github.com/project-slippi/slippi-js/blob/master/slp/sheik_vs_ics_yoshis.slp

https://github.com/project-slippi/slippi-js/blob/master/test/realtime.spec.ts

https://github.com/project-slippi/slippi-js/blob/master/scripts/realtimeFileReads.js

Samoe97 commented 1 year ago

Hi there, just writing to say I'm experiencing the same problem with in-progress replay files. Exact same exception and error code with Exception: expected b'U\x08metadata', but got: b'6\x02\x00\x01\x002\x01\x86L\xc3'

I'm not experienced enough to solve this myself, but could it be that the slp format has changed substantially enough to break the parser's logic? It works well on completed replay files, I don't have any issues there, but I haven't successfully been able to use the Event Parser to read any in-progress replays.

This is the code that gets me these results:

directory = '/tests/Game_20221223T125106.slp'

from slippi.parse import parse
from slippi.parse import ParseEvent
handlers = {ParseEvent.START: print}
with open(directory, 'rb') as f :
    parse(f.raw, handlers)

I've tested it now with multiple in-progress files and used every ParseEvent type and it hasn't worked.

Kered13 commented 1 year ago

I happened to noticed this bug in the code while referencing PySlippi for my own C++ implementation. It's actually quite simple. For in-progress replays, the length field of raw is set to 0. In the _parse_events() loop, total_size == 0 is used to determine if a replay is in-progress. However _parse() passes length - bytesRead to _parse_events(), where bytesRead is the size of the event payloads block, always positive. Therefore for an in-progress replay _parse_events() is receiving a negative value for the total_size parameter.

The error message is a result of _parse_events() returning immediately because it thinks there are no more events to read, therefore _parse() tries to start reading the metadata section, but reads the game event instead.

The simplest fix is to change parse.py:124 to use total_size < 0 instead of total_size == 0. Another possible fix is to pass in-progress status explicitly as a boolean parameter.

It would probably be good to have a replay file for unit tests with 0 raw length to test the behavior of streaming replays as well. I have attached such a replay file. It is identical to game.slp, except that the raw length field has been modified to 0. I believe it will fail parsing on PySlippi, though I have not tested it (I tested it on my C++ implementation).

streaming_game.zip