rsgalloway / pyseq

Compressed sequence string module for Python
https://pyseq.rsgalloway.com/

better handling of sequences with many missing frames #68

Closed rsgalloway closed 2 years ago

rsgalloway commented 2 years ago

Currently, pyseq can hang when it encounters a sequence with many (millions of) missing frames, or random files that mimic a sequence with a very large gap between frame numbers (>1M). The problem is in _get_missing(), which consumes a lot of memory in these cases.
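To illustrate why a large gap is expensive, here is a minimal sketch. It assumes missing frames are found by enumerating every frame number between the first and last frame; `naive_missing` is a hypothetical helper written for this example, not pyseq's actual _get_missing().

```python
def naive_missing(frames):
    """Hypothetical helper: list every absent frame number explicitly.

    Work and memory grow with the size of the gap, not with the
    number of files that actually exist.
    """
    present = set(frames)
    first, last = min(frames), max(frames)
    # One list entry is created for each missing frame in [first, last].
    return [f for f in range(first, last + 1) if f not in present]


# Two files with a small gap: cheap.
small = naive_missing([1, 1001])
print(len(small))
```

With frames 1 and 50000000 the same call would try to build a list of nearly 50 million integers, which matches the kind of memory growth described above.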

See issue #67 and test_issue_67 in the unit tests for examples.

The issue is easily reproducible, e.g.:

>>> from pyseq import get_sequences
>>> seqs = get_sequences(["image-00000001.jpg", "image-50000000.jpg"])
>>> print(seqs)

The fix sets an upper limit of 100K on the frame sequence size when calculating missing frames. I assume this handles >99.9% of use cases: at 24 fps, 100K frames is more than an hour of footage.

Sequences with more than 100K frames will now return range values instead of explicit frame numbers when calculating missing frames. This seems to work fine when printing compressed sequence strings,

>>> print(get_sequences(["image.001.jpg", "image.100.jpg"])[0].format("%M"))
[2-99]
>>> print(get_sequences(["image.0000001.jpg", "image.1000000.jpg"])[0].format("%M"))
[2-999999, ]

but since the return type can differ in these cases, I bumped the minor version.
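For reviewers, the general idea can be sketched roughly as follows. This is an illustration under assumptions, not the code in this change: `missing_ranges`, `missing_frames`, and `MAX_EXPLICIT` are hypothetical names, and the (start, end) tuple representation of a range is chosen only for the example.

```python
from typing import List, Tuple, Union

# Assumed limit, taken from the 100K figure in the description above.
MAX_EXPLICIT = 100_000


def missing_ranges(frames: List[int]) -> List[Tuple[int, int]]:
    """Return the gaps between present frames as inclusive (start, end) pairs."""
    ordered = sorted(set(frames))
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


def missing_frames(
    frames: List[int], limit: int = MAX_EXPLICIT
) -> Union[List[int], List[Tuple[int, int]]]:
    """Explicit frame numbers for small sequences, ranges for large ones."""
    if not frames:
        return []
    gaps = missing_ranges(frames)
    span = max(frames) - min(frames) + 1
    if span > limit:
        # Too large to expand safely: hand back the ranges themselves.
        return gaps
    # Small enough: expand each gap into individual frame numbers.
    return [f for start, end in gaps for f in range(start, end + 1)]


print(missing_frames([1, 2, 5]))
print(missing_frames([1, 50_000_000]))
```

Computing the gaps only walks the frames that exist, so the cost no longer depends on how far apart the frame numbers are. The trade-off is the one noted above: callers may receive ranges instead of individual frame numbers.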

rsgalloway commented 2 years ago

Adding a few others to review...

@nebukadhezer @broganross @johannes @rossb-dlx