FileSequence can serialised and deserialised into different sequences

walrusVision commented 7 years ago

Depending on how you construct a FileSequence you can generate a sequence that cannot correctly deserialised into an equivalent object.

For example

>>> import fileseq
>>> fs = fileseq.FileSequence("myfile01.ext")
>>> fs.basename()
'myfile01'
>>> fs.setFrameRange("10-20")
>>> fs.start()
10
>>> fs.end()
20
>>> fs.setPadding("#")
>>> str(fs)
'myfile0110-20#.ext'
>>> fs2 = fileseq.FileSequence(str(fs))
>>> str(fs)
'myfile0110-20#.ext'
>>> fs2.basename()
'myfile'
>>> fs2.start()
110
>>> fs2.end()
20
>>> str(fs) == str(fs2)
True
>>> fs.frameSet() == fs2.frameSet()
False

Since we've made the changes made for #47, it should only effect sequences where the basename ends in a digit or a dash.

Given there is a number of entry points for setting values, and a number of exist points for formatting I'm not sure the best place to enforce validation.

justinfx commented 7 years ago

If you are performing your serialize/deserialize by way of using the FileSequence constructor, then the outcome is valid because you are putting the string through another parsing phase. cPickle for instance will make use of saving and restoring the __dict__ for the instance so you end up with the correct results. For other serializations such as json, it isn't supported. You could also manually get and set the __dict__ for other types of serialization.

This works:

import cPickle

cPickle.loads(cPickle.dumps(fs)).frameSet() == fs.frameSet()

justinfx commented 7 years ago

Is this still an issue? If not, @walrusVision can you explain what behaviour you would expect in relation to serialization/deserialization? It should be consistent to cPickle to serialize a sequence and get back the same result. But manually calling setters vs using the constructors parser logic may not result in the same outcome.

GlenWalker commented 7 years ago

I would say that serialization/deserialization wasn't the clearest way to express the problem - the problem is that fileseq allows the construction of ambiguous file sequences, where the apparent frame number of any files created by iterating the fileseq.FileSequence will not match their actual frame number. For example

>>> import fileseq
>>> fs = fileseq.FileSequence("/path/to/file1.ext")
>>> fs.setFrameRange("10-20")
>>> fs.setPadding("#")
>>> str(fs)
'/path/to/file110-20#.ext'
>>> fs.basename()
'file1'
>>> fs.start()
10
>>> fs.end()
20
>>> list(fs)
['/path/to/file10010#.ext', '/path/to/file10011#.ext', '/path/to/file10010#.ext', '/path/to/file10013#.ext', '/path/to/file10014#.ext', '/path/to/file10015#.ext', '/path/to/file10016#.ext', '/path/to/file10017#.ext', '/path/to/file10018#.ext', '/path/to/file10019#.ext', '/path/to/file10020#.ext']
>>> fs2 =  fileseq.FileSequence(str(fs))
>>> str(fs2)
'/path/to/file110-20#.ext'
>>> fs2.basename()
'file'
>>> fs2.start()
110
>>> fs2.end()
20
>>> list(fs)
['/path/to/file0110.ext', '/path/to/file0109.ext', '/path/to/file0108.ext', '/path/to/file0107.ext', 
...
'/path/to/file0023.ext', '/path/to/file0022.ext', '/path/to/file0021.ext', '/path/to/file0020.ext']

Problems:

The file list produced by iterating fs appears to have frame numbers from 10010-10020 when they were really frames 10-20
It is possible to have another sequence with an identical string representation that produces a different list of files
It is not possible to construct fs1 from a string, without manually setting the range and padding

From our perspective it would be ideal if fileseq would raise some kind of exception when code attempts to produce a string representation of, or iterate over, an ambiguous file sequence - to avoid problems like those above, preventing pipeline code from creating files with ambiguous frame numbers in their name. I can see why you wouldn't want to do this though, so for now we are raising errors for this in our own code. Have a think about ways to handle this nicely, and feel free to close if you don't think fileseq should be involved.

An ambiguous file sequence is simply one where the basename ends in a digit or a -, and the frame range is not empty.

justinfx commented 7 years ago

Thanks for clarifying the situation. I can see how that would be a problem if a user was then able to treat that as a string representation which no longer actually worked properly for a sequence they have produced. I will think on this a bit, since it would introduce an exception for previously valid code.

GlenWalker commented 7 years ago

FWIW, we log instances where we are passed arguments that create an ambiguous file sequences, and follow up with the user. We've had one case dealt with so far, and it turned out the user really only needed to create a single file, and not a sequence, so they just changed their code.

We have a few more cases to follow up, but it looks like they are the same error - we can see the user writing to a single file with no frame number, while the file sequence by default creates a file with a frame number added.

For example:

>>> import fileseq
>>> fs = fileseq.FileSequence("/path/to/file2048x2048.ext")
>>> fs.setFrameRange("1-1")
>>> fs.setPadding("#")
>>> list(fs)
['/path/to/file2048x20480001.ext']

and then the user actually writes their data into /path/to/file2048x2048.ext and really didn't need to create a sequence anyway.

justinfx / fileseq

FileSequence can serialised and deserialised into different sequences #48