mindbender-studio / config

Pipeline Configuration

Potential conflict collecting and publishing image sequences #12

Open aardschok opened 7 years ago

aardschok commented 7 years ago

Issue

When collecting image sequences through Avalon for publishing, the data passed to each new instance is not unique and might create conflicting data.

Example

The render has multiple AOVs or Render Elements per layer. This results in:

The collector will create an instance for each of those, perfect ... but! The data being passed is exactly the same layer metadata for every one of them. When that mutable data needs to be altered post-instance-creation, this can result in TECH.zdepth getting information which should only be on TECH.normal. The same goes for all other AOVs/Render Elements of each layer.
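As a minimal sketch of the failure mode (the metadata contents here are made up): both instances end up holding the same dict object, so a post-creation edit meant for one leaks into the other.

# Hypothetical illustration; the real metadata comes from the layer
metadata = {"instance": {"attributes": {}}}

zdepth = metadata["instance"]  # data handed to the TECH.zdepth instance
normal = metadata["instance"]  # data handed to the TECH.normal instance

# A post-instance-creation edit intended only for TECH.zdepth ...
zdepth["attributes"]["aov"] = "zdepth"

# ... is visible on TECH.normal too, because both names reference
# the very same dict
print(normal["attributes"]["aov"])  # prints "zdepth"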

Original code

https://github.com/mindbender-studio/config/blob/master/polly/plugins/publish/collect_imagesequences.py#L47

Solution

To ensure all instances have the correct metadata we suggest using copy.deepcopy(metadata["instance"]).

The code would then look something like this:

# copy and os are imported at the top of the plug-in module
import copy
import os

# Grab the shared source metadata once
metadata_instance = metadata["instance"]

for collection in collections:
    instance = context.create_instance(str(collection))
    self.log.info("Collection: %s" % list(collection))

    # Ensure each instance gets its own unique reference to
    # the source data
    instance_metadata = copy.deepcopy(metadata_instance)

    data = dict(instance_metadata, **{
        "name": instance.name,
        "family": "Image Sequences",
        "families": ["colorbleed.imagesequence"],
        "subset": collection.head[:-1],
        "stagingDir": os.path.join(workspace, renderlayer),
        "files": [list(collection)],
        "metadata": metadata
    })

    instance.data.update(data)
aardschok commented 7 years ago

This issue/conversation might be more suitable in github/avalon

We are currently testing a new approach on publishing image sequences. This is based on the JSON files which are written out when submitting to Deadline.

Concept

To ensure we are collecting all image sequences of a given layer, we force the Collector to use a specific folder based on the path of the JSON file. This ensures we only publish the AOVs/Render Elements of that render layer. This approach is used when publishing the image sequences through Deadline.

Here is some pseudo code:

import os

# If a JSON file was passed along, use its location as the workspace
USE_JSON = os.environ.get("USE_JSON", "")
if USE_JSON:
    json_path = USE_JSON
    workspace = os.path.dirname(json_path)
    base = workspace
    dirs = [os.path.splitext(os.path.basename(json_path))[0]]
else:
    # Else use the current working directory
    workspace = context.data["workspaceDir"]
    base, dirs, _ = next(os.walk(workspace))
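For illustration, a rough end-to-end sketch of that concept. The folder layout (a <layer>.json next to a <layer>/ folder holding one subfolder per AOV) is an assumption for the example, not necessarily the actual Deadline output.

import json
import os

# Assumed layout: .../renders/layer01.json next to .../renders/layer01/<aov>/
json_path = os.environ["USE_JSON"]
workspace = os.path.dirname(json_path)
layer = os.path.splitext(os.path.basename(json_path))[0]

with open(json_path) as f:
    layer_metadata = json.load(f)  # data written out at submission time

# Only this render layer's AOV/Render Element folders get collected
layer_dir = os.path.join(workspace, layer)
for aov in sorted(os.listdir(layer_dir)):
    aov_dir = os.path.join(layer_dir, aov)
    if os.path.isdir(aov_dir):
        print(layer, aov, sorted(os.listdir(aov_dir)))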
mottosso commented 7 years ago

For completeness, this is where the contents of the metadata variable come from. I had trouble remembering what it was at first.

> To ensure all instances have the correct metadata we suggest using copy.deepcopy(metadata["instance"]).

Yes, that makes sense. Though I might actually be inclined to go the other way: rather than embedding the entire instance as metadata, only pick the relevant members. I initially did it that way because I didn't know what was required (as I have little experience with rendering), but now that the experience exists, have you got an idea of what data you need from the variable? Any? All?

aardschok commented 7 years ago

@BigRoy and I went over this; the minimal requirements were:

- The families is already hardcoded in Collect Image Sequences to <prefix>.imagesequence.
- The representation is based on the rendered images themselves.
- The stagingDir is redundant with image sequences, as we use instance.data["files"].
- The start and end frame can be taken from the first and last file of instance.data["files"] (see the sketch below).
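For instance, a sketch of deriving the range from the file names, assuming the common <name>.<frame>.<ext> pattern (the file list and regex here are illustrative; with the clique collections the plug-in already builds, min/max of the parsed frame numbers amounts to the same thing):

import re

# Illustrative file list, as stored in instance.data["files"]
files = ["TECH.zdepth.1001.exr", "TECH.zdepth.1002.exr", "TECH.zdepth.1005.exr"]

# Assumes the common <name>.<frame>.<ext> naming convention
frames = sorted(int(re.search(r"\.(\d+)\.\w+$", f).group(1)) for f in files)
start_frame, end_frame = frames[0], frames[-1]
print(start_frame, end_frame)  # 1001 1005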

mottosso commented 7 years ago

That's perfect. In that case, I'd either (1) explicitly write those out on publish, or (2) explicitly read those in the collect_imagesequences plug-in.

What do you think?

My worry about deepcopy is that we one day find data to embed that doesn't play well with deepcopying, like Python objects, pickled objects, or third-party wrappers (Qt things, who knows). It'll keep increasing in size and throw errors at hard-to-debug times.
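That failure is easy to reproduce: deepcopy falls back to pickling for objects it has no copier for, and something as mundane as a lock refuses.

import copy
import threading

data = {"name": "TECH.zdepth", "guard": threading.Lock()}

try:
    copy.deepcopy(data)
except TypeError as exc:
    # On Python 3 this reads something like:
    # "cannot pickle '_thread.lock' object"
    print("deepcopy failed:", exc)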

BigRoy commented 7 years ago

> The start and end frame can be taken from the first and last file of instance.data["files"]

Actually, maybe. You'd still need the original .json file to hold that information and load it in, to check whether the files on disk actually were saved for all frames. So you'll need that information.

Regarding byFrameStep it's the same: you only need it for validation. It just means you'd need to check that the frames are consistent with the input rules, e.g. that only frames 1, 4, 7, etc. are present when byFrameStep was 3. A sketch of that check follows below.
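A sketch of that check, with startFrame/endFrame/byFrameStep assumed to be read back from the submission JSON (names and values here are made up):

# Assumed to come from the submission JSON; values are made up
start_frame, end_frame, by_frame_step = 1, 10, 3

expected = set(range(start_frame, end_frame + 1, by_frame_step))  # {1, 4, 7, 10}
collected = {1, 4, 7}  # frame numbers parsed from the files on disk

missing = sorted(expected - collected)
off_step = sorted(collected - expected)
if missing:
    print("missing frames:", missing)   # here: [10]
if off_step:
    print("off-step frames:", off_step)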

But yes, writing and reading more explicitly should make this much easier. At read time it might be most useful: if a list ever has to be read, you can ensure the values are unique to each instance as you apply them.
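As a small sketch of why reading explicitly helps here: json.loads builds brand-new objects on every call, so each instance gets its own lists and dicts without any deepcopy (the metadata content is made up):

import json

# Written once at publish time; content is made up for the example
serialized = json.dumps({"families": ["colorbleed.imagesequence"]})

for aov in ("TECH.zdepth", "TECH.normal"):
    data = json.loads(serialized)   # fresh objects on every read
    data["families"].append(aov)    # safe: this list belongs to this read only
    print(aov, data["families"])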