configuration structure rework / alternative

sbaier1 commented 2 years ago

for my audio animation experiments, and i think also as a general purpose tool for animations that evolve, the current config syntax doesn't work very well:

- scenes are of equal duration by default due to the steps_per_scene param
- it feels a bit awkward to type long chains of scenes into a single string
- it would be nice to be able to configure individual scenes with individual parameters, as much as possible or as little as necessary depending on the particular scenario
- for making multimodal animations in conjunction with other media such as video or audio, it's inconvenient to have to calculate the steps from fps + steps_per_frame
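
The step arithmetic from the last point can be sketched as below. This is only an illustration, assuming a scene's length is given in seconds: `steps_for_duration` is a hypothetical helper and not part of pytti, while `fps` and `steps_per_frame` follow the parameter names mentioned above.

```python
def steps_for_duration(duration_seconds: float, fps: float, steps_per_frame: int) -> int:
    """Hypothetical helper: optimizer steps needed to render a scene of the given length."""
    frames = round(duration_seconds * fps)  # frames needed to cover the duration
    return frames * steps_per_frame


# a 4.5 second scene at 12 fps with 50 steps per frame
print(steps_for_duration(4.5, fps=12, steps_per_frame=50))  # prints 2700
```

With a per-scene duration in seconds, this calculation could happen once inside the config loader instead of by hand for every scene.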

for these reasons, in my fork i refactored the configuration from using a single string for all configuration to specifying a singular scene's parameters as a complex object composed of its features:

for common features, i then set global vars that i substitute in individual scenes where they apply.

the exact parameters are still up for debate. ideally i (and i'm sure certain other users as well) would like to have as much customizability per scene as possible (perhaps even different transformation functions per scene, etc.).

of course, the duration of a scene in seconds may be unintuitive or not what people want for their specific use case, which is why i'm unsure how to pull this into this project.

i'd like to discuss here if this is a viable thing to refactor or what the paths forward could be.

if a breaking change is not acceptable, it would also be possible to have an "either-or" scene format: the old format stays in place, and the new format can optionally be used instead. this of course adds complexity to the configuration itself.
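
One way an "either-or" format could work is sketched below. Everything here is an assumption made for illustration: the `Scene` record, the `parse_scenes` function, the `steps` key, and the use of `||` as the legacy scene separator are not taken from the fork or from the current parser.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Scene:
    """Hypothetical per-scene record; the field names are illustrative only."""
    prompt: str
    steps: int


def parse_scenes(spec: Union[str, List[dict]], steps_per_scene: int) -> List[Scene]:
    """Accept either a legacy single string or a list of per-scene mappings."""
    if isinstance(spec, str):
        # legacy path (assumed here to separate scenes with "||"):
        # every scene gets the same global duration
        parts = [p.strip() for p in spec.split("||") if p.strip()]
        return [Scene(prompt=p, steps=steps_per_scene) for p in parts]
    # new path: each scene may override the global default
    return [
        Scene(prompt=s["prompt"], steps=s.get("steps", steps_per_scene))
        for s in spec
    ]


legacy = parse_scenes("a day at the beach || a wintery landscape", steps_per_scene=100)
structured = parse_scenes(
    [{"prompt": "a day at the beach", "steps": 250}, {"prompt": "a wintery landscape"}],
    steps_per_scene=100,
)
```

Both calls return the same `Scene` type, so the rest of the pipeline would not need to know which format the user wrote.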

dmarx commented 2 years ago

> it would also be possible to have an "either-or" scene format

I definitely think this is preferable if possible. I think designing it with this in mind will also potentially make it easier to modularize the scene parsing in a way that could be easily incorporated/adopted by other projects in the future.

as a general topic of discussion: I'm super open to significantly reworking the config system and scene specification system. I posted a demo config yaml in the vqlipse server a while back that illustrated a more flexible system. I was just posting it for feedback though, so I never even began coding a parser for it. I'll dig it up; let me know whether a specification format like this might fit your use case

> for common features, i then set global vars that i substitute in individual scenes where they apply.

the globals system we're currently using is part of why the codebase is as disorganized as it is. I'm hoping to eventually eliminate most of those globals in favor of something like a Renderer class which handles state management. A lot of what we're currently doing with globals could then be implemented as attributes on the Renderer object.
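
A rough sketch of what moving state from globals onto an object could look like. The attribute and method names here are hypothetical and chosen only for illustration; no such `Renderer` class is described in this thread beyond the idea itself.

```python
from dataclasses import dataclass


@dataclass
class Renderer:
    """Hypothetical sketch: state that might otherwise live in module-level globals."""
    steps_per_frame: int = 50
    step: int = 0   # optimizer steps taken so far
    frame: int = 0  # frames completed so far

    def advance(self) -> bool:
        """Take one step; return True when this step completes a frame."""
        self.step += 1
        if self.step % self.steps_per_frame == 0:
            self.frame += 1
            return True
        return False


renderer = Renderer(steps_per_frame=2)
finished = [renderer.advance() for _ in range(5)]
```

Because the counters are attributes, two renderers can run side by side without sharing state, which is hard to do with globals.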

dmarx commented 2 years ago

hey y'all, just kicking around an idea. As of now, pytti and a few other notebooks have essentially started to evolve something of a "domain specific language" for authoring what pytti calls "scenes." That whole thing with specifying weights, and stop weights, and masks, and separating things with colons and pipes and underscores.... that's the thing I'm describing as a DSL.

I think this DSL adds a bit to the learning curve for a lot of people, making certain features difficult to use or completely inaccessible. I've been trying to think of a system that might feel more fluent and I've come up with something using yaml that I think is potentially more expressive as well.

I haven't actually implemented anything here, just kicking around design ideas. Here's a sample that illustrates the kind of system I have in mind. Interested to hear what people think.

scenes:
  prefixes:
    - a photo of
    - a photorealistic rendering of
  suffixes:
    - ultra-high resolution 8k UHD
    - trending on artstation
    - modeled in blender
  frames: 10
  steps_per_frame: 50
  transform:
    2d:
      zoom: -10
- scene:
    prompt: a day at the beach
    frames: 20
    transform:
      3d:
        rotate: [.1, .2, .3, .4]
- scene:
    steps_per_frame: 80
    prompt: a wintery landscape
    prompt: snow-covered mountains
      weight: .5
    # prefer snow-covered trees to empty white space
    prompt: a snowy forest
      weight: 1.5
      semantic_mask:
        prompt: a snowy field
        prompt: an empty field covered in snow
- scene:
    prompt: a barren landscape
    prompt: an apocalyptic wasteland
    prompt: a long abandoned industrial wasteland
    prompt: the decaying remains of an advanced civilization
    prompt: alien ruins
    transform:
      2d:
        zoom: -10 + 5*t
init_image: someurl.com/myimage.png
models:
  crossmodal_image_text:
    - clip: ViTB16
    - clip: RN50x4
  depth:
    - adabins: adabins_nyu
  flow:
    - GMA: gma
  image_model:
    - VQGAN: ImageNet
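
The sample does not say how the shared settings under `scenes:` should combine with a scene's own settings. One possible reading, sketched below with plain dicts, is a recursive merge in which the scene's values win. `merge_settings` is a hypothetical helper, and the merge rule itself is an assumption, not something the sample specifies.

```python
from copy import deepcopy


def merge_settings(defaults: dict, overrides: dict) -> dict:
    """Return defaults updated with overrides, merging nested mappings recursively."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# values taken from the sample above, written as plain dicts
defaults = {"frames": 10, "steps_per_frame": 50, "transform": {"2d": {"zoom": -10}}}
beach = {
    "prompt": "a day at the beach",
    "frames": 20,
    "transform": {"3d": {"rotate": [0.1, 0.2, 0.3, 0.4]}},
}

scene = merge_settings(defaults, beach)
```

Under this rule the beach scene keeps the shared 2d zoom and adds its 3d rotation. Whether a scene-level `transform` should instead replace the shared one entirely is a design question the format would need to settle.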

https://discord.com/channels/869630568818696202/899135695677968474/943391522353721384

pytti-tools / pytti-core

configuration structure rework / alternative #125