wandb / weave

Weave is a toolkit for developing AI-powered applications, built by Weights & Biases.
https://wandb.me/weave
Apache License 2.0

[Question] Is there a way to turn weave objects (tracetables, boxed types, etc.) back into the original Python objects? #1602

Open darinkishore opened 6 months ago

darinkishore commented 6 months ago

Hi! So I have a weave object, initialized as follows:

from pydantic import BaseModel
import weave

class SemanticMemoryExample(BaseModel):
    name: str
    text: str
    memory: str
    inputs: list[str] = ["name", "text"]

class SemanticMemoryExampleDataset(weave.Object, BaseModel):
    name: str = "semantic_memory_perltmem_dspy_first_examples"
    description: str = "First foray into semantic memory using DSPY"
    examples: list[SemanticMemoryExample]

The wrapper class "SemanticMemoryExampleDataset" is used because name is already a defined attribute on weave.Object, which conflicts with the name field of SemanticMemoryExample.

I'd like to save and load this object, so I run

def publish_dataset(
    examples: list[SemanticMemoryExample],
) -> SemanticMemoryExampleDataset:
    # Wrap the examples in a weave.Object and publish it so it can be loaded later.
    dataset = SemanticMemoryExampleDataset(examples=examples)
    weave.publish(dataset)
    return dataset


def retrieve_examples(
    dataset_ref: SemanticMemoryExampleDataset,
) -> list[SemanticMemoryExample]:
    retrieved_examples: list[SemanticMemoryExample] = []
    for example in dataset_ref.examples:
        # Here, the conversion doesn't go recursively: each example comes back
        # as a TraceObject rather than a SemanticMemoryExample.
        retrieved_examples.append(example)

    return retrieved_examples

However, each example returned by retrieve_examples looks like this:

TraceObject(ObjectRecord({'name': BoxedStr('...'), 'text': BoxedStr("...."), 'memory': BoxedStr('...'), 'inputs': TraceList(['name', 'text']), '_class_name': 'SemanticMemoryExample', '_bases': TraceList(['BaseModel']), 'map_values': <bound method ObjectRecord.map_values of ObjectRecord({...})>}))

I see that Boxed objects have an unbox() method, so I can unbox name, text, and memory by importing it (from weave.box import unbox) and calling unbox(thing) on each field.

But I don't see a way to convert inputs, a TraceList, back into a native Python datatype. Also, manually converting everything is a hassle. Is there a weave function planned (or already existing) that turns TraceList, TraceTable, etc. objects back into native Python datatypes?
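A minimal sketch of the recursive conversion being asked for, assuming the wrapper types behave like the builtins they wrap (boxed strings as str subclasses, trace lists as list subclasses, records as mappings). That assumption is not documented weave behavior: to_native is not a weave function, and BoxedStr and TraceList below are hypothetical stand-ins, not weave's classes.

```python
from collections.abc import Mapping


def to_native(value):
    """Recursively replace wrapper objects with plain Python values.

    Assumption (hypothetical, not weave's API): wrappers subclass the builtin
    they wrap, so isinstance checks against str/int/float/list/tuple/Mapping
    are enough to recognize them.
    """
    if value is None or isinstance(value, bool):
        return value  # bool cannot be subclassed, so it is already native
    if isinstance(value, str):
        return str(value)  # plain str, unless the subclass overrides __str__
    if isinstance(value, int):
        return int(value)
    if isinstance(value, float):
        return float(value)
    if isinstance(value, Mapping):
        return {to_native(k): to_native(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_native(v) for v in value]
    if isinstance(value, tuple):
        return tuple(to_native(v) for v in value)
    return value  # anything else is passed through untouched


# Hypothetical stand-ins for the wrapper types in the repr above.
class BoxedStr(str):
    pass


class TraceList(list):
    pass


record = {
    "name": BoxedStr("example"),
    "inputs": TraceList([BoxedStr("name"), BoxedStr("text")]),
}
native = to_native(record)
print(native)  # {'name': 'example', 'inputs': ['name', 'text']}
```

Recursing on the container first and converting each leaf means one call strips every layer, which is what list(example.inputs) alone cannot do for nested values.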

Loving the library! A LOT of good decisions were made about what to focus on and about DX. This is an almost ideal solution for me.

darinkishore commented 6 months ago

I'm being silly: you can just cast it back into a list with inputs=list(example.inputs).

jwlee64 commented 6 months ago

Hi @darinkishore just wanted to confirm that all is good here and that we can close this?

darinkishore commented 6 months ago

Hi! Thank you for checking. My main question is still unresolved!

Can you turn weave objects back into their native python objects?

Usually, to preserve state, some classes can't set everything up at creation time!

Also, the changed types of all the nested attributes are inconvenient to keep track of and work around in code, especially if I use lots of different weave objects.

tssweeney commented 5 months ago

Hi @darinkishore - thank you very much for your feedback and comments. Your request is very reasonable. Reading your use case, I am extracting 3 distinct asks:

  1. The ability to construct the original runtime class when loading published data
  2. Ideally, Trace*, *Record, and Boxed* type classes are transparent to the user, since they deviate from the types expected in code (at the very least, it should be easy to recursively strip away this representation)
  3. (Implied from first comment): The special name field in our Object class can conflict with user-defined fields.
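For ask 1, one possible shape, assuming the record has already been unwrapped into plain Python data: drop the _class_name and _bases bookkeeping keys visible in the TraceObject repr, check that the class name matches, and call the constructor. rebuild is a hypothetical helper, and SemanticMemoryExample is redeclared as a dataclass only to keep the sketch stdlib-only; a pydantic model accepts the same keyword arguments.

```python
from dataclasses import dataclass, field

# Keys in the TraceObject repr that describe the class rather than hold field data.
BOOKKEEPING_KEYS = {"_class_name", "_bases"}


def rebuild(cls, record):
    """Construct cls from an already-unwrapped record (hypothetical helper)."""
    class_name = record.get("_class_name")
    if class_name is not None and class_name != cls.__name__:
        raise TypeError(f"record describes {class_name}, not {cls.__name__}")
    data = {k: v for k, v in record.items() if k not in BOOKKEEPING_KEYS}
    return cls(**data)  # keyword construction, as dataclasses and pydantic models accept


# Stdlib stand-in for the pydantic SemanticMemoryExample from the question.
@dataclass
class SemanticMemoryExample:
    name: str
    text: str
    memory: str
    inputs: list[str] = field(default_factory=lambda: ["name", "text"])


record = {
    "name": "example",
    "text": "some text",
    "memory": "a memory",
    "inputs": ["name", "text"],
    "_class_name": "SemanticMemoryExample",
    "_bases": ["BaseModel"],
}
example = rebuild(SemanticMemoryExample, record)
print(example.inputs)  # ['name', 'text']
```

Checking _class_name before constructing guards against silently rebuilding a record into the wrong class; nested models would still need their own rebuild call, or pydantic validation, to come back as instances rather than dicts.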

Spitballing some API ideas: I wonder if there could be a higher-level class method that makes this easier (some pseudocode):

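class Object(BaseModel):
    # ...

    @classmethod
    def load(cls, data: "Object | dict | TraceObject") -> "Object":
        """Return an instance of cls built from an Object, a dict, or a TraceObject."""
        if isinstance(data, cls):
            # Already the right runtime class: return it unchanged.
            return data
        elif isinstance(data, dict):
            # Plain data: let pydantic validate it and build the instance.
            return cls.model_validate(data)
        elif isinstance(data, TraceObject):
            # Strip the trace wrappers (weave.unwrap is a proposed helper), then retry.
            return cls.load(weave.unwrap(data))
        else:
            raise TypeError(f"cannot load {cls.__name__} from {type(data).__name__}")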

This would allow you to call SemanticMemoryExampleDataset.load(...) and be sure you get back the right class.


In any case, these are great requests, and we need to think about a good design to improve this. We probably need to come back with more ideas/options before taking action.

tssweeney commented 5 months ago

Internal backlog link: https://wandb.atlassian.net/browse/WB-18889