rstudio / vetiver-python

Version, share, deploy, and monitor models.
https://rstudio.github.io/vetiver-python/stable/
MIT License

add `/GET` prototype endpoint #174

Closed: isabelizimm closed 1 year ago

isabelizimm commented 1 year ago

I still need to fix up some tests, but this adds the ability to look at the prototype from an API endpoint.

juliasilge commented 1 year ago

I would like to make something that is pretty unified between R and Python for this, because this is a part of dealing with deployed models that it would be reasonable to want to do from both languages ("for model X, what is the input data prototype?"), or even just using curl or another way to query a REST API.

Given that, this seems awkwardly deeply nested to me:

{
  "title": "my_model prototype",
  "$ref": "#/definitions/prototype",
  "definitions": {
    "prototype": {
      "title": "prototype",
      "type": "object",
      "properties": {
        "B": {
          "title": "B",
          "default": 6,
          "type": "integer"
        },
        "C": {
          "title": "C",
          "default": 10,
          "type": "integer"
        },
        "D": {
          "title": "D",
          "default": 8,
          "type": "integer"
        }
      }
    }
  }
}

My initial inclination is that something more like this would be more usable generally:

{
  "title": "my_model prototype",
  "prototype": {
    "B": {
      "title": "B",
      "default": 6,
      "type": "integer"
    },
    "C": {
      "title": "C",
      "default": 10,
      "type": "integer"
    },
    "D": {
      "title": "D",
      "default": 8,
      "type": "integer"
    }
  }
}

How doable would it be to go from something like that to a Pydantic schema or base model?
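
For illustration, going from the nested shape to the flatter one is mostly a matter of resolving the local `$ref`. This is a minimal sketch under that assumption; `flatten_prototype` is a hypothetical helper, not part of vetiver, and it only handles references of the form `#/definitions/<name>`.

```python
from typing import Any, Dict


def flatten_prototype(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Collapse a `$ref`-style schema into {"title": ..., "prototype": {...}}."""
    prefix = "#/definitions/"
    ref = schema.get("$ref", "")
    # Only local references into "definitions" are supported in this sketch.
    if not ref.startswith(prefix):
        raise ValueError(f"unsupported $ref: {ref!r}")
    name = ref[len(prefix):]
    definition = schema["definitions"][name]
    return {
        "title": schema.get("title", name),
        "prototype": dict(definition.get("properties", {})),
    }


nested = {
    "title": "my_model prototype",
    "$ref": "#/definitions/prototype",
    "definitions": {
        "prototype": {
            "title": "prototype",
            "type": "object",
            "properties": {
                "B": {"title": "B", "default": 6, "type": "integer"},
                "C": {"title": "C", "default": 10, "type": "integer"},
                "D": {"title": "D", "default": 8, "type": "integer"},
            },
        }
    },
}

flat = flatten_prototype(nested)
```

Note that anything stored on the definition besides `properties` (for example its own `title` and `type`) is dropped by this flattening.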

isabelizimm commented 1 year ago

Going from BaseModel -> JSON is pretty simple. Instead of getting a full schema, I can use

v.prototype.schema()

to receive

{
    "properties": {
        "B": {"title": "B", "default": 88, "type": "integer"},
        "C": {"title": "C", "default": 67, "type": "integer"},
        "D": {"title": "D", "default": 28, "type": "integer"},
    },
    "title": "prototype",
    "type": "object",
}

which feels much better!

Round-tripping a JSON schema is not as easy as I thought; that is not currently available inside of pydantic. There is a project for creating BaseModels from JSON, but the approach involves reading/writing Python files, which feels a little clunky in this use case.

My current plan is to write a helper function to parse out the JSON into a typed dict, which IS an accepted input class for a BaseModel.
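
A rough sketch of what such a helper might look like, using only the standard library. The name `schema_to_fields`, the `JSON_TO_PYTHON` mapping, and the reading of "typed dict" as a mapping of field name to `(type, default)` are all assumptions made for illustration, not the implementation in this PR.

```python
from typing import Any, Dict, Tuple

# Assumed mapping from JSON Schema type names to Python types.
JSON_TO_PYTHON = {
    "integer": int,
    "number": float,
    "string": str,
    "boolean": bool,
}


def schema_to_fields(schema: Dict[str, Any]) -> Dict[str, Tuple[type, Any]]:
    """Turn schema["properties"] into {field_name: (python_type, default)}."""
    fields = {}
    for name, spec in schema.get("properties", {}).items():
        # Raises KeyError for JSON types this sketch does not know about.
        py_type = JSON_TO_PYTHON[spec["type"]]
        fields[name] = (py_type, spec.get("default"))
    return fields


schema = {
    "properties": {
        "B": {"title": "B", "default": 88, "type": "integer"},
        "C": {"title": "C", "default": 67, "type": "integer"},
        "D": {"title": "D", "default": 28, "type": "integer"},
    },
    "title": "prototype",
    "type": "object",
}

fields = schema_to_fields(schema)
```

A dict of this shape should be usable as field definitions for `pydantic.create_model("prototype", **fields)`, which takes `(type, default)` tuples, though that call is left out here so the sketch runs without pydantic installed.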

juliasilge commented 1 year ago

That example looks way better, and your plan to make a little helper function sounds like the right move to me. 👍

juliasilge commented 1 year ago

I was working on this in a new package: https://github.com/juliasilge/cereal

Here is what I have going so far:

{
    "title": "my_model prototype",
    "properties": {
        "a": {
            "class": "double",
            "details": []
        },
        "b": {
            "class": "integer",
            "details": []
        },
        "c": {
            "class": "Date",
            "details": []
        },
        "d": {
            "class": "POSIXct",
            "details": {
                "tzone": "America/New_York"
            }
        },
        "e": {
            "class": "character",
            "details": []
        },
        "f": {
            "class": "factor",
            "details": {
                "levels": ["blue", "green", "red"]
            }
        },
        "g": {
            "class": "ordered",
            "details": {
                "levels": ["small", "medium", "large"]
            }
        }
    }
}

isabelizimm commented 1 year ago

  • I can definitely change to "type" instead of "class".

At least for the Python side, I do think "type" should keep the name.

  • It may make sense for the details element to not exist in Python, given that there is less use of complex data types. What do you think?

I could add the details if the type is categorical/timedelta/etc, but that would all probably have to be written by hand. It's doable, but likely not worth it for the first iteration.

  • What do you think about not having the "title" info duplicated in every element? Is that doable?

It's possible to pop "title" from each element! Since there will need to be a helper function, I can remove that from the typed dict.
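
As a sketch of that step, assuming the schema is a plain dict shaped like the `v.prototype.schema()` output earlier in the thread (`drop_field_titles` is a hypothetical name):

```python
import copy
from typing import Any, Dict


def drop_field_titles(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the schema without the per-field "title" entries."""
    cleaned = copy.deepcopy(schema)  # leave the caller's dict untouched
    for spec in cleaned.get("properties", {}).values():
        spec.pop("title", None)
    return cleaned


schema = {
    "properties": {
        "B": {"title": "B", "default": 88, "type": "integer"},
        "C": {"title": "C", "default": 67, "type": "integer"},
    },
    "title": "prototype",
    "type": "object",
}

cleaned = drop_field_titles(schema)
```

The top-level `"title"` is kept on purpose; only the duplicated per-field titles are removed.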

isabelizimm commented 1 year ago

What would a more complex example look like? For example, a BaseModel with only integers between 0 and 5.
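
One way to picture this case: JSON Schema expresses inclusive bounds with the `minimum` and `maximum` keywords, which is also what pydantic emits for `ge`/`le` constraints. The field below is hand-written for illustration rather than generated output from this PR, and `check_value` is a hypothetical helper showing what a consumer of the endpoint would need to honor.

```python
from typing import Any, Dict

# Hand-written example of a bounded integer field (not generated output).
bounded = {"title": "B", "type": "integer", "minimum": 0, "maximum": 5}


def check_value(spec: Dict[str, Any], value: Any) -> bool:
    """Check a value against the "type", "minimum", and "maximum" keywords."""
    if spec.get("type") == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            return False
    if "minimum" in spec and value < spec["minimum"]:
        return False
    if "maximum" in spec and value > spec["maximum"]:
        return False
    return True
```

A flattened prototype that keeps only `type` and `default` would lose these bounds, so any simplified format needs somewhere to carry them.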

juliasilge commented 1 year ago

I've got what will be at /prototype from R looking like this now:

{
  "title": "my_model prototype",
  "properties": {
    "a": {
      "type": "numeric",
      "default": "1",
      "details": []
    },
    "b": {
      "type": "integer",
      "default": "2",
      "details": []
    },
    "c": {
      "type": "Date",
      "default": "2023-01-01",
      "details": []
    },
    "d": {
      "type": "POSIXct",
      "default": "2019-01-01",
      "details": {
        "tzone": "America/New_York"
      }
    },
    "e": {
      "type": "character",
      "default": "x",
      "details": []
    },
    "f": {
      "type": "factor",
      "default": "blue",
      "details": {
        "levels": ["blue", "green", "red"]
      }
    },
    "g": {
      "type": "ordered",
      "default": "small",
      "details": {
        "levels": ["small", "medium", "large"]
      }
    }
  }
}

juliasilge commented 1 year ago

Do you know why the name for the example/default field ends up "default"? I was looking at the OpenAPI specification again, and it looks like it would be better to call it "example".

juliasilge commented 1 year ago

I'm realizing as I work on the workshop materials that this has not been merged in so Python models don't have this endpoint. What still needs to be done to finish this up?

isabelizimm commented 1 year ago

What still needs to be done to finish this up?

I think it still needs whatever helpers clean up the prototype, and then a bit of refactoring in the tests.

isabelizimm commented 1 year ago

We are still using "default" over "example", but let's handle that in a different PR, since it will involve some plumbing in the prototype creation. Trying hard to stay pydantic-version agnostic 🤞
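
For reference, the rename itself could be done as a post-processing step on the schema dict, independent of the pydantic version. This is only a sketch of that alternative, not the plumbing change described above; `default_to_example` is a hypothetical name.

```python
import copy
from typing import Any, Dict


def default_to_example(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the schema with each field's "default" renamed to "example"."""
    renamed = copy.deepcopy(schema)  # do not mutate the input
    for spec in renamed.get("properties", {}).values():
        if "default" in spec:
            spec["example"] = spec.pop("default")
    return renamed


schema = {
    "title": "prototype",
    "type": "object",
    "properties": {
        "B": {"title": "B", "default": 6, "type": "integer"},
        "C": {"title": "C", "type": "integer"},  # no default: left unchanged
    },
}

renamed = default_to_example(schema)
```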

isabelizimm commented 1 year ago

After looking at the options, a more nested approach better respects prototypes made from pydantic.BaseModel objects since they often add extra data to the outer levels. These fields allow us the opportunity to add round trip support for very advanced prototypes in the future. @juliasilge WDYT?

juliasilge commented 1 year ago

Let's prioritize the experience of round-tripping a prototype for Python the way we can round-trip a prototype for R over having /prototype endpoints that look the same in Python and R. I think that's a reasonable choice here. 👍