quantified-uncertainty / squiggle

An estimation language
https://squiggle-language.com
MIT License

Save ImportModelIds to ModelRevision #3161

Open OAGr opened 3 months ago

OAGr commented 3 months ago

Description of suggestion or shortcoming:

When we save ModelRevisions, we could check the AST to find the model IDs of all of the imports. This will be useful for displaying all of the dependencies of any model, which might be desired soon (like, for Specs).
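A minimal sketch of that AST check, assuming a simplified AST shape and assuming hub imports use source strings of the form `hub:owner/slug`. `ImportNode`, `ProgramNode`, and `findImportedModels` are hypothetical names for illustration, not the actual Squiggle API, and the returned owner/slug pairs would still need to be resolved to Model IDs in the DB.

```typescript
// Hypothetical, simplified shapes. The real Squiggle AST has different node types.
interface ImportNode {
  kind: "Import";
  source: string; // assumed format: "hub:owner/slug"
  alias: string;
}

interface ProgramNode {
  kind: "Program";
  imports: ImportNode[];
}

interface ImportedModelRef {
  owner: string;
  slug: string;
}

const HUB_PREFIX = "hub:";

// Collect the distinct hub models that a program imports.
function findImportedModels(program: ProgramNode): ImportedModelRef[] {
  const seen: Record<string, ImportedModelRef> = {};
  for (const node of program.imports) {
    // Skip anything that is not a hub import (e.g. a relative path).
    if (node.source.indexOf(HUB_PREFIX) !== 0) continue;
    const parts = node.source.slice(HUB_PREFIX.length).split("/");
    // Skip malformed sources instead of guessing at their meaning.
    if (parts.length !== 2 || !parts[0] || !parts[1]) continue;
    // Keying by "owner/slug" deduplicates repeated imports of one model.
    seen[parts[0] + "/" + parts[1]] = { owner: parts[0], slug: parts[1] };
  }
  return Object.keys(seen).map((key) => seen[key]);
}
```

Deduplicating here matches the `@@unique([modelRevisionId, importedModelId])` constraint in the proposed table, since each imported model should produce at most one row per revision.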

Later, it would be good to show this list somewhere in the Model page. Or maybe even the model card.

This is a many-to-many relationship, so it might need a new model / DB table. I propose:

model ModelImport {
  id String @id @default(cuid())

  modelRevision   ModelRevision @relation(fields: [modelRevisionId], references: [id], onDelete: Cascade)
  modelRevisionId String
  importedModel   Model         @relation(fields: [importedModelId], references: [id], onDelete: Cascade)
  importedModelId String

  @@unique([modelRevisionId, importedModelId])
  @@index([modelRevisionId])
  @@index([importedModelId])
}

It might also be wise to rename ModelExport to VariableExportRevision at the same time, to reduce confusion.

OAGr commented 3 months ago

@berekuk would be curious to get your take when you have time. This seems pretty straightforward to me, though we might have to adjust the DB a bit - maybe with a join table (many-to-many relationship).

berekuk commented 3 months ago

I'll copy my Discord comment here:

  • it seems like we'll have to pin second-order revisions too, somehow, otherwise we don't get full immutability
  • I guess by separating "revisions" from "revisions with dependency pins"?
  • like, if I save a model, it produces a revision
  • but then it also produces a "revision-with-dependency-pins" that depends on other "revision-with-dependency-pins", and the auto-upgrade produces another "revision-with-dependency-pins" with updated pins
  • the "output(revision)" is mutable, but "output(revision-with-dependency-pins)" is not
  • we could mix them together, but that would lead to too many duplicated automatically created revisions (with copy-pasted code and other fields - not very normalized)
  • nice thing is that revision-with-dependency-pins and ModelRevisionBuilds are one-to-one - we'll never have to run the build more than once

So the schema would be this:

classDiagram
note for ModelRevision "Created on save"
ModelRevision "1" --> "*" ModelRevisionWithImportPins
note for ModelRevisionWithImportPins "Created on save.\nAfter that, new rows pointing to the same ModelRevision are created by auto-upgrade process."
ModelRevisionWithImportPins "1" --> "*" ModelRevisionWithImportPins : Import, stored as another relation not shown on this diagram.
ModelRevisionWithImportPins "1" --> "1" ModelBuild :Produced by server build, potentially stored on S3/Elastic.

It should be possible to derive ModelRevisionWithImportPins by analyzing AST only, so ModelBuild values could be produced asynchronously.
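To make the split between the two kinds of rows concrete, here is a rough sketch of the auto-upgrade step using in-memory shapes that mirror the diagram. The field names (`pins`, `modelRevisionId`) and the `autoUpgrade` function are assumptions for illustration, not an existing schema or API.

```typescript
// Hypothetical in-memory shapes mirroring the diagram; not the real DB schema.
interface ModelRevision {
  id: string;
  code: string; // stored once, on save
}

interface ModelRevisionWithImportPins {
  id: string;
  modelRevisionId: string; // many pinned rows can point at one ModelRevision
  // imported model id -> id of the pinned revision of that model
  pins: Record<string, string>;
}

// Produce a new pinned row when any import has a newer pinned revision.
// The existing row is never mutated, so a build of it stays valid.
function autoUpgrade(
  current: ModelRevisionWithImportPins,
  latestPinned: Record<string, string>,
  newId: string
): ModelRevisionWithImportPins {
  const pins: Record<string, string> = {};
  let changed = false;
  for (const importedModelId of Object.keys(current.pins)) {
    const oldPin = current.pins[importedModelId];
    const latest = latestPinned[importedModelId];
    // Keep the old pin when no newer pinned revision is known.
    const nextPin = latest !== undefined ? latest : oldPin;
    if (nextPin !== oldPin) changed = true;
    pins[importedModelId] = nextPin;
  }
  // Nothing to upgrade: reuse the existing row and its build.
  if (!changed) return current;
  return { id: newId, modelRevisionId: current.modelRevisionId, pins: pins };
}
```

Because the new row only copies `modelRevisionId` and the pin map, the code itself is not duplicated, which is the normalization point made in the list above.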

We can send the initial import pins from the client, so that the import versions that the user has seen in the playground will match the actual DB state.

On load, we'll go ModelRevision -> latest ModelRevisionWithImportPins -> send code for all imports. Fetching the imports' code can be done in the linker: the backend only sends import pins data, then the linker queries the backend again for the actual import code from the necessary revisions, and recursively for their own import pins, etc. Or maybe it can be done on the backend initially, as an optimization, and then in the linker on interactions.
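The recursive fetch could look roughly like the sketch below. The `Backend` interface, `PinnedRevision` shape, and `collectImportCode` function are hypothetical; a real linker would make these calls asynchronously over the network, and they are synchronous here only to keep the example short.

```typescript
// Hypothetical shapes; the real backend API is not defined in this thread.
interface PinnedRevision {
  id: string;
  modelRevisionId: string;
  // imported model id -> id of the pinned revision of that model
  pins: Record<string, string>;
}

interface Backend {
  getPinnedRevision(id: string): PinnedRevision | undefined;
  getCode(modelRevisionId: string): string | undefined;
}

// Walk the pin graph from a root pinned revision and collect the code of
// every pinned revision reachable from it, fetching each one only once.
function collectImportCode(backend: Backend, rootId: string): Record<string, string> {
  const codeByPinnedId: Record<string, string> = {};
  const pending: string[] = [rootId];
  while (pending.length > 0) {
    const id = pending.pop() as string;
    // A shared dependency can be reached by several paths; skip repeats.
    if (Object.prototype.hasOwnProperty.call(codeByPinnedId, id)) continue;
    const pinned = backend.getPinnedRevision(id);
    if (!pinned) throw new Error("Unknown pinned revision: " + id);
    const code = backend.getCode(pinned.modelRevisionId);
    if (code === undefined) throw new Error("Missing code for revision: " + pinned.modelRevisionId);
    codeByPinnedId[id] = code;
    for (const importedModelId of Object.keys(pinned.pins)) {
      pending.push(pinned.pins[importedModelId]);
    }
  }
  return codeByPinnedId;
}
```

Since pins point at immutable pinned revisions, the result for a given root never changes, so either side could cache it.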