prisma / prisma1

💾 Database Tools incl. ORM, Migrations and Admin UI (Postgres, MySQL & MongoDB) [deprecated]
https://v1.prisma.io/docs/
Apache License 2.0
16.55k stars 869 forks

The MongoDB connector should support the `prisma introspect` command #3529

Closed nikolasburk closed 5 years ago

nikolasburk commented 5 years ago

When using the MongoDB connector with an existing database, I currently need to model my data by hand. It would be great if the `prisma introspect` command could help with this by sampling a number of documents from the collections in my database and suggesting a datamodel based on them.

ejoebstl commented 5 years ago

Proposed design goals

The introspection process ...

Proposed tasks

All mentioned Todos only affect the CLI component.

First, some cleanup should be done. This is optional but recommended, to avoid duplication.

After this, we can implement the MongoDB connector:

Notes on Sampling:

1: Samples N random documents from each collection and tries to find a useful intersection.

Multiple samples are merged. Fields that are found in all samples are made required, fields that are found in some samples are made optional.
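A minimal sketch of the merge rule described above, in TypeScript. The names `SampledDoc`, `FieldInfo` and `mergeSamples` are made up for illustration and are not taken from the Prisma CLI; the sketch only looks at top-level field names and ignores types and embedded documents.

```typescript
// Hypothetical types and helper for illustration; not the Prisma CLI's actual API.
type SampledDoc = { [field: string]: unknown };

interface FieldInfo {
  name: string;
  required: boolean;
}

// Merge the top-level field names seen across the sampled documents.
// A field is marked required only if every sample contains it.
function mergeSamples(samples: SampledDoc[]): FieldInfo[] {
  const counts: { [name: string]: number } = {};
  for (const doc of samples) {
    for (const key of Object.keys(doc)) {
      const seen = Object.prototype.hasOwnProperty.call(counts, key)
        ? counts[key]
        : 0;
      counts[key] = seen + 1;
    }
  }
  return Object.keys(counts).map((name) => ({
    name,
    required: counts[name] === samples.length,
  }));
}

// Two sampled documents: `email` is missing from the second one.
const fields = mergeSamples([
  { _id: "1", email: "alice@prisma.io", name: "Alice" },
  { _id: "2", name: "Bob" },
]);
```

Here `_id` and `name` come out as required and `email` as optional, because `email` appears in only one of the two samples.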

Notes on Resolving Relations

2: For fields of type UUID or ObjectID, performs an index lookup on all collections to guess if a relation exists. Possible alternatives or additions: guessing relations by name, or including other types in the lookup as well.

Relations for embedded documents are recursively resolved.
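The lookup idea can be sketched as follows. This is an in-memory stand-in, not the connector's implementation: a real introspection would query MongoDB's `_id` index instead of scanning arrays. The names `StoredDoc`, `Collections` and `guessRelationTarget` are hypothetical, and IDs are simplified to plain strings.

```typescript
// Hypothetical in-memory model: collection name -> documents, each with an _id.
interface StoredDoc {
  _id: string;
  [field: string]: unknown;
}

type Collections = { [collection: string]: StoredDoc[] };

// For one field of one collection, return the name of the first collection
// whose _id values contain every sampled value of that field, or null.
// Callers are expected to pass a field other than "_id".
function guessRelationTarget(
  db: Collections,
  source: string,
  field: string
): string | null {
  // Collect candidate IDs; a field may hold one ID or a list of IDs.
  const values: string[] = [];
  for (const doc of db[source]) {
    const v = doc[field];
    const candidates = Array.isArray(v) ? v : [v];
    for (const c of candidates) {
      if (typeof c === "string") values.push(c);
    }
  }
  if (values.length === 0) return null;

  for (const target of Object.keys(db)) {
    const ids: { [id: string]: true } = {};
    for (const doc of db[target]) ids[doc._id] = true;
    if (values.every((v) => ids[v] === true)) return target;
  }
  return null;
}

const db: Collections = {
  users: [
    { _id: "u1", name: "Alice", posts: ["p1"] },
    { _id: "u2", name: "Bob", posts: ["p2", "p3"] },
  ],
  posts: [
    { _id: "p1", title: "A" },
    { _id: "p2", title: "B" },
    { _id: "p3", title: "C" },
  ],
};

const postsTarget = guessRelationTarget(db, "users", "posts");
const nameTarget = guessRelationTarget(db, "users", "name");
```

With this data, `postsTarget` is `"posts"` and `nameTarget` is `null`. Requiring every sampled value to resolve is a deliberately strict choice for the sketch; a tolerant variant could accept a match above some threshold to cope with dangling references.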

Handled corner cases

Unhandled corner cases

If we encounter such a case, we abort.

Dependencies

Existing databases might have embedded types without any _id field. Related to #3575.

ejoebstl commented 5 years ago

This PR implements an alpha version of Mongo introspection. I've abstracted the concept of document databases, so it should be super easy (150 LOC) to add other document databases in the future. Schema rendering is now a completely independent module in prisma-datamodel.

Resolving Behavior

The default behavior is: Sample one element from each collection to infer a flat schema, then do a lookup of all fields of 50 randomly selected items to find relations.

Let's see how well this works with real-world data.

For now, we try to infer relations on all ObjectID and string fields. I'll test that with real-world data.

Open Todos

ejoebstl commented 5 years ago

There are currently the following open questions for this feature. I suggest we wait for input from some users who have tried the new beta release to answer these questions:

nikolasburk commented 5 years ago

I just tested the introspection with this data that was structured according to this datamodel:

type User @db(name: "users") {
  id: ID! @id
  email: String @unique
  name: String!
  posts: [Post!]! @relation(link: INLINE)
}

type Post @db(name: "posts") {
  id: ID! @id
  wasCreated: DateTime! @createdAt
  wasUpdated: DateTime! @updatedAt
  title: String!
  published: Boolean @default(value: false)
  author: User
  comments: [Comment!]!
}

type Comment @embedded {
  text: String!
  writtenBy: User!
}

This was the output that was generated:

type posts {
  _id: ID! @id
  published: Boolean
  title: String
  wasCreated: postsWasCreated
  wasUpdated: postsWasUpdated
}

# type postsWasCreated @embedded {

# }

# type postsWasUpdated @embedded {

# }

# type User {

# }

type users {
  _id: ID! @id
  email: String
  name: String
  posts: [ID!]!
}

EDIT: Note that the dataset I used was extremely small:

Data: ![image](https://user-images.githubusercontent.com/4058327/49940221-3d099380-fedf-11e8-9639-7c6cb4e8c76a.png) ![image](https://user-images.githubusercontent.com/4058327/49940236-472b9200-fedf-11e8-8886-cd63bbe7cdb4.png)

nikolasburk commented 5 years ago

One general consideration: we could generate model names that follow the Prisma conventions, i.e. names that start with an uppercase letter and use the singular form, and use the @db directive to map each model to its underlying collection.

I opened an issue for this: https://github.com/prisma/prisma/issues/3702

ejoebstl commented 5 years ago

Thank you for the input. I will look into the relation issue immediately. Can you PM me the data as JSON or PM me credentials for the database?

Regarding the naming: Great idea! To respect Prisma conventions, we should singularize type names. Is there any reference for this in Prisma so far? Otherwise I can just do something trivial, like trimming a trailing "s".
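As a rough illustration of the "trivial" approach, here is a TypeScript sketch that trims one trailing "s", uppercases the first letter, and adds the @db directive when the model name differs from the collection name. The helper names `toModelName` and `renderTypeHeader` are hypothetical and not part of prisma-datamodel.

```typescript
// Hypothetical helper: naive singularization that only trims one trailing "s",
// then uppercases the first letter.
function toModelName(collection: string): string {
  const singular =
    collection.length > 1 ? collection.replace(/s$/, "") : collection;
  return singular.charAt(0).toUpperCase() + singular.slice(1);
}

// Render the opening line of a type, adding @db only when the names differ.
function renderTypeHeader(collection: string): string {
  const model = toModelName(collection);
  return model === collection
    ? `type ${model} {`
    : `type ${model} @db(name: "${collection}") {`;
}

const userHeader = renderTypeHeader("users");
const postHeader = renderTypeHeader("posts");
```

For the `users` and `posts` collections this yields `type User @db(name: "users") {` and `type Post @db(name: "posts") {`. The trimming rule is deliberately naive: a collection named `categories` would become `Categorie`, so a proper pluralization library would be the safer choice.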

nikolasburk commented 5 years ago

We have some sparse docs for naming conventions here (they don't actually mention the uppercasing of models). The data was produced using the following three mutations:

Create two new users

mutation {
  user1: createUser(data: {
    email: "alice@prisma.io"
    name: "Alice"
    posts: {
      create: {
        title: "Join us for GraphQL Conf 2019 in Berlin"
        published: true
      }
    }
  }) {
    id
  }

  user2: createUser(data: {
    email: "bob@prisma.io"
    name: "Bob"
    posts: {
      create: [{
        title: "Subscribe to GraphQL Weekly for community news"
        published: true
      } {
        title: "Follow Prisma on Twitter"
      }]
    }
  }) {
    id
  }
}

Add comments to two posts from Bob (send twice)

mutation {
  updatePost(
    where: {
      id: "__ID_FROM_BOBS_POST__"
    }
    data: {
      comments: {
         create: [{
          text: "Love it 👏"
          writtenBy: {
            connect: {
              email: "alice@prisma.io"
            }
          }
        }]
      }
    }
  ) {
    id
  }
}

ejoebstl commented 5 years ago

The problem can be split into the following issues:

pantharshit00 commented 5 years ago

I tried the Mongo introspection with a customer. The database has many nested embedded fields. The introspection result has many errors (look especially at TenantStoreDataMappingMappedColumns, which had some interesting results): https://pastebin.com/E3UPd8J6

Here is a sample document from the database: https://pastebin.com/Nxgp3YEq. The DB had around 748 records and introspection took a while, so the customer initially reported this because he thought introspection was not working. I would therefore also suggest adding a progress bar in the future.

Even when I corrected and deployed the datamodel manually I got an error saying Prisma can't handle ObjectId('....').

I used Prisma version 1.24-beta.

ejoebstl commented 5 years ago

Thanks for your bug report - it's super helpful to have some real-world examples.

Are there other documents in the database that look different? I don't think the introspection would generate a TenantStoreDataMappingMappedColumns type at all from the given sample.

If so it would be incredibly helpful if you could share those with us.

pantharshit00 commented 5 years ago

Unfortunately, I had pretty limited access, so I was only able to extract the info above. The tenant document is the only thing he gave me as an example. I will try to get more info from him, though.

pantharshit00 commented 5 years ago

I think we support this now. Any other introspection-related bug reports should be filed as separate issues from now on.

Closing :)