lk-geimfari / mimesis

Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
https://mimesis.name
MIT License

Schema Generation from JSON #1357

Closed NoiSek closed 1 year ago

NoiSek commented 1 year ago

Feature request

Thanks for making Mimesis! As a developer who wants to generate high quality synthetic data for testing and development, it is a godsend.

What I would like to be able to do, however, is define the data shapes I use as standard OpenAPI schema objects and load them into Mimesis. The Schema-related functions of Mimesis are great but, unlike OpenAPI definitions, they cannot be reused in any other context.

Thesis

Proposed Mimesis-enabled schema:

{
  "title": "Organization",
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "description": "Company display name.",
      "x-mimesis": "Finance.company"
    },
    "address": {
      "type": "string",
      "description": "Official mailing address.",
      "x-mimesis": {
        "provider": {
          "name": "Address",
          "args": {
            "locale": "Locale.EN"
          },
          "kind": "address"
        }
      }
    },
    "phoneNumber": {
      "type": "string",
      "description": "Primary contact number.",
      "x-mimesis": "Person.phone_number"
    },
    "users": {
      "type": "array",
      "description": "List of users belonging to this organization.",
      "items": { "$ref": "#/components/schemas/Customer" }
    }
  }
}

Parsing JSON / OpenAPI schema into Mimesis in this fashion should be relatively trivial.
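To make the idea concrete, here is a minimal sketch of how the string form of `x-mimesis` (for example `"Finance.company"`) could be mapped onto a provider method. The function names `resolve_provider` and `generate` are hypothetical, and `fake_root` is a stand-in object so the snippet runs without Mimesis installed. The assumption is that a real implementation would pass something like a `mimesis.Generic` instance, which exposes providers as lowercase attributes. The object form of `x-mimesis` shown above is not handled here.

```python
from types import SimpleNamespace
from typing import Any


def resolve_provider(root: Any, spec: str) -> Any:
    """Call the method named by a 'Provider.method' string on root."""
    provider_name, method_name = spec.split(".", 1)
    # Assumption: providers hang off root under their lowercase name.
    provider = getattr(root, provider_name.lower())
    return getattr(provider, method_name)()


def generate(schema: dict, root: Any) -> Any:
    """Build one value for a schema node, honoring string-form x-mimesis hints."""
    hint = schema.get("x-mimesis")
    if isinstance(hint, str):
        return resolve_provider(root, hint)
    if schema.get("type") == "object":
        return {
            name: generate(sub_schema, root)
            for name, sub_schema in schema.get("properties", {}).items()
        }
    # Unannotated leaves are left empty in this sketch.
    return None


# Hypothetical stand-in for real providers, returning fixed values.
fake_root = SimpleNamespace(
    finance=SimpleNamespace(company=lambda: "Acme Corp"),
    person=SimpleNamespace(phone_number=lambda: "+1-555-0100"),
)

organization_schema = {
    "title": "Organization",
    "type": "object",
    "properties": {
        "name": {"type": "string", "x-mimesis": "Finance.company"},
        "phoneNumber": {"type": "string", "x-mimesis": "Person.phone_number"},
    },
}

record = generate(organization_schema, fake_root)
```

Resolving names with `getattr` keeps the schema decoupled from any import of provider classes, at the cost of failing only at generation time when a hint is misspelled.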

Note: It is entirely possible to generate files like this from Mimesis' Schema objects as well, but I don't think the use case justifies the added complexity and implementation cost.

Reasoning

Allowing developers to define schemas in this way makes building a library of common, reusable schema objects possible while still leveraging the data generation capabilities of Mimesis. Schema objects serve much better as reusable sources of truth, and many companies will already have them defined.

Similar ideas exist that leverage faker.js and Chance: https://npm.io/package/openapi-mock-generator

NoiSek commented 1 year ago

(I am willing to implement this if the maintainers agree that this is a good idea.)

lk-geimfari commented 1 year ago

@NoiSek Hi. You can try, but this feature will require a huge amount of work.

NoiSek commented 1 year ago

I managed a prototype of this with relatively little code (100 lines or so) actually. I suspect the real difficulty will be in writing unit tests to cover edge cases.

Even supporting $ref values recursively is fairly straightforward. I would say there is a lot of potential here 😁
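For illustration, recursive `$ref` handling could look roughly like the sketch below. This is not the prototype mentioned above, just one way to do it. The names `resolve_ref` and `expand` are hypothetical, the `Customer` schema is made up for the example, and only local references of the form `#/a/b/c` are handled (no remote files, no JSON Pointer escape sequences).

```python
from typing import Any


def resolve_ref(document: dict, ref: str) -> Any:
    """Follow a local reference such as '#/components/schemas/Customer'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled: {ref}")
    node: Any = document
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def expand(schema: Any, document: dict, seen: tuple = ()) -> Any:
    """Return a copy of schema with every local $ref replaced by its target."""
    if isinstance(schema, list):
        return [expand(item, document, seen) for item in schema]
    if not isinstance(schema, dict):
        return schema
    ref = schema.get("$ref")
    if isinstance(ref, str):
        # Track the chain of references to stop on self-referencing schemas.
        if ref in seen:
            raise ValueError(f"circular reference: {ref}")
        return expand(resolve_ref(document, ref), document, seen + (ref,))
    return {key: expand(value, document, seen) for key, value in schema.items()}


# Hypothetical document; the Customer schema is invented for this example.
document = {
    "components": {
        "schemas": {
            "Customer": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "x-mimesis": "Person.full_name"}
                },
            },
            "Organization": {
                "type": "object",
                "properties": {
                    "users": {
                        "type": "array",
                        "items": {"$ref": "#/components/schemas/Customer"},
                    }
                },
            },
        }
    }
}

expanded = expand(document["components"]["schemas"]["Organization"], document)
```

Expanding references up front means the generator only ever sees plain schema nodes. Raising on circular references is a simplification; a fuller version would need a depth limit instead, since self-referencing schemas are legal.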

Only caveat: if you're generating data like this, you probably want the values to be at least semi-related and to build on already-generated values, but I think that's out of scope for Mimesis itself and probably overreaching.