Closed mboisnard closed 6 months ago
Hi @mboisnard ,
Thank you for the suggestion. This is something that's been in the back of my head for a while now :) Along with generating csv, json, ...
If you'd like to work on this, please let me know. Otherwise I'll try to prioritize this myself in the near future to make it available for the next release.
UPD: Oh wait, now that I looked at the links you've provided, I think I misunderstood what you meant by a "Database Provider" :D I was thinking of an interface that can be used to automatically populate a db table, for example.
As for the Database Provider that you meant: I can see that it could be useful, definitely. But what they have in fakerjs seems quite specific and narrow, which I don't think is good enough for a more "generic use-case". For example, if we take database collation, which db implementation are we talking about? The implementation in fakerjs doesn't seem to take that into account (e.g. https://github.com/faker-js/faker/blob/c1caa900ceb12737a3aa45b7e4dd75797a11a889/src/locales/base/database/collation.ts ). Column data types also vary from one db to another. Postgres doesn't have a storage engine like mysql, for example. And so on.
If you'd like to provide a "Database Provider" yml file that contains such information for various database implementations - please do so and I'll be happy to include this :)
For example, this is what chatgpt gave me:
postgresql:
  column:
    - id
    - name
    - email
    - created_at
    - updated_at
  type:
    - INTEGER
    - BIGINT
    - DECIMAL
    - NUMERIC
    - REAL
    - DOUBLE PRECISION
    - SERIAL
    - BIGSERIAL
    - CHAR
    - VARCHAR
    - TEXT
    - DATE
    - TIMESTAMP
    - TIMESTAMP WITH TIME ZONE
    - BOOLEAN
    - JSON
    - JSONB
    - BYTEA
    - ARRAY
    - UUID
    - ENUM
  engine: []
  collation:
    - "en_US.UTF-8"
    - "en_GB.UTF-8"
    - "de_DE.UTF-8"
    - "fr_FR.UTF-8"
mysql:
  column:
    - id
    - name
    - email
    - created_at
    - updated_at
  type:
    - INT
    - BIGINT
    - DECIMAL
    - FLOAT
    - DOUBLE
    - CHAR
    - VARCHAR
    - TEXT
    - DATE
    - DATETIME
    - TIMESTAMP
    - TIME
    - YEAR
    - BOOLEAN
    - JSON
    - BINARY
    - VARBINARY
    - BLOB
    - ENUM
    - SET
  engine:
    - InnoDB
    - MyISAM
    - MEMORY
    - CSV
    - ARCHIVE
    - BLACKHOLE
    - MERGE
    - FEDERATED
  collation:
    - "utf8mb4_general_ci"
    - "utf8mb4_unicode_ci"
    - "latin1_swedish_ci"
    - "latin1_general_ci"
mariadb:
  column:
    - id
    - name
    - email
    - created_at
    - updated_at
  type:
    - INT
    - BIGINT
    - DECIMAL
    - FLOAT
    - DOUBLE
    - CHAR
    - VARCHAR
    - TEXT
    - DATE
    - DATETIME
    - TIMESTAMP
    - TIME
    - YEAR
    - BOOLEAN
    - JSON
    - BINARY
    - VARBINARY
    - BLOB
    - ENUM
    - SET
  engine:
    - InnoDB
    - MyISAM
    - Aria
    - MEMORY
    - CSV
    - ARCHIVE
    - BLACKHOLE
    - MERGE
    - FEDERATED
    - TokuDB
    - Spider
  collation:
    - "utf8mb4_general_ci"
    - "utf8mb4_unicode_ci"
    - "latin1_swedish_ci"
    - "latin1_general_ci"
Is it comprehensive and accurate enough? I'm really not sure :D It's a start though, but I don't know if it's good enough so to speak.
Additionally, just in case you have a very specific use-case, I'd recommend taking a look at the "creating your own data providers" docs. This functionality has been available since version 2.0.0-rc.1 and allows you to extend the faker implementation and create your own data providers ;)
I'll still keep this issue open in case you or anyone else wants to work on this. Seems like a good "first issue" :)
Hello @serpro69 , thanks for your answer.
Yes actually I was talking about the same behavior as faker-js and I completely agree with you that the current implementation is generic and can be improved to match the possible data for each database.
I will take a look at your documentation, and try to contribute to the project :)
Contributions are always welcome :) Thanks!
I think this https://github.com/serpro69/kotlin-faker/blob/master/CONTRIBUTING.adoc#adding-new-functionality should help with the implementation of this issue. But also feel free to ask if you need any help.
As I mentioned, the bigger part of the task here would be to gather the data itself. After that you should be able to follow the above documentation to add a new data provider implementation; but if something is unclear there - please let me know. I'd like to improve the contributing guidelines also if they're not good enough.
Just a few suggestions also:
- collation: it could be impractical to include all possible values in the .yml file. What we could do instead is use the locale value from the faker's configuration, and using that "construct" possible collation values. E.g. for postgres we'd probably only need to append .UTF-8 to the locale string. For mysql/mariadb some "conversion logic" from locale to collation would probably be needed. The other db types IDK, would need to check what are the possible values there and how to return them in a nice way.
- columns: I'm not entirely sure what's a good "list of common column names" or what is even the use-case here. Feel free to submit some proposals from your end :) Also it doesn't need to be a separate property for each db type, since the values will be the same I guess.
- type and engine (where applicable): they can be added to the .yml directly. I think this would be the easiest approach for these two properties.
Thx for your suggestions, I created a branch to implement the databases behavior and I have several questions for you :)
- For the MongoDB provider I would like to create a generateObjectId method based on a random date and inspired by the logic I found in JS here (https://steveridout.com/mongo-object-time/)
- MongoDB Provider is not based on a yaml file, so I would like to implement the AbstractFakeDataProvider class, just like the StringProvider for example, in the databases gradle module I just created. The AbstractFakeDataProvider class is marked as internal. Is it intentional, or have you not yet had the need?
- To be able to generate an objectId I would like to add a new method in the RandomService to generate an OffsetDateTime that can be used by anyone and by the MongoDB Provider. Can we access the RandomService from a provider? (I just removed the internal protection in FakerService for this field to make it work on my branch)
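For readers following along, the ObjectId idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the branch: randomOffsetDateTime and objectIdFor are hypothetical names, kotlin.random.Random stands in for the project's RandomService, and the last 8 bytes are simply random (a real MongoDB driver uses a 5-byte per-process value followed by a 3-byte counter).

```kotlin
import java.time.Instant
import java.time.OffsetDateTime
import java.time.ZoneOffset
import kotlin.random.Random

// Hypothetical helper (not part of kotlin-faker): picks a random UTC date-time
// in the half-open range [from, until). Requires from to be earlier than until.
fun randomOffsetDateTime(
    random: Random,
    from: OffsetDateTime,
    until: OffsetDateTime,
): OffsetDateTime {
    val seconds = random.nextLong(from.toEpochSecond(), until.toEpochSecond())
    return OffsetDateTime.ofInstant(Instant.ofEpochSecond(seconds), ZoneOffset.UTC)
}

// Hypothetical helper: builds a 24-character hex string shaped like a MongoDB ObjectId.
fun objectIdFor(date: OffsetDateTime, random: Random): String {
    // First 4 bytes: seconds since the Unix epoch, rendered as 8 hex characters.
    val timestamp = (date.toEpochSecond() and 0xFFFFFFFFL).toString(16).padStart(8, '0')
    // Remaining 8 bytes: random filler in this sketch (assumption, see the note above).
    val tail = random.nextBytes(8).joinToString("") { byte ->
        (byte.toInt() and 0xFF).toString(16).padStart(2, '0')
    }
    return timestamp + tail
}

fun main() {
    val random = Random(42)
    val from = OffsetDateTime.of(2010, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC)
    val until = OffsetDateTime.of(2020, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC)
    val date = randomOffsetDateTime(random, from, until)
    println("$date -> ${objectIdFor(date, random)}")
}
```

Because the leading 8 hex characters encode the timestamp, the generated date can be recovered from the id, which is the property the linked mongo-object-time page relies on.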
Hey @mboisnard ,
Let me give you some existing code examples to make things easier to understand.
Creating a new data provider that is not yaml-based outside of "core faker" is not supported. I'm not sure it makes much sense either to expose those things. Seems like a very specific use-case.
- DatabaseProvider implementation, which contains both common functionality, as well as specifics for the various <DatabaseType>Providers accessible via additional property (take a look at https://github.com/serpro69/kotlin-faker/blob/5106afe80cf16d43b0370e5cc3558a91d0850029/faker/edu/src/main/kotlin/io/github/serpro69/kfaker/edu/provider/Educator.kt#L23 for example)
- YamlFakeDataProvider. If that is intentional, and you only want to have this one function generateObjectId for the mongo-db provider, I can think of two ways:
  - DatabaseProvider instead and name it mongoDbObjectId, for example. This way you will have DatabaseProvider based on yaml, but you can also have functions inside it that don't use data from yaml files.
  - MongoDbProvider https://github.com/serpro69/kotlin-faker/blob/5106afe80cf16d43b0370e5cc3558a91d0850029/faker/edu/src/main/kotlin/io/github/serpro69/kfaker/edu/provider/Educator.kt#L49-L51 and still inherit from YamlFakeDataProvider.
  - Internet#iPv4Address - https://github.com/serpro69/kotlin-faker/blob/5106afe80cf16d43b0370e5cc3558a91d0850029/core/src/main/kotlin/io/github/serpro69/kfaker/provider/Internet.kt#L48-L49 which is a custom function not based on yml-data, but is inside a YamlFakeDataProvider implementation class)
To get access to RandomService from a data provider implementation, you can use this as an example:
- randomService property that is available from the AbstractFaker - https://github.com/serpro69/kotlin-faker/blob/5106afe80cf16d43b0370e5cc3558a91d0850029/faker/books/src/main/kotlin/io/github/serpro69/kfaker/books/BooksFaker.kt#L39
Don't know if the above made much sense :grin: Feel free to ask if you want me to clarify something further :)
Be able to generate database entries just like the faker js version:
Sources: