Closed lk-geimfari closed 6 years ago
What do you think? Could it be useful?
I have already implemented the basic functionality. Only a little remains to finish.
Looks good! But how would you pass params to 'personal.password', for example?
One more pretty adorable way to generate data, I think.
@sobolevn That is a real problem. I would suggest personal.surname.female, where .female is the first argument, but it doesn't look good if the method has many arguments. We'll wait for proposals.
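To make the trade-off concrete, here is a minimal sketch of how a dotted string such as personal.surname.female could be resolved. The `Personal` class, the `PROVIDERS` registry, and `resolve` are hypothetical stand-ins for illustration, not the library's actual code.

```python
import random


class Personal:
    """Hypothetical stand-in for a provider; not the real mimesis class."""

    def surname(self, gender="female"):
        names = {
            "female": ["Ivanova", "Smith"],
            "male": ["Ivanov", "Smithson"],
        }
        return random.choice(names[gender])


# Hypothetical registry mapping the first path segment to a provider instance.
PROVIDERS = {"personal": Personal()}


def resolve(path):
    """Resolve 'provider.method[.arg1[.arg2...]]' and call the method."""
    provider_name, method_name, *args = path.split(".")
    method = getattr(PROVIDERS[provider_name], method_name)
    # Everything after the method name is passed positionally, as strings.
    return method(*args)


print(resolve("personal.surname"))
print(resolve("personal.surname.male"))
```

In this sketch every argument arrives as a positional string, so keyword arguments, integers, and booleans would all need extra parsing rules. That is where the approach gets awkward for methods with many arguments.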
I think that arguments are really important. They should be both:
Consider this situation: I want to have users with different ages https://github.com/lk-geimfari/mimesis/blob/master/mimesis/providers/personal.py#L29
If a user's age is under 18, they are not allowed; otherwise they are allowed. I don't want to create two schemas for that, since users contain a lot of fields and I don't want to copy-paste them.
Solution: I can create a factory function.
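from mimesis.schema import Schema

def generate_schema_for_age(age):
    # Shared field definitions; only the age differs between variants.
    fields = {
        'username': 'personal.username',
        'password': 'personal.password',
        'full_name': 'personal.full_name',
        'age': age,
    }
    return Schema('en').load(schema=fields).create()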
But is it the way we want to go?
I absolutely agree, but I cannot estimate the complexity of the implementation right away.
@Valerievich It's done. We need to implement support for arguments.
I don't like the current implementation. It breaks one major rule: "everything is an object". Our fields right now are not objects in the general case. They are strings.
So it could break a lot of things for the end user. Imagine a user has some logic to reformat phone numbers to their specific needs, like reformat_phone(value). How is that possible with the current implementation? The same goes for any other functions, classes, etc. that wrap values.
What do I suggest?
In my opinion, we should create a LazyField wrapper to wrap any other existing field, and a special fields container with all the existing fields wrapped into LazyField. So, how would it work?
>>> from custom.utils import format_phone
>>> from mimesis.schema import Schema, fields
>>> schema = Schema('en')
>>> schema.load(schema={
...     "id": fields.cryptographic.uuid(version=4),
...     "name": fields.personal.full_name(gender='female'),
...     "version": fields.development.version(semantic=True),
...     "phone": format_phone(fields.personal.phone),
... }).create(iterations=2)
On each iteration, the lazy objects (or generators) generate a new value. The user has full control, and the code is more Pythonic.
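A minimal sketch of the lazy-field idea, using only the standard library. `LazyField`, `create`, `random_phone`, and `format_phone` are hypothetical names for illustration and are not taken from the mimesis code base. One assumption to note: a plain function called directly on a lazy field would receive the wrapper rather than a value, so this sketch wraps the formatter in a `LazyField` as well.

```python
import random


class LazyField:
    """Hypothetical wrapper: stores a callable and its arguments, runs later."""

    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def __call__(self, *args, **kwargs):
        # Binding arguments returns a new lazy field instead of a value.
        return LazyField(self.func, *args, **kwargs)

    def generate(self):
        # Resolve nested lazy arguments first so that wrappers compose.
        args = [a.generate() if isinstance(a, LazyField) else a
                for a in self.args]
        kwargs = {k: v.generate() if isinstance(v, LazyField) else v
                  for k, v in self.kwargs.items()}
        return self.func(*args, **kwargs)


def random_phone():
    """Stand-in generator: ten random digits."""
    return "".join(random.choice("0123456789") for _ in range(10))


def format_phone(value):
    """User-defined formatter that expects a plain string."""
    return f"({value[:3]}) {value[3:6]}-{value[6:]}"


def create(schema, iterations=1):
    """Evaluate every lazy field once per iteration; keep literals as-is."""
    return [
        {key: field.generate() if isinstance(field, LazyField) else field
         for key, field in schema.items()}
        for _ in range(iterations)
    ]


phone = LazyField(random_phone)
schema = {"phone": LazyField(format_phone, phone), "kind": "mobile"}
rows = create(schema, iterations=2)
print(rows)
```

Because the fields are objects rather than strings, keyword arguments are bound with ordinary Python call syntax and user functions can wrap values without a special mini-language.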
Do you have any ideas? Am I missing something?
@sobolevn Of course it looks much better than the current implementation. I have only one question: how can we generate data from schema.json? Or maybe it doesn't matter? Anyway, I really like the idea with fields. I'm all for it.
@sobolevn Can you please explain how to implement a LazyField-based data generator? I mean the steps. Maybe you have a link to a similar topic? I want to try to implement it and close this issue this week. Thank you.
I came up with an even better idea: why not implement a custom provider for factory_boy?
It already has all the stuff we need!
@sobolevn I have never worked with this library, but I'll try to figure out how to do it.
Unfortunately, I did not understand how to add the ability to pass arguments. @sobolevn Can you look at this issue when you have free time, please?
Sure!
The randomness distribution in schema generation seems off. Even over many iterations, the resulting data usually contains only a handful of unique values per field.
from mimesis.schema import Schema
import pandas as pd

schema = Schema('en')
data = schema.load(schema={
    "Name": "personal.name",
    "Surname": "personal.surname",
    "Username": "personal.username"
}).create(iterations=10000)

# Value counts and markdown tables
df = pd.DataFrame(data)
for name, series in df.items():  # iteritems() was removed in pandas 2.0
    theader = f'{name}|Count'
    trow = ''
    for value, count in series.value_counts().items():
        trow += f'{value}|{count}\n'
    print(f'{theader}\n-|-\n{trow}')
10000 Iterations
Name | Count |
---|---|
Wynell | 6456 |
Nakita | 1436 |
Spring | 1218 |
Yuko | 890 |
Surname | Count |
---|---|
Shannon | 6435 |
Harrell | 1447 |
Tyler | 1212 |
Vincent | 906 |
Username | Count |
---|---|
Vaultier_1956 | 6445 |
Compatriot_2055 | 1452 |
Dervish.1941 | 1205 |
edsel.1825 | 898 |
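For anyone who wants to repeat the check without pandas, a rough standard-library equivalent is sketched below. It assumes the generated data is a list of dicts, which the pandas snippet implies but the thread does not state; `unique_counts` and the sample rows are illustrative only.

```python
from collections import Counter


def unique_counts(rows):
    """Map each field name to a Counter of the values seen for it."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            counts.setdefault(field, Counter())[value] += 1
    return counts


# Illustrative sample; replace with the list returned by the generator.
rows = [{"Name": "Wynell"}, {"Name": "Wynell"}, {"Name": "Yuko"}]
counts = unique_counts(rows)
for field, counter in counts.items():
    # Number of distinct values, then the most frequent ones.
    print(field, len(counter), counter.most_common(3))
```

A field whose number of distinct values stays tiny as the iteration count grows points to the same problem shown in the tables.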
@samuarl That's strange, because the seed is disabled by default.
@samuarl I ran this script on my laptop and everything is okay.
Implemented in 53c8741930fe0c79e605d53817e4ef4bcead0766.
We have implemented a very primitive schema-based generator:
But it is not the best solution. @sobolevn suggested a much better solution using lazy objects, which looks like this:
I like the second solution too.