Closed rgoubet closed 1 year ago
Hi! Actually, there is an accumulator
argument for such cases: https://mimesis.name/en/master/api.html#mimesis.Numeric.increment
Here is a usage example:
>>> numeric.increment()
1
>>> numeric.increment(accumulator="a")
1
>>> numeric.increment()
2
>>> numeric.increment(accumulator="a")
2
>>> numeric.increment(accumulator="b")
1
>>> numeric.increment(accumulator="a")
3
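The semantics above can be sketched with a plain-Python stand-in (a hypothetical `IncrementSketch` class, not part of mimesis) to show that each accumulator key keeps its own independent counter:

```python
# Minimal sketch of accumulator semantics (not mimesis itself):
# each accumulator key gets its own independent counter.
from collections import defaultdict


class IncrementSketch:
    def __init__(self):
        self._counters = defaultdict(int)

    def increment(self, accumulator="default"):
        self._counters[accumulator] += 1
        return self._counters[accumulator]


n = IncrementSketch()
print(n.increment())                 # 1 (key "default")
print(n.increment(accumulator="a"))  # 1 (key "a" starts fresh)
print(n.increment())                 # 2
print(n.increment(accumulator="a"))  # 2
print(n.increment(accumulator="b"))  # 1 (another fresh key)
print(n.increment(accumulator="a"))  # 3
```

Using a fresh accumulator key is therefore equivalent to resetting the count.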
In your case, you are using schemas the wrong way.
Instead of doing this:

for i in range(0, 5):
    data = schema.create(5)
    print(data[0]['id'])

Do this:

for i in schema.create(5):
    print(i['id'])
In my code example, I'm trying to create 5 populated schemas (that I could then export 5 times) based on the same logical schema. Here, I cannot use a new accumulator every time unless I instantiate a new Schema object every time.
@rgoubet Sorry, I don't get the idea. Can you please illustrate it with an example?
My use case is that I want to create multiple, large random data sets in Excel files (generated with openpyxl) for stress-test purposes. So, let's say I want to create 5 files with 1 million rows each (I use 4 columns here for readability, while in practice I have 30):
import os

from mimesis import Field, Schema
from openpyxl import Workbook

_ = Field()
schema = Schema(schema=lambda: {
    "id": _('increment'),
    "timestamp": _('datetime'),
    'version': _('version'),
    'e-mail': _('person.email', domains=['argenx.com']),
    'token': _('token_hex'),
})
Now, I'll run a loop for each file, and use the iterator to preserve memory:
for i in range(0, 5):
    wb = Workbook(write_only=True)
    ws = wb.create_sheet()
    for ix, v in enumerate(schema.iterator(1_000_000)):
        if ix == 0:
            ws.append(list(v.keys()))  # write headers
        else:
            ws.append(list(v.values()))  # write data
    xl_file = os.path.join(path, f'data{str(i).zfill(3)}.xlsx')
    wb.save(xl_file)
    wb.close()
Now, it's all good, except that the id column's increment continues across files instead of restarting from 1. In my case, that could have been an issue, as the number can then grow larger than the data type I want allows (it turned out OK in the end).
As I said, maybe I missed something, but it would be nice to have a reset option for increments (e.g. in the create and iterator methods). Not critical at all, though.
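Under the current API, one workaround is to key the accumulator per file, so each file's id restarts at 1. The sketch below uses a plain-Python stand-in for the counter (mirroring the accumulator semantics shown earlier, not mimesis itself); with mimesis, the same idea would presumably be passing `accumulator=f'file{i}'` to the increment field inside a schema rebuilt for each file:

```python
# Per-file accumulator workaround (plain-Python stand-in, not mimesis):
# a distinct accumulator key per output file restarts numbering at 1.
from collections import defaultdict

counters = defaultdict(int)


def increment(accumulator="default"):
    counters[accumulator] += 1
    return counters[accumulator]


# One accumulator key per file: ids go 1..4 in every file.
for file_index in range(3):
    ids = [increment(accumulator=f"file{file_index}") for _ in range(4)]
    print(ids)  # [1, 2, 3, 4] for each file
```

This avoids instantiating a new Schema per file, at the cost of one extra counter entry per file.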
This issue has been automatically marked as stale because it has not had activity. It will be closed if no further activity occurs. Thank you for your contributions.
Feature request
Unless I missed it, there doesn't seem to be a way to reset increments: if you generate data several times with the same schema, increments will pick up from the previous creation:
This returns:
Thesis
There should be an option to reset the increment each time data is generated.
Reasoning
When creating large amounts of data to export several times, you don't necessarily want increments to become huge.