A major refactor to improve speed, mostly in transaction generation.
Measured runtime went from ~160s to 13s for 10000 customers with the `adults_2550_female_rural` profile.
It also switches to `argparse` for user input and validation.
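As an illustration of what "validation" can mean with `argparse`, here is a minimal sketch. The argument names (`--nb-customers`, `start_date`, `end_date`), the `MM-DD-YYYY` date format, and the helper names are all assumptions for illustration, not this PR's actual CLI. The sketch rejects bad values at parse time through `type` callables and checks cross-argument rules with `parser.error`:

```python
import argparse
from datetime import datetime


def positive_int(value):
    """argparse `type` callable: reject zero or negative counts (hypothetical helper)."""
    n = int(value)  # a ValueError here is reported by argparse as an invalid value
    if n <= 0:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {value!r}")
    return n


def date_arg(value):
    """argparse `type` callable: parse an assumed MM-DD-YYYY date."""
    try:
        return datetime.strptime(value, "%m-%d-%Y").date()
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected MM-DD-YYYY, got {value!r}") from None


def parse_args(argv=None):
    # Hypothetical arguments; the real script's options may differ.
    parser = argparse.ArgumentParser(description="Generate synthetic transactions (sketch).")
    parser.add_argument("-n", "--nb-customers", type=positive_int, default=10,
                        help="number of customers to generate")
    parser.add_argument("start_date", type=date_arg)
    parser.add_argument("end_date", type=date_arg)
    args = parser.parse_args(argv)
    # Rules that span several arguments are checked after parsing.
    if args.end_date < args.start_date:
        parser.error("end_date must not be before start_date")
    return args


if __name__ == "__main__":
    print(parse_args())
```

With this layout, invalid input exits with status 2 and a usage message before any generation work starts, so the generator code itself can assume clean values.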
Things that made a difference for speed:
- Building a map of merchants per category, to avoid filtering rows with pandas on every iteration.
- Loading merchants up front, instead of inside the loop.
- Looking up `closest_rand` with a `bisect_left` binary search.
- Cleaning up and removing redundant code.
- Avoiding class instantiation inside the loop: the class is instantiated once and its methods are called to do the work.
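A minimal sketch of three of the items above working together: a per-category merchant map built once, a `bisect_left` lookup over precomputed cumulative weights, and a single object reused across the loop. It assumes merchants arrive as `(name, category)` pairs and that `closest_rand` returns the first key whose cumulative weight reaches a random draw. `build_merchant_map`, `WeightedPicker`, and the example merchant and category values are hypothetical, not the project's actual code:

```python
import bisect
import random
from collections import defaultdict
from itertools import accumulate


def build_merchant_map(merchants):
    """Group (merchant_name, category) pairs by category in one pass.

    Hypothetical helper: built once before the loop, so each iteration does a
    dict lookup instead of a pandas row filter.
    """
    by_category = defaultdict(list)
    for name, category in merchants:
        by_category[category].append(name)
    return dict(by_category)


class WeightedPicker:
    """Hypothetical stand-in for the closest_rand lookup.

    Cumulative weights are computed once in __init__; each lookup is then a
    binary search (O(log n)) instead of a linear scan.
    """

    def __init__(self, weights):
        self.keys = list(weights)
        self.cumulative = list(accumulate(weights.values()))

    def closest_rand(self, draw):
        # Leftmost index i with cumulative[i] >= draw.
        idx = bisect.bisect_left(self.cumulative, draw)
        # Clamp in case draw exceeds the total because of float rounding.
        return self.keys[min(idx, len(self.keys) - 1)]


if __name__ == "__main__":
    merchants = [("merchant_a", "grocery_pos"), ("merchant_b", "gas_transport"),
                 ("merchant_c", "grocery_pos")]
    merchant_map = build_merchant_map(merchants)                         # built once
    picker = WeightedPicker({"grocery_pos": 0.7, "gas_transport": 0.3})  # instantiated once
    rng = random.Random(0)
    for _ in range(5):
        category = picker.closest_rand(rng.random())
        print(category, rng.choice(merchant_map[category]))
```

The expensive work (grouping and accumulating weights) happens once; the loop body only does a binary search and a dict lookup per transaction.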
Tests were modified to fit the new format, but they still ensure no regression was introduced.
I also compared the final output with the random generator seeds fixed, and it was identical between the original and this version.
The refactor also avoids overwriting profile object keys, which made the code very hard to test; instead, the results of the profile weight computations are stored in a separate object property.
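A minimal sketch of that kind of seeded comparison, assuming only the stdlib `random` module needs seeding (a generator that also draws from numpy or Faker would need those seeded too). `same_output_with_seed`, `old_amounts`, and `new_amounts` are hypothetical names for illustration:

```python
import random
from typing import Any, Callable


def same_output_with_seed(gen_a: Callable[[], Any], gen_b: Callable[[], Any],
                          seed: int = 0) -> bool:
    """Run two generator functions from the same seed and compare their outputs.

    Only the global stdlib `random` state is reset here (assumption: the real
    generator may also need other seeds fixed).
    """
    random.seed(seed)
    out_a = gen_a()
    random.seed(seed)
    out_b = gen_b()
    return out_a == out_b


def old_amounts() -> list:
    # Hypothetical "original" style: explicit loop with append.
    amounts = []
    for _ in range(100):
        amounts.append(round(random.uniform(1.0, 500.0), 2))
    return amounts


def new_amounts() -> list:
    # Hypothetical "refactored" style: same draws, consumed in the same order.
    return [round(random.uniform(1.0, 500.0), 2) for _ in range(100)]


if __name__ == "__main__":
    print(same_output_with_seed(old_amounts, new_amounts, seed=42))
```

This check is only meaningful if both versions consume random numbers in exactly the same order: adding, removing, or reordering a single draw shifts every value after it, even when the logic is otherwise equivalent.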
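A minimal sketch of that separation, assuming a profile loaded as a plain dict. The `Profile` class, the `computed` attribute, the `category_weights` key, and the example categories are hypothetical names for illustration. The raw input is never written to, so a test can assert on the input and the derived weights independently:

```python
class Profile:
    """Hypothetical profile wrapper; the "category_weights" key is illustrative.

    The raw profile dict is never mutated; derived weights live in a separate
    attribute, so a test can check inputs and computed results independently.
    """

    def __init__(self, profile: dict):
        self.profile = profile   # raw input as loaded, left untouched
        self.computed = {}       # results of weight computations go here

    def compute_category_weights(self) -> dict:
        raw = self.profile["category_weights"]
        total = sum(raw.values())
        # Normalize into a new dict instead of writing back into self.profile.
        self.computed["category_weights"] = {k: v / total for k, v in raw.items()}
        return self.computed["category_weights"]


if __name__ == "__main__":
    raw = {"category_weights": {"grocery_pos": 1, "gas_transport": 3}}
    profile = Profile(raw)
    print(profile.compute_category_weights())  # {'grocery_pos': 0.25, 'gas_transport': 0.75}
    print(raw)                                 # unchanged: {'category_weights': {'grocery_pos': 1, 'gas_transport': 3}}
```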