Closed mattruggio closed 9 years ago
I have to study your change a bit! I haven't worked with forcing encodings on a string-by-string basis. My instinct is that, since this class only pertains to US addresses, the change you made is OK, but I'd like to read up a bit more.
That's a good point. For our use case, we only needed US address parsing. Since the class is geared toward US addresses, the input seems to fit comfortably within the ASCII table (http://www.ascii.cl/htmlcodes.htm). But it does limit the use cases for the US class. I could see it being passed in as an option (something like :ascii or :force_ascii); if set, it would force the encoding. A quick comment header on the function could explain the option and let others know that, if their addresses fit in this encoding, using the option could boost performance. Thoughts?
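To make the proposed option concrete, here is a minimal sketch of how an opt-in flag might look. The method name `prepare_address` and the keyword `force_ascii` are hypothetical and not part of the class's actual API; the sketch also assumes that relabeling should only happen when the input really is 7-bit ASCII, so addresses with other characters pass through untouched.

```ruby
# Hypothetical helper illustrating an opt-in :force_ascii option.
# Neither the method name nor the keyword comes from the library itself.
def prepare_address(text, force_ascii: false)
  # Default behaviour: leave the string and its encoding untouched.
  return text unless force_ascii

  # Only relabel when every byte is 7-bit ASCII; otherwise forcing
  # US-ASCII would produce a string with an invalid encoding.
  return text unless text.ascii_only?

  # dup so the caller's string is not mutated by force_encoding.
  text.dup.force_encoding(Encoding::US_ASCII)
end

plain = prepare_address("PO BOX 450, Chicago IL 60657", force_ascii: true)
puts plain.encoding
```

Whether relabeling the encoding actually speeds up the regular expressions involved would still need to be measured against the real parser.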
Closing on two accounts.
Thank you for this class, it was extremely helpful in understanding address parsing and normalization within Ruby. For our case, we needed to parse datasets that have 6,000,000+ addresses. In order to use your class, we found some very slight modifications could be made that would make it extremely fast.
Here are some benchmarks using the addresses and intersections supplied in the tests:
Before Optimization
(Time in milliseconds) - Address
(1.0882) - 2730 S Veitch St Apt 207, Arlington, VA 22206
(1.1823) - 44 Canal Center Plaza Suite 500, Alexandria, VA 22314
(1.1187) - 1600 Pennsylvania Ave Washington DC
(1.2637) - 1005 Gravenstein Hwy N, Sebastopol CA 95472
(1.2633) - PO BOX 450, Chicago IL 60657
(1.179) - 2730 S Veitch St #207, Arlington, VA 22206
(2.1652) - Hollywood & Vine, Los Angeles, CA
(2.3919) - Hollywood Blvd and Vine St, Los Angeles, CA
(2.3417) - Mission Street at Valencia Street, San Francisco, CA
After Optimization
(Time in milliseconds) - Address
(0.0129) - 2730 S Veitch St Apt 207, Arlington, VA 22206
(0.0073) - 44 Canal Center Plaza Suite 500, Alexandria, VA 22314
(0.0063) - 1600 Pennsylvania Ave Washington DC
(0.0053) - 1005 Gravenstein Hwy N, Sebastopol CA 95472
(0.0006) - PO BOX 450, Chicago IL 60657
(0.0086) - 2730 S Veitch St #207, Arlington, VA 22206
(0.0143) - Hollywood & Vine, Los Angeles, CA
(0.009) - Hollywood Blvd and Vine St, Los Angeles, CA
(0.0087) - Mission Street at Valencia Street, San Francisco, CA
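For anyone who wants to reproduce per-address timings in this format, something along these lines should work. This is only a sketch: the thread does not say how the numbers above were measured, and `parse_address` is a hypothetical stand-in that should be replaced with the real parse call.

```ruby
require "benchmark"

# Hypothetical stand-in for the library's parse call; swap in the real method.
def parse_address(text)
  text.split(",").map(&:strip)
end

# Average time per call in milliseconds, rounded to four decimal places.
def average_ms(text, iterations: 1_000)
  seconds = Benchmark.realtime { iterations.times { parse_address(text) } }
  (seconds * 1000.0 / iterations).round(4)
end

ADDRESSES = [
  "2730 S Veitch St Apt 207, Arlington, VA 22206",
  "Hollywood & Vine, Los Angeles, CA"
].freeze

puts "(Time in milliseconds) - Address"
ADDRESSES.each do |address|
  puts "(#{average_ms(address)}) - #{address}"
end
```

Averaging over many iterations smooths out timer noise, which matters when a single call takes only a few microseconds.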
I have supplied 3 commits for you:
All tests ran successfully during and after refactoring. Let me know your thoughts!