reubano / meza

A Python toolkit for processing tabular data
MIT License
416 stars, 32 forks

ValueError converting zero-value currencies #36

Closed: SteadBytes closed this issue 4 years ago

SteadBytes commented 4 years ago

Type detection raises a ValueError for currency strings with a value of zero, e.g. '$0' or '0$'.

>>> import itertools as it
>>> from meza import process as pr
>>> 
>>> records = it.repeat({"money": "$0"})
>>> records, result = pr.detect_types(records)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/meza/meza/process.py", line 333, in detect_types
    for t in tt.guess_type_by_value(record):
  File "/meza/meza/typetools.py", line 172, in guess_type_by_value
    result = type_test(g['func'], g['type'], key, value)
  File "/meza/meza/typetools.py", line 33, in type_test
    passed = test(value)
  File "//meza/meza/fntools.py", line 509, in is_int
    passed = is_numeric(content, thousand_sep, decimal_sep)
  File "/meza/meza/fntools.py", line 489, in is_numeric
    passed = int(content) == 0
ValueError: invalid literal for int() with base 10: '$0'

This is caused by is_numeric casting the original, unstripped content to an int: https://github.com/reubano/meza/blob/110f855fa95bcd9665018358059d9df25de1dedf/meza/fntools.py#L489
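
The failure can be reproduced without meza. The sketch below applies the same `int(content) == 0` comparison shown in the traceback to a raw string. The helper name `is_zero_raw` is hypothetical and exists only for this illustration; it is not part of meza.

```python
def is_zero_raw(content):
    """Apply the zero check to the raw string, as in the traceback.

    Hypothetical helper for illustration; not a meza function.
    """
    return int(content) == 0


for value in ('0', '$0', '0$'):
    try:
        print(repr(value), is_zero_raw(value))
    except ValueError as err:
        # int() rejects any string that still contains a currency symbol
        print(repr(value), 'raises ValueError:', err)
```

Only the bare `'0'` passes the cast; both currency forms raise, which matches the error reported above.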

As far as I can tell, this should only be an issue when the numeric part of the value is 0 and the only non-numeric characters are currency symbols. Here is a failing test case for this:

diff --git a/tests/test_fntools.py b/tests/test_fntools.py
index 922bc17..f8cdc75 100644
--- a/tests/test_fntools.py
+++ b/tests/test_fntools.py
@@ -45,6 +45,11 @@ class TestIterStringIO:
         nt.assert_false(ft.is_numeric(None))
         nt.assert_false(ft.is_numeric(''))

+    def test_is_numeric_0_currency(self):
+        for sym in ft.CURRENCIES:
+            nt.assert_true(ft.is_numeric(f'0{sym}'))
+            nt.assert_true(ft.is_numeric(f'{sym}0'))
+
     def test_is_int(self):
         nt.assert_false(ft.is_int('5/4/82'))
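
One possible direction for a fix, sketched outside of meza: strip the currency symbols first and run every numeric check against the stripped string. Everything below is an assumption made for illustration. `CURRENCIES` here is a small stand-in tuple rather than meza's `ft.CURRENCIES`, `strip_currency` and `is_numeric_sketch` are hypothetical names, and thousands and decimal separators are ignored.

```python
# Assumed subset of currency symbols, for illustration only;
# meza's own ft.CURRENCIES may contain different entries.
CURRENCIES = ('$', '£', '€')


def strip_currency(content):
    """Remove the assumed currency symbols and surrounding whitespace."""
    for sym in CURRENCIES:
        content = content.replace(sym, '')
    return content.strip()


def is_numeric_sketch(content):
    """Return True if a string parses as a number once stripped.

    Handles strings only; any other input (e.g. None) returns False.
    """
    if not isinstance(content, str):
        return False
    try:
        # The cast sees the stripped string, so '$0' is parsed as '0'
        float(strip_currency(content))
    except ValueError:
        return False
    return True


# Same shape as the failing test case, using plain asserts
for sym in CURRENCIES:
    assert is_numeric_sketch(f'0{sym}')
    assert is_numeric_sketch(f'{sym}0')
```

Because the raw string never reaches a numeric cast, zero and non-zero currency values take the same path.
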

reubano commented 4 years ago

Nice catch!