otsaloma closed this 2 years ago
Testing whether we could use dtype `object` instead of string, which might at times be necessary to reduce memory use. At least `np.unique` can't handle that: with the `axis` argument, it raises a `TypeError` for object arrays.
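A minimal plain-NumPy sketch of the trade-off, outside dataiter and with made-up sample values. Fixed-width string arrays (`<U`) pad every element to the longest value, so object arrays (one pointer per element) can save memory when string lengths vary a lot. Each Python `str` object carries its own overhead, though, so `object` is not always smaller. The catch is that `np.unique` with an `axis` argument rejects object arrays, which matches the `TypeError` in the test run.

```python
import numpy as np

# Fixed-width unicode ("<U") pads every element to the longest value,
# at 4 bytes per character, so one long string inflates the whole column.
strings = np.array(["a", "b", "x" * 1000])
print(strings.dtype, strings.itemsize)  # <U1000 4000

# Object dtype stores one pointer per element; the Python str objects
# themselves live outside the array buffer.
objects = strings.astype(object)
print(objects.dtype, objects.itemsize)  # object 8 on a 64-bit build

# Uniqueness along an axis works for fixed-width strings...
print(len(np.unique(strings, axis=0)))  # 3

# ...but np.unique(axis=...) builds a structured view of the rows,
# which NumPy refuses for object arrays, so this raises TypeError.
unique_failed = False
try:
    np.unique(objects, axis=0)
except TypeError as error:
    unique_failed = True
    print("TypeError:", error)
```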
```diff
diff --git a/dataiter/test/__init__.py b/dataiter/test/__init__.py
index 7cf496f..3f38dda 100644
--- a/dataiter/test/__init__.py
+++ b/dataiter/test/__init__.py
@@ -30,7 +30,11 @@ def data_frame(path):
     path = get_data_path(path)
     extension = path.suffix.lstrip(".")
     read = getattr(DataFrame, f"read_{extension}")
-    return read(path)
+    data = read(path)
+    for colname, column in data.items():
+        if column.is_string():
+            data[colname] = column.as_object()
+    return data

 def geojson(path):
     path = get_data_path(path)
```
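The conversion loop in the test helper can be approximated in plain NumPy as follows. This is only a sketch: `strings_to_object` is a hypothetical name, the input is a plain dict of 1-D arrays rather than a dataiter `DataFrame`, and `dtype.kind == "U"` plus `astype(object)` stand in for dataiter's `is_string()` and `as_object()`.

```python
import numpy as np

def strings_to_object(columns):
    """Return a copy of `columns` with fixed-width string arrays cast to object.

    Hypothetical stand-in for the test helper's loop; `columns` is a plain
    dict of 1-D NumPy arrays, not a dataiter DataFrame.
    """
    converted = {}
    for name, column in columns.items():
        # dtype.kind "U" means fixed-width unicode, NumPy's default for str.
        if column.dtype.kind == "U":
            converted[name] = column.astype(object)
        else:
            converted[name] = column
    return converted

data = {
    "id": np.array([1, 2, 3]),
    "category": np.array(["a", "b", "a"]),
}
data = strings_to_object(data)
print({name: column.dtype for name, column in data.items()})
```

The sketch builds a new dict instead of assigning while iterating. Assigning to existing keys during iteration, as the diff does, is allowed in Python because the dict's size doesn't change; a new dict simply leaves the input untouched.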
```
py.test --tb=no dataiter/test/test_data_frame.py
================================================= test session starts =================================================
platform linux -- Python 3.9.9, pytest-6.2.5, py-1.10.0, pluggy-0.13.1
rootdir: /home/osmo/Source/dataiter
collected 82 items

dataiter/test/test_data_frame.py ................FF...........FFF.FFF.....................F.....F.....FF...F.F. [ 95%]
....                                                                                                    [100%]

=============================================== short test summary info ===============================================
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_aggregate - TypeError: The axis argument to unique is n...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_anti_join - TypeError: The axis argument to unique is n...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_from_json - assert \n category ...905 rows total == \...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_from_pandas - assert \n id ...442 rows total ==...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_full_join - TypeError: The axis argument to unique is n...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_inner_join - TypeError: The axis argument to unique is ...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_left_join - TypeError: The axis argument to unique is n...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_left_join_by_tuple - TypeError: The axis argument to un...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_semi_join - TypeError: The axis argument to unique is n...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_to_json - assert \n category ...905 rows total == \n ...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_unique_by_one - TypeError: The axis argument to unique ...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_unique_by_same_dtype - TypeError: The axis argument to ...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_write_csv - assert \n id ...442 rows total == \...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_write_json - assert \n category ...905 rows total == ...
=========================================== 14 failed, 68 passed in 22.00s ============================================
```
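Most of the failures above come from `np.unique(..., axis=...)`. One possible workaround, sketched here under assumptions and not taken from dataiter, is to factorize each column into integer codes first. `np.unique` without `axis` (with `return_inverse=True`) does accept object arrays, as long as the elements are mutually comparable (e.g. all `str`). Row-wise uniqueness can then run on the stacked integer codes. `unique_rows_index` is a hypothetical helper name.

```python
import numpy as np

def unique_rows_index(*columns):
    """Return indices of the first occurrence of each unique row.

    Hypothetical workaround: factorize each column to integer codes, which
    works for object arrays of mutually comparable values (e.g. all str),
    then run np.unique(axis=0) on the integer codes instead.
    """
    codes = []
    for column in columns:
        _, inverse = np.unique(column, return_inverse=True)
        codes.append(inverse.reshape(-1))  # ensure 1-D codes per column
    stacked = np.column_stack(codes)
    # return_index uses a stable sort, so it points at first occurrences.
    _, index = np.unique(stacked, axis=0, return_index=True)
    # Indices come in sorted-row order; sort them to keep the input order.
    return np.sort(index)

names = np.array(["a", "b", "a", "c", "b"], dtype=object)
groups = np.array(["x", "y", "x", "x", "z"], dtype=object)
print(unique_rows_index(names, groups))  # [0 1 3 4]
```

Mixed types such as `None` among strings would still fail in the per-column `np.unique`, since Python can't order them.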