otsaloma / dataiter

Python classes for data manipulation
https://dataiter.readthedocs.io/
MIT License

Check handling of object columns #10

Closed: otsaloma closed this issue 2 years ago

otsaloma commented 2 years ago

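Testing whether we could use dtype object instead of a string dtype for string columns, which might at times be necessary to reduce memory use. At least np.unique can't handle that: with the axis argument it raises a TypeError for object arrays, which is behind most of the test failures below.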
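A minimal NumPy sketch of both points, assuming string columns are stored with NumPy's fixed-width unicode ("<U") dtype. Such an array pads every element to the longest string at 4 bytes per character, while an object array holds only pointers to Python str objects. The sketch also shows np.unique(..., axis=0) working on a fixed-width 2-D array of key columns and raising TypeError once the same array is cast to object. The variable names are illustrative only.

```python
import numpy as np

# Fixed-width unicode arrays pad every element to the longest string,
# at 4 bytes per character.
names = np.array(["a", "b" * 1000])
print(names.dtype, names.nbytes)  # <U1000 8000

# An object array stores only pointers; nbytes does not count the
# Python str objects themselves, which live elsewhere on the heap.
names_obj = names.astype(object)
print(names_obj.dtype, names_obj.nbytes)

# Unique rows over several key columns, as a join or a
# unique-by-columns operation might need.
keys = np.array([["a", "x"], ["b", "y"], ["a", "x"]])
unique_keys = np.unique(keys, axis=0)  # fine for the "<U" dtype
print(unique_keys)

# The same call on an object array is rejected.
try:
    np.unique(keys.astype(object), axis=0)
    object_axis_ok = True
except TypeError as error:
    object_axis_ok = False
    print("TypeError:", error)
```

Note that 1-D np.unique without the axis argument does still work on an object array of mutually comparable values, such as all str; it is the row-wise (axis) variant that fails.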

```diff
diff --git a/dataiter/test/__init__.py b/dataiter/test/__init__.py
index 7cf496f..3f38dda 100644
--- a/dataiter/test/__init__.py
+++ b/dataiter/test/__init__.py
@@ -30,7 +30,11 @@ def data_frame(path):
     path = get_data_path(path)
     extension = path.suffix.lstrip(".")
     read = getattr(DataFrame, f"read_{extension}")
-    return read(path)
+    data = read(path)
+    for colname, column in data.items():
+        if column.is_string():
+            data[colname] = column.as_object()
+    return data

 def geojson(path):
     path = get_data_path(path)
```
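The test helper change casts every string column to object after reading the test data, so the whole suite runs against object columns. A rough plain-NumPy analogue, assuming a data frame can be approximated by a dict mapping column names to 1-D arrays; to_object_columns is a hypothetical name, and dataiter's own is_string and as_object may differ in detail:

```python
import numpy as np


def to_object_columns(data):
    """Return a copy of data with fixed-width string columns cast to object.

    Hypothetical helper: data is assumed to be a plain dict of 1-D NumPy
    arrays standing in for a dataiter DataFrame.
    """
    converted = {}
    for colname, column in data.items():
        # np.str_ matches NumPy's fixed-width unicode ("<U") dtype.
        if np.issubdtype(column.dtype, np.str_):
            converted[colname] = column.astype(object)
        else:
            converted[colname] = column
    return converted


data = {"id": np.array([1, 2, 3]), "name": np.array(["a", "bb", "ccc"])}
data = to_object_columns(data)
print({colname: column.dtype for colname, column in data.items()})
```

Returning a new dict leaves the input untouched; that is a choice made for this sketch only.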
```
py.test --tb=no dataiter/test/test_data_frame.py
================================================= test session starts =================================================
platform linux -- Python 3.9.9, pytest-6.2.5, py-1.10.0, pluggy-0.13.1
rootdir: /home/osmo/Source/dataiter
collected 82 items

dataiter/test/test_data_frame.py ................FF...........FFF.FFF.....................F.....F.....FF...F.F. [ 95%]
....                                                                                                            [100%]

=============================================== short test summary info ===============================================
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_aggregate - TypeError: The axis argument to unique is n...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_anti_join - TypeError: The axis argument to unique is n...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_from_json - assert \n   category ...905 rows total == \...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_from_pandas - assert \n      id    ...442 rows total ==...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_full_join - TypeError: The axis argument to unique is n...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_inner_join - TypeError: The axis argument to unique is ...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_left_join - TypeError: The axis argument to unique is n...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_left_join_by_tuple - TypeError: The axis argument to un...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_semi_join - TypeError: The axis argument to unique is n...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_to_json - assert \n   category ...905 rows total == \n ...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_unique_by_one - TypeError: The axis argument to unique ...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_unique_by_same_dtype - TypeError: The axis argument to ...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_write_csv - assert \n      id    ...442 rows total == \...
FAILED dataiter/test/test_data_frame.py::TestDataFrame::test_write_json - assert \n   category ...905 rows total == ...
=========================================== 14 failed, 68 passed in 22.00s ============================================
```
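Nine of the fourteen failures are the same TypeError from np.unique with the axis argument, raised in the aggregate, join and unique tests. One possible workaround, sketched here as an assumption and not as the fix dataiter adopted: factorize each key column separately with 1-D np.unique, which does sort object arrays provided the values are mutually comparable (e.g. all str; a column mixing str and None would still fail), then take the row-wise unique over the resulting integer codes. unique_row_indices is a hypothetical helper name.

```python
import numpy as np


def unique_row_indices(*columns):
    """Index of the first row for each distinct combination of key values.

    Hypothetical helper, not dataiter's API. Avoids calling
    np.unique(..., axis=0) on object arrays by first mapping each
    column to integer codes with 1-D np.unique.
    """
    codes = []
    for column in columns:
        # inverse[i] is the position of column[i] among the sorted uniques.
        _, inverse = np.unique(column, return_inverse=True)
        codes.append(inverse.ravel())
    # Row-wise unique over plain integers is supported.
    stacked = np.column_stack(codes)
    _, first = np.unique(stacked, axis=0, return_index=True)
    # np.unique orders indices by sorted row value; restore row order.
    return np.sort(first)


category = np.array(["a", "b", "a", "c"], dtype=object)
size = np.array(["x", "y", "x", "x"], dtype=object)
print(unique_row_indices(category, size))  # [0 1 3]
```

With return_index=True, np.unique uses a stable sort, so each returned index is the first occurrence of its key combination.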