kinu-garage / nton_matching

Apache License 2.0

GJ: Input data may come as `.xls` format #17

Open 130s opened 3 months ago

130s commented 3 months ago

As of ver. 0.1.1 the input is .yaml (e.g. test_guardians.yaml). In the primary user's use case, however, the data comes as an .xls spreadsheet.

It would be smoother if the tool could take .xls directly, without even a .csv step (even a simple operation like generating a .csv from an .xls file raises the bar for non-tech-savvy users).

130s commented 3 months ago

Reading .xls from Python doesn't seem to be an issue, e.g. https://www.geeksforgeeks.org/reading-excel-file-using-python/
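For reference, a minimal sketch of what reading a spreadsheet with openpyxl could look like. One caveat I'm fairly sure of: openpyxl reads the newer .xlsx family, not the legacy binary .xls, so this assumes the input is saved as .xlsx. The names `read_rows` and `make_demo_workbook` and the sample cell values are made up for illustration and are not part of nton_matching.

```python
from io import BytesIO


def read_rows(source):
    """Return every row of the active sheet as a list of cell values.

    `source` may be a file path or a binary file-like object.
    """
    # Imported lazily so this module still loads where openpyxl is missing.
    from openpyxl import load_workbook

    workbook = load_workbook(source, read_only=True, data_only=True)
    try:
        sheet = workbook.active
        return [list(row) for row in sheet.iter_rows(values_only=True)]
    finally:
        workbook.close()


def make_demo_workbook():
    """Build a tiny in-memory .xlsx workbook (hypothetical sample data)."""
    from openpyxl import Workbook

    workbook = Workbook()
    sheet = workbook.active
    sheet.append(["name", "role"])
    sheet.append(["guardian_a", "library"])
    buffer = BytesIO()
    workbook.save(buffer)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for row in read_rows(make_demo_workbook()):
        print(row)
```

With a real file, `read_rows("some_file.xlsx")` would be the call; the path here is only a placeholder.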

130s commented 3 months ago

Following https://github.com/ros/rosdistro/pull/41664#pullrequestreview-2122787498, I installed openpyxl via apt:

apt update && apt install python3-openpyxl

Hm... Importing openpyxl raises an error.

# ipython3 
Python 3.9.2 (default, Feb 28 2021, 17:03:44) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.20.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import openpyxl as xl
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-4beb9837b393> in <module>
----> 1 import openpyxl as xl

/usr/lib/python3/dist-packages/openpyxl/__init__.py in <module>
      2 
      3 
----> 4 from openpyxl.compat.numbers import NUMPY, PANDAS
      5 from openpyxl.xml import DEFUSEDXML, LXML
      6 from openpyxl.workbook import Workbook

/usr/lib/python3/dist-packages/openpyxl/compat/__init__.py in <module>
      1 # Copyright (c) 2010-2019 openpyxl
      2 
----> 3 from .numbers import NUMERIC_TYPES
      4 from .strings import safe_string
      5 

/usr/lib/python3/dist-packages/openpyxl/compat/numbers.py in <module>
     39                                      numpy.float32,
     40                                      numpy.float64,
---> 41                                      numpy.float,
     42                                      numpy.bool_,
     43                                      numpy.floating,

~/.local/lib/python3.9/site-packages/numpy/__init__.py in __getattr__(attr)
    392 
    393         if attr in __former_attrs__:
--> 394             raise AttributeError(__former_attrs__[attr])
    395 
    396         if attr in __expired_attributes__:

AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

130s commented 3 months ago

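My first reading of https://stackoverflow.com/questions/74844262/ was that numpy 1.24 or newer would solve this issue, but mine is already newer. In fact `np.float` was removed in NumPy 1.24, so a newer numpy triggers the error rather than fixing it.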

In [1]: from importlib.metadata import version
   ...: version('numpy')
Out[1]: '2.0.1'

In [2]: version("openpyxl")
Out[2]: '3.0.3'

So, per https://github.com/theorchard/openpyxl/issues/19, openpyxl needs to be newer too. I don't see updated release info on github.com/theorchard/openpyxl, but https://openpyxl.readthedocs.io/en/latest/changes.html shows that 3.0.6 includes the bugfix.

https://pkgs.org/search/?q=openpyxl shows Ubuntu 24.04 comes with 3.1.2 and Ubuntu 22.04 with 3.0.9. Maybe I need to use a newer Python Docker image (not sure why I've been testing with 3.9).
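A quick way to check this from inside whichever interpreter is in use, as a rough sketch. The 3.0.6 threshold is only my reading of the changelog above, and `parse_release`, `is_new_enough` and `check_openpyxl` are illustrative names, not anything that exists in this repo. The parsing is deliberately naive (leading digits of each dotted piece); a real project would probably lean on a proper version parser.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption taken from the changelog reading above: 3.0.6 carries the fix.
MIN_OPENPYXL = (3, 0, 6)


def parse_release(text):
    """Turn '3.0.3' into (3, 0, 3); stop at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_new_enough(installed, minimum=MIN_OPENPYXL):
    """Compare a version string against the minimum release tuple."""
    return parse_release(installed) >= minimum


def check_openpyxl():
    """Describe the openpyxl visible to the current interpreter."""
    try:
        installed = version("openpyxl")
    except PackageNotFoundError:
        return "openpyxl is not installed"
    verdict = "ok" if is_new_enough(installed) else "too old"
    return f"openpyxl {installed}: {verdict}"


if __name__ == "__main__":
    print(check_openpyxl())
```

By this check the apt-installed 3.0.3 is too old, while the 3.0.9 and 3.1.2 shipped with Ubuntu 22.04 and 24.04 would pass.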

130s commented 3 months ago

Looks like Ubuntu 24.04 came with Python 3.12.3 (per utoronto.ca), so why not use that.

With python:3.12.5-slim-bullseye the error does not occur.

root@130s-p16s:/cws/src/130s/nton_matching# ipython3 
Python 3.9.2 (default, Feb 28 2021, 17:03:44) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.20.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import openpyxl as xl
In [2]: 

130s commented 3 months ago

Btw, I took over https://github.com/ros/rosdistro/pull/41664 and made a new PR to add a rosdep key.

130s commented 3 months ago

Dev blocked by https://github.com/kinu-garage/nton_matching/issues/20

130s commented 3 months ago

Dev blocked by #20

Unblocked.

Now, after resolving the pytest installation (https://github.com/kinu-garage/nton_matching/issues/20#issuecomment-2305149974), the test fails because openpyxl is not found. This is likely because the decision at this stage is to install dependencies via pip only (https://github.com/kinu-garage/nton_matching/issues/19), but I somehow forgot that and had openpyxl installed via apt (one proof of it is me opening https://github.com/ros/rosdistro/pull/42483 recently).

For now I'll have to add openpyxl via pip.

130s commented 3 months ago

One of the initial 2 test cases fails:

```
# pytest n_to_n_matching/test/test_spreadsheet_access.py -v
============================= test session starts ==============================
platform linux -- Python 3.12.5, pytest-8.3.2, pluggy-1.5.0 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /cws/src/130s/nton_matching
configfile: pytest.ini
collected 2 items

n_to_n_matching/test/test_spreadsheet_access.py::test_get_master_sheet PASSED [ 50%]
n_to_n_matching/test/test_spreadsheet_access.py::test_get_candidates_tosho
-------------------------------- live log call ---------------------------------
INFO n_to_n_matching.spreadsheet_access:spreadsheet_access.py:60 row_ids: [19, 32, 40, 60, 67, 75, 76, 78, 96, 100, 110, 122, 129, 139, 159, 162, 172, 188, 198, 205, 213, 214, 252, 263]
INFO n_to_n_matching.spreadsheet_access:spreadsheet_access.py:68 Rows matched: []
FAILED [100%]

=================================== FAILURES ===================================
__________________________ test_get_candidates_tosho ___________________________

xls_file_obj = , touban_accessor = 

    def test_get_candidates_tosho(xls_file_obj, touban_accessor):
        master_sheet = touban_accessor.get_master_sheet(xls_file_obj)
        candidate_rows = touban_accessor.get_candidates(master_sheet, touban_accessor.NAME_TOSHOIIN)
>       assert candidate_rows, f"Array of candidate not meeting criteria: '{candidate_rows}'"
E       AssertionError: Array of candidate not meeting criteria: '[]'
E       assert []

n_to_n_matching/test/test_spreadsheet_access.py:45: AssertionError
------------------------------ Captured log call -------------------------------
INFO n_to_n_matching.spreadsheet_access:spreadsheet_access.py:60 row_ids: [19, 32, 40, 60, 67, 75, 76, 78, 96, 100, 110, 122, 129, 139, 159, 162, 172, 188, 198, 205, 213, 214, 252, 263]
INFO n_to_n_matching.spreadsheet_access:spreadsheet_access.py:68 Rows matched: []
=========================== short test summary info ============================
FAILED n_to_n_matching/test/test_spreadsheet_access.py::test_get_candidates_tosho - AssertionError: Array of candidate not meeting criteria: '[]'
```

Then I noticed that on ipython3 even an import fails (under pytest this import error shouldn't be happening, as I can see the .xls content being read).

Log

```
# ipython3
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.20.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import openpyxl as pyxl
   ...:
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
 in
----> 1 import openpyxl as pyxl

ModuleNotFoundError: No module named 'openpyxl'

In [2]:
```

Noticed the Python version is different: ipython3 runs on 3.9.2 while pytest runs on 3.12.5.


UPDATE: Switched ipython to the pip version, and then it worked.
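For next time, a small sketch for spotting this kind of interpreter mismatch quickly: run it under both `ipython3` and the interpreter pytest uses, then compare the output. `describe_environment` is an illustrative helper name, not something in this repo. It uses only the standard library.

```python
import importlib.util
import sys


def describe_environment(module_name):
    """Report which interpreter is running and where it finds a module."""
    # find_spec returns None when a top-level module cannot be found.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "python": ".".join(str(n) for n in sys.version_info[:3]),
        "module": module_name,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment("openpyxl").items():
        print(f"{key}: {value}")
```

If `executable` or `python` differs between the two runs, or `location` is `None` in one of them, the package was installed for a different interpreter (e.g. apt's system Python vs. pip's).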

130s commented 3 months ago

Switched ipython to the pip version, and then it worked.