An OSError is raised when running the test code tests/data/test_mm.py. All of the failing test cases (5 of the 10 in the run below) fail with the same error.
$ nosetests ./data/test_mm.py -v
test0_get_default_option (data.test_mm.TestMatrixMarket) ... ok
test1_is_valid_option (data.test_mm.TestMatrixMarket) ... ok
test2_create (data.test_mm.TestMatrixMarket) ... [INFO ] 2023-12-19 04:03:30 [mm.py:247] Create the database from matrix market file.
[DEBUG ] 2023-12-19 04:03:30 [mm.py:252] Building meta part...
[PROGRESS] 0.00% 0.0/0.0secs 0.00it/s
[INFO ] 2023-12-19 04:03:30 [base.py:179] File ./mm.h5py exists. To build new database, existing file ./mm.h5py will be deleted.
[ERROR ] 2023-12-19 04:03:30 [mm.py:162] Cannot create db: Can't write data (no appropriate function for conversion path)
[ERROR ] 2023-12-19 04:03:30 [mm.py:163] Traceback (most recent call last):
  File "/home/bc-user/.local/lib/python3.10/site-packages/buffalo/data/mm.py", line 141, in _create
    idmap["rows"][:] = np.loadtxt(fin, dtype=f"S{uid_max_col}")
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/bc-user/.local/lib/python3.10/site-packages/h5py/_hl/dataset.py", line 999, in __setitem__
    self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 283, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 114, in h5py._proxy.dset_rw
OSError: Can't write data (no appropriate function for conversion path)
...... (middle lines omitted)
MatrixMarketDataReader: DEBUG: creating temporary matrix-market data from numpy-kind array
MatrixMarket: INFO: Create the database from matrix market file.
MatrixMarket: DEBUG: Building meta part...
[PROGRESS] 0.00% 0.0/0.0secs 0.00it/s
MatrixMarket: INFO: File ./mm.h5py exists. To build new database, existing file ./mm.h5py will be deleted.
MatrixMarket: ERROR: Cannot create db: Can't write data (no appropriate function for conversion path)
MatrixMarket: ERROR: Traceback (most recent call last):
  File "/home/bc-user/.local/lib/python3.10/site-packages/buffalo/data/mm.py", line 141, in _create
    idmap["rows"][:] = np.loadtxt(fin, dtype=f"S{uid_max_col}")
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/bc-user/.local/lib/python3.10/site-packages/h5py/_hl/dataset.py", line 999, in __setitem__
    self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 283, in h5py.h5d.DatasetID.write
  File "h5py/_proxy.pyx", line 114, in h5py._proxy.dset_rw
OSError: Can't write data (no appropriate function for conversion path)
[PROGRESS] 100.00% 0.0/0.0secs 1,137.96it/s
--------------------- >> end captured logging << ---------------------
----------------------------------------------------------------------
Ran 10 tests in 0.041s
FAILED (errors=5)
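The cause is a mismatch between the data type of the HDF5 dataset and the data type of the numpy array written into it, as shown in the error log above. np.loadtxt with an "S{uid_max_col}" dtype returns fixed-length byte strings, which h5py writes as ASCII strings. However, the current version creates the idmap datasets only with "utf-8" encoding, and HDF5 has no conversion path between the two character sets. As a result, the MatrixMarket object fails to load both the user and the item ID lists. One feasible fix is to change the idmap encoding from "utf-8" to "ascii". I tested the code with a local patch to buffalo/data/base.py, and the results are as follows: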
test0_get_default_option (data.test_mm.TestMatrixMarket) ... ok
test1_is_valid_option (data.test_mm.TestMatrixMarket) ... ok
test2_create (data.test_mm.TestMatrixMarket) ...
[INFO ] 2023-12-19 04:54:58 [mm.py:247] Create the database from matrix market file.
[DEBUG ] 2023-12-19 04:54:58 [mm.py:252] Building meta part...
[PROGRESS] 0.00% 0.0/0.0secs 0.00it/s
[INFO ] 2023-12-19 04:54:58 [base.py:179] File ./mm.h5py exists. To build new database, existing file ./mm.h5py will be deleted.
[PROGRESS] 100.00% 0.0/0.0secs 742.35it/s
[INFO ] 2023-12-19 04:54:58 [mm.py:260] Creating working data...
[PROGRESS] 0.00% 0.0/0.0secs 0.00it/s
[PROGRESS] 100.00% 0.0/0.0secs 168,937.24it/s
[DEBUG ] 2023-12-19 04:54:58 [mm.py:264] Working data is created on /tmp/tmpr5a6iwrk
[INFO ] 2023-12-19 04:54:58 [mm.py:265] Building data part...
[INFO ] 2023-12-19 04:54:58 [base.py:417] Building compressed triplets for rowwise...
[INFO ] 2023-12-19 04:54:58 [base.py:418] Preprocessing...
[INFO ] 2023-12-19 04:54:58 [base.py:421] In-memory Compressing ...
[INFO ] 2023-12-19 04:54:59 [base.py:301] Load triplet files. Total job files: 73
[INFO ] 2023-12-19 04:54:59 [base.py:451] Finished
[INFO ] 2023-12-19 04:54:59 [base.py:417] Building compressed triplets for colwise...
[INFO ] 2023-12-19 04:54:59 [base.py:418] Preprocessing...
[INFO ] 2023-12-19 04:54:59 [base.py:421] In-memory Compressing ...
[INFO ] 2023-12-19 04:54:59 [base.py:301] Load triplet files. Total job files: 73
[INFO ] 2023-12-19 04:54:59 [base.py:451] Finished
[INFO ] 2023-12-19 04:54:59 [mm.py:279] DB built on ./mm.h5py
ok
...... (middle lines omitted)
test3_list (data.test_mm.TestMatrixMarketReader) ... [DEBUG ] 2023-12-19 04:55:01 [mm.py:70] creating temporary matrix-market data from numpy-kind array
ok
----------------------------------------------------------------------
Ran 10 tests in 3.166s
OK
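For reference, the failing line in the traceback above reads the ID file with np.loadtxt(fin, dtype=f"S{uid_max_col}"), which yields fixed-width byte strings. The stdlib-only sketch below imitates that layout under an explicit encoding. to_fixed_width is a hypothetical helper for illustration and is not part of buffalo. With "ascii", plain ASCII IDs encode and pad without trouble. A non-ASCII ID, however, raises UnicodeEncodeError instead of being stored.

```python
from typing import List


def to_fixed_width(ids: List[str], encoding: str) -> List[bytes]:
    """Encode each ID and NUL-pad it to the longest encoded length.

    Hypothetical helper: it mirrors the contents of a fixed-width "S{n}"
    column, i.e. raw bytes padded with NUL bytes. The width is measured
    in bytes after encoding, not in characters.
    Raises UnicodeEncodeError if an ID cannot be represented in `encoding`.
    """
    encoded = [s.encode(encoding) for s in ids]
    width = max((len(b) for b in encoded), default=1)
    return [b.ljust(width, b"\x00") for b in encoded]


# ASCII IDs: encoded and padded to a common width of 3 bytes.
print(to_fixed_width(["u1", "u22"], "ascii"))  # [b'u1\x00', b'u22']

# A Korean ID needs 6 bytes under UTF-8 (3 bytes per Hangul syllable) ...
print(len(to_fixed_width(["사과"], "utf-8")[0]))  # 6

# ... and cannot be encoded as ASCII at all.
try:
    to_fixed_width(["사과"], "ascii")
except UnicodeEncodeError as exc:
    print("ascii rejects:", exc.object)
```

Because the width is counted in encoded bytes, a column sized by character count would be too narrow for UTF-8 IDs.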
However, this patch does not work for w2v training (PR), in which "utf-8" characters are used to train on Korean words. To reconcile this conflict, one feasible option is to provide an appropriate encoding rule for each input type, i.e. for loading a matrix-market file and for loading a stream data file.
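One possible shape for such encoding rules is sketched below. It assumes the encoding can be decided from the IDs themselves: use "ascii" when every ID is ASCII, as the passing matrix-market tests above imply, and fall back to "utf-8" otherwise, as for Korean tokens in w2v stream data. An explicit per-reader choice can override this. choose_idmap_encoding and its preferred parameter are hypothetical names, not existing buffalo options.

```python
from typing import Iterable, Optional, Tuple


def choose_idmap_encoding(
    ids: Iterable[str], preferred: Optional[str] = None
) -> Tuple[str, int]:
    """Pick an encoding and a byte width for a fixed-length idmap column.

    Hypothetical rule: honor an explicit `preferred` encoding if given
    (e.g. a per-reader option), otherwise use "ascii" when every ID is
    ASCII and fall back to "utf-8" for anything else.
    """
    ids = list(ids)
    if preferred is not None:
        encoding = preferred
    elif all(s.isascii() for s in ids):
        encoding = "ascii"
    else:
        encoding = "utf-8"
    # Fixed-length strings are sized in bytes, so measure after encoding.
    # An explicit `preferred` that cannot encode some ID raises UnicodeEncodeError here.
    width = max((len(s.encode(encoding)) for s in ids), default=1)
    return encoding, width


print(choose_idmap_encoding(["1", "42", "1337"]))  # ('ascii', 4)
print(choose_idmap_encoding(["사과", "apple"]))    # ('utf-8', 6)
```

The idea behind returning the encoding and the width together is that both the dataset dtype and the array written into it would be built from the same pair. HDF5 would then never be asked to convert between character sets.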