optimeas / python-osf

Python Implementation for reading OSF files and streams
MIT License

libosf.core cannot deal with gpslocation #2

Open Mq89 opened 6 months ago

Mq89 commented 6 months ago

It seems that libosf.core cannot handle channels of datatype gpslocation. Running the "to_csv" example with python3 ./to_csv.py -i example.osf -c GPS.Location produces the following stack trace:

Traceback (most recent call last):
  File "[REDACTED]/python-osf/examples/./to_csv.py", line 58, in <module>
    main(sys.argv[1:])
  File "[REDACTED]/python-osf/examples/./to_csv.py", line 35, in main
    df = samples.make_column_based()
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[REDACTED]/python-osf/src/libosf/core.py", line 149, in make_column_based
    df = DataFrame(data=frame_data)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[REDACTED]/python-osf/venv/lib/python3.11/site-packages/pandas/core/frame.py", line 767, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[REDACTED]/python-osf/venv/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 503, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[REDACTED]/python-osf/venv/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 114, in arrays_to_mgr
    index = _extract_index(arrays)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "[REDACTED]/python-osf/venv/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 677, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length

I have narrowed it down to get_samples, which already produces a tuple of three arrays with different lengths (namely 362, 266, 362), although they should all have the same length. The loop starting at L209 always extends result_timestamps by exactly one element, while result_values and result_indexes are sometimes extended by more than one element, so the lengths diverge.

https://github.com/optimeas/python-osf/blob/4a4d7edee679b8fcde13de8e1c9847a553b117f7/src/libosf/core.py#L185-L218
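
The divergence described above can be reproduced in isolation. The following is only a minimal sketch and not the actual libosf code: the `collect` function, its `repeat_timestamps` flag, and the block layout (one timestamp, one index, and one or more values per block) are assumptions made for illustration. Only the names result_values, result_timestamps and result_indexes are taken from the description above, and the order of the returned tuple is also assumed.

```python
# Hypothetical stand-in for the decoding loop; NOT the actual libosf code.
# Assumption: each block carries one timestamp, one index, and 1..n values.

def collect(blocks, repeat_timestamps):
    """Collect values, timestamps and indexes from (timestamp, index, values) blocks."""
    result_values, result_timestamps, result_indexes = [], [], []
    for timestamp, index, values in blocks:
        # Values and indexes grow by len(values) per block.
        result_values.extend(values)
        result_indexes.extend([index] * len(values))
        if repeat_timestamps:
            # One possible alignment: repeat the timestamp once per value.
            result_timestamps.extend([timestamp] * len(values))
        else:
            # Behaviour described in the report: always exactly one element.
            result_timestamps.append(timestamp)
    return result_values, result_timestamps, result_indexes


# Made-up sample data: the second block carries two values.
blocks = [(0, 0, [1.0]), (10, 0, [2.0, 3.0]), (20, 0, [4.0])]

bad = collect(blocks, repeat_timestamps=False)
good = collect(blocks, repeat_timestamps=True)

print([len(column) for column in bad])   # [4, 3, 4]
print([len(column) for column in good])  # [4, 4, 4]
```

Repeating the timestamp is just one way to keep the three arrays aligned. Whether a gpslocation sample should instead be kept as a single composite value per timestamp is a design decision for the maintainers.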

Sth79 commented 5 months ago

Yes, the "to_csv.py" script does not handle the gpslocation datatype correctly and crashes. We will fix the script in the next 1-2 weeks; sorry, we are really busy at the moment. The "print_channel_data.py" script shows how the datatype is handled correctly. We will also look into the issue with the differing signal lengths.