teamtomo / starfile

STAR file I/O in Python
https://teamtomo.org/starfile/
BSD 3-Clause "New" or "Revised" License

starfile.read fails when parsing empty loop blocks #70

Open andschenk opened 2 days ago

andschenk commented 2 days ago

Description

starfile.read fails when trying to parse a loop block without data that was previously written with starfile.write. In that case the block header is followed by multiple empty lines, which trip up the parser.

Minimal example to reproduce it

>>> import pandas as pd
>>> import starfile
>>> starfile.write({'block':pd.DataFrame({'col1':[]})},'test.star')
>>> sf=starfile.read('test.star')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/andreas/sft/starfile/src/starfile/functions.py", line 43, in read
    parser = StarParser(filename, n_blocks_to_read=read_n_blocks, parse_as_string=parse_as_string)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andreas/sft/starfile/src/starfile/parser.py", line 48, in __init__
    self.parse_file()
  File "/home/andreas/sft/starfile/src/starfile/parser.py", line 60, in parse_file
    block_name, block = self._parse_data_block()
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andreas/sft/starfile/src/starfile/parser.py", line 74, in _parse_data_block
    return block_name, self._parse_loop_block()
                       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andreas/sft/starfile/src/starfile/parser.py", line 124, in _parse_loop_block
    df = pd.read_csv(
         ^^^^^^^^^^^^
  File "/usr/lib64/python3.12/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/site-packages/pandas/io/parsers/readers.py", line 1898, in _make_engine
    return mapping[engine](f, **self.options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "parsers.pyx", line 581, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
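
The last frame of the traceback can be reproduced without starfile. This is a minimal sketch, assuming the loop body handed to pd.read_csv consists only of blank lines and is read as whitespace-delimited text without a header row. The try_parse helper and the sample strings are hypothetical and not part of starfile.

```python
from io import StringIO

import pandas as pd
from pandas.errors import EmptyDataError


def try_parse(loop_data: str) -> str:
    """Return 'ok' if pandas can parse loop_data, else 'EmptyDataError'."""
    try:
        # Whitespace-delimited, no header row (assumed to mirror the parser's call).
        pd.read_csv(StringIO(loop_data), sep=r"\s+", header=None)
    except EmptyDataError:
        return "EmptyDataError"
    return "ok"


# Hypothetical inputs: a loop body with only blank lines, and one with a data row.
blank_only = "\n\n\n"
one_row = "1.0 2.0\n"

print(try_parse(blank_only))
print(try_parse(one_row))
```

If this holds, any loop body made up solely of line feeds has to be caught before it reaches pd.read_csv.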

Possible fix

A small change in the parser fixes the issue. Instead of checking whether loop_data is equal to a line feed, I check whether loop_data starts with a line feed. In addition, I added a line to set the column headers even for an empty loop block:

diff --git a/src/starfile/parser.py b/src/starfile/parser.py
index 0febed0..6b905c1 100644
--- a/src/starfile/parser.py
+++ b/src/starfile/parser.py
@@ -116,9 +116,10 @@ class StarParser:
             loop_data += '\n'

         # put string data into a dataframe
-        if loop_data == '\n':
+        if loop_data.startswith('\n'):
             n_cols = len(loop_column_names)
             df = pd.DataFrame(np.zeros(shape=(0, n_cols)))
+            df.columns = loop_column_names
         else:
             column_name_to_index = {col: idx for idx, col in enumerate(loop_column_names)}
             df = pd.read_csv(

With these changes the parser reads the star file fine.

>>> import pandas as pd
>>> import starfile
>>> starfile.write({'block':pd.DataFrame({'col1':[]})},'test.star')
>>> sf=starfile.read('test.star')
>>> sf
Empty DataFrame
Columns: [col1]
Index: []
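
The difference between the two checks can be sketched in isolation. This assumes that loop_data for an empty block written by starfile.write is a run of several line feeds; the exact string the parser builds is not shown in this issue. The function names and sample strings are hypothetical.

```python
def is_empty_exact(loop_data: str) -> bool:
    """Original check: only a single line feed counts as an empty loop."""
    return loop_data == "\n"


def is_empty_prefix(loop_data: str) -> bool:
    """Patched check: anything that starts with a line feed counts as empty."""
    return loop_data.startswith("\n")


# Assumed inputs, for illustration only.
single_newline = "\n"
several_newlines = "\n\n\n"
real_data = "1.0 2.0\n"

for sample in (single_newline, several_newlines, real_data):
    print(repr(sample), is_empty_exact(sample), is_empty_prefix(sample))
```

Note that the prefix check would also classify a loop whose data is preceded by a blank line as empty. Testing loop_data.strip() == "" would be a stricter alternative, though that is not what the patch above does.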

jojoelfe commented 2 days ago

Hi @andschenk ,

thanks so much for reporting this! Since you already have a solution, do you want to open a PR?

Johannes

andschenk commented 7 hours ago

Hi Johannes,

I created a pull request with the changes.

Best, Andreas