ifnesi / 1brc

Gunnar's 1 Billion Row Challenge (Python)

fixed createMeasurements.py #9

Closed Askill closed 3 months ago

Askill commented 4 months ago

Generating the data failed on Win11 with Python 3.12; with these changes it works.

Previous Error:

Creating measurement file 'measurements.txt' with 1000000000 measurements...
0%| | 0/100 [00:00<?, ?it/s]C:\projects\one-billion-row-challenge\1brc\createMeasurements.py:463: UserWarning: Polars found a filename. Ensure you pass a path to the file instead of a python file object when possible for best performance.
  data.write_csv(f, separator=sep, float_precision=1, include_header=False, )
0%| | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\projects\one-billion-row-challenge\1brc\createMeasurements.py", line 507, in <module>
    measurement.generate_measurement_file(
  File "C:\projects\one-billion-row-challenge\1brc\createMeasurements.py", line 463, in generate_measurement_file
    data.write_csv(f, separator=sep, float_precision=1, include_header=False, )
  File "C:\projects\one-billion-row-challenge\venv\Lib\site-packages\polars\dataframe\frame.py", line 2696, in write_csv
    self._df.write_csv(
polars.exceptions.InvalidOperationError: file encoding is not UTF-8
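A likely cause (an assumption; the thread does not confirm it) is that the file handle passed to `write_csv` was opened in text mode without an explicit `encoding`, so Python fell back to the platform default, which on many Windows installs is a legacy code page rather than UTF-8. The sketch below only uses the standard library to show which encoding such a handle gets; `default_text_encoding` is a hypothetical helper, not part of the repository.

```python
import os
import tempfile


def default_text_encoding() -> str:
    """Return the encoding open() picks in text mode when none is given."""
    # Hypothetical helper for diagnosis only; it writes nothing to the file.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # No encoding argument on purpose: this mirrors the suspected bug.
        with open(path, "w") as f:
            return f.encoding
    finally:
        os.remove(path)


if __name__ == "__main__":
    # The value depends on the OS, locale, and whether UTF-8 mode is enabled.
    print("open() default text encoding:", default_text_encoding())
```

If this prints something other than a UTF-8 variant, any `open()` call without `encoding=` on that machine produces a non-UTF-8 text handle.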

ifnesi commented 4 months ago

Hi @Askill, thank you. Changing from '.txt' to '.csv' can be a breaking change; however, I do appreciate the suggestion to set utf-8 encoding when reading/writing the files. Would you like to resubmit your PR, covering not only createMeasurements.py but also all the other scripts?

Askill commented 4 months ago

@ifnesi ah, yes, you're right. I reverted the name change and added the encoding to all open() calls.
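For readers following along, here is a minimal sketch of what adding the encoding to `open()` can look like. It is not the code from the PR: `write_rows` and `read_rows` are hypothetical helpers, and the `station;temperature` row layout with a `;` separator is an assumption made for illustration.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, float]


def write_rows(path: str, rows: Iterable[Row], sep: str = ";") -> None:
    """Write station/temperature rows as UTF-8 text, one row per line."""
    # encoding="utf-8" makes the handle UTF-8 regardless of the OS locale.
    # newline="\n" keeps line endings identical on Windows and Linux.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for station, temperature in rows:
            f.write(f"{station}{sep}{temperature:.1f}\n")


def read_rows(path: str, sep: str = ";") -> Iterator[Row]:
    """Read the rows back, again with an explicit UTF-8 encoding."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            station, temperature = line.rstrip("\n").split(sep)
            yield station, float(temperature)


if __name__ == "__main__":
    sample = [("Zürich", 12.3), ("São Paulo", -4.0)]
    write_rows("sample_measurements.txt", sample)
    print(list(read_rows("sample_measurements.txt")))
```

Passing the same `encoding="utf-8"` on both the writing and the reading side keeps non-ASCII station names intact on any platform.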