rakshitha123 / TSForecasting

This repository contains the implementations related to the experiments of a set of publicly available datasets that are used in the time series forecasting research space.
https://forecastingdata.org/

UnicodeDecodeError in data_loader.py when reading m4_monthly_dataset.tsf on Windows #2

Closed JBOE22175 closed 3 years ago

JBOE22175 commented 3 years ago

I downloaded and extracted the M4 monthly dataset and started data_loader.py. I get the following error message:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-3-78dfe61da181> in <module>
      1 filename = 'm4_monthly_dataset.tsf'
      2 loaded_data, frequency, forecast_horizon, contain_missing_values, contain_equal_length = \
----> 3     data_loader.convert_tsf_to_dataframe("tsf_data/"+filename)
      4 
      5 print('loaded_data',loaded_data)

~\Documents\PythonScripts\Timeseries2020\MonashTSForecastingArchiv\data_loader.py in convert_tsf_to_dataframe(full_file_path_and_name, replace_missing_vals_with, value_column_name)
     28 
     29     with open(full_file_path_and_name, 'r', encoding='utf-8') as file:
---> 30         for line in file:
     31             # Strip white space from start/end of line
     32             line = line.strip()

~\miniconda3\envs\sktime\lib\codecs.py in decode(self, input, final)
    320         # decode input (taking the buffer into account)
    321         data = self.buffer + input
--> 322         (result, consumed) = self._buffer_decode(data, self.errors, final)
    323         # keep undecoded input until the next call
    324         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 437: invalid start byte

If I change the encoding from utf-8 to ansi, it works:

 #with open(full_file_path_and_name, 'r', encoding='utf-8') as file:
 with open(full_file_path_and_name, 'r', encoding='ansi') as file:

I work on Windows 10 with Python 3.9.4.
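For readers hitting the same traceback, here is a minimal sketch of why byte 0x96 is rejected. In UTF-8, 0x96 is a continuation byte and cannot start a character, while in Windows-1252 (cp1252) the same byte is the en dash, U+2013. The byte string below is made up for illustration and is not taken from the dataset file.

```python
# Hypothetical sample bytes (not from m4_monthly_dataset.tsf): text with a
# single 0x96 byte at index 3.
raw = b"M4 \x96 monthly"

# Decoding as UTF-8 fails because 0x96 cannot begin a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc
    print("utf-8 failed:", exc.reason, "at byte", exc.start)

# Decoding as cp1252 succeeds; 0x96 maps to the en dash (U+2013).
decoded = raw.decode("cp1252")
print(repr(decoded))
```

Note that `ansi` is a Windows-only codec alias in Python, so code that relies on it will likely fail on Linux or macOS, whereas `cp1252` is available on all platforms.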

rakshitha123 commented 3 years ago

Hi,

I changed the encoding type to "cp1252". It should work now.

Thanks
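If you cannot update to the fixed data_loader.py, or you have files in mixed encodings, one possible workaround is to try UTF-8 first and fall back to cp1252. The sketch below is not code from this repository: the helper name `read_lines_with_fallback`, its `encodings` parameter, and the sample file contents are all assumptions made for illustration. It reads the whole file into memory, which may not suit very large datasets.

```python
import tempfile
from pathlib import Path


def read_lines_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: decode a text file with the first encoding that works.

    Returns a tuple (encoding_used, list_of_lines).
    """
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            # This encoding cannot represent the file; try the next one.
            continue
        return enc, text.splitlines()
    raise ValueError(f"none of {encodings} could decode {path}")


# Usage with a made-up two-line file containing a cp1252 en dash (0x96).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.tsf"
    sample.write_bytes(b"# M4 \x96 sample\n@frequency monthly\n")
    enc_used, lines = read_lines_with_fallback(sample)

print(enc_used, len(lines))
```

Trying UTF-8 first keeps files that are valid UTF-8 from being misread, since cp1252 accepts almost any byte sequence and would silently produce wrong characters for multi-byte UTF-8 text.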