ukcp-data / ukcp-data-processor

Python library for reading, writing, processing and plotting UKCP data.

Are "utf-8" encodings still needed in Python 3? #14

Closed agstephens closed 4 years ago

agstephens commented 4 years ago

I have spotted a number of .encode("utf-8") usages in the CSV writers, e.g.:

grep encode ../ukcp-data-processor/ukcp_dp/file_writers/*
grep: ../ukcp-data-processor/ukcp_dp/file_writers/__pycache__: Is a directory
../ukcp-data-processor/ukcp_dp/file_writers/_write_csv_cdf.py:            var = self.input_data.get_value_label(InputType.VARIABLE)[0].encode("utf-8")
../ukcp-data-processor/ukcp_dp/file_writers/_write_csv_default.py:    output_data_file.write(title.encode("utf-8").replace("\n", " "))
../ukcp-data-processor/ukcp_dp/file_writers/_write_csv_jp.py:        x = self.input_data.get_value_label(InputType.VARIABLE)[0].encode("utf-8")
../ukcp-data-processor/ukcp_dp/file_writers/_write_csv_jp.py:        y = self.input_data.get_value_label(InputType.VARIABLE)[1].encode("utf-8")
../ukcp-data-processor/ukcp_dp/file_writers/_write_csv_pdf.py:            var = self.input_data.get_value_label(InputType.VARIABLE)[0] #.encode("utf-8")
../ukcp-data-processor/ukcp_dp/file_writers/_write_csv_plume.py:                var = self.input_data.get_value_label(InputType.VARIABLE)[0].encode(
../ukcp-data-processor/ukcp_dp/file_writers/_write_csv_plume.py:            var = self.input_data.get_value_label(InputType.VARIABLE)[0].encode("utf-8")
../ukcp-data-processor/ukcp_dp/file_writers/_write_csv_postage_stamp_map.py:            var = self.input_data.get_value_label(InputType.VARIABLE)[0].encode("utf-8")
../ukcp-data-processor/ukcp_dp/file_writers/_write_csv_sample.py:            self.input_data.get_value_label(InputType.VARIABLE)[i].encode("utf-8")
../ukcp-data-processor/ukcp_dp/file_writers/_write_csv_sample.py:            var = self.input_data.get_value_label(InputType.VARIABLE)[i].encode("utf-8")
../ukcp-data-processor/ukcp_dp/file_writers/_write_csv_subset.py:            var = self.input_data.get_value_label(InputType.VARIABLE)[0].encode("utf-8")
../ukcp-data-processor/ukcp_dp/file_writers/_write_csv_three_map.py:            var = self.input_data.get_value_label(InputType.VARIABLE)[0].encode("utf-8")

However, the header lines in the output file come out looking like bytes literals:

b'Minimum air temperature anomaly at 1.5m (\xc2\xb0C)',...

If I remove the .encode("utf-8") call, I get a sensible header line rather than the text form of an encoded bytestring, e.g.:

Minimum air temperature anomaly at 1.5m (°C),...
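
A minimal sketch of what seems to be happening, assuming the writers format the label into a text line before writing it. The "{},".format(...) call and the example.csv path are illustrative and not taken from the writers themselves. In Python 3, formatting a bytes object into a str produces its repr, b'...' prefix and escapes included, whereas a str can be written as-is and the file object can do the encoding:

```python
import os
import tempfile

# Example label copied from the header line shown above.
label = "Minimum air temperature anomaly at 1.5m (°C)"

# Formatting bytes into text gives the bytes repr, including the b'...' prefix.
header_from_bytes = "{},".format(label.encode("utf-8"))

# Formatting the str directly keeps the degree sign intact.
header_from_str = "{},".format(label)

# Let the file object handle the encoding instead of encoding each value.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "example.csv")  # hypothetical output path
    with open(csv_path, "w", encoding="utf-8") as output_data_file:
        output_data_file.write(header_from_str + "\n")
    # Read the file back in binary mode to inspect the bytes on disk.
    with open(csv_path, "rb") as written:
        raw_bytes = written.read()

print(header_from_bytes)
print(header_from_str)
print(raw_bytes)
```

If this is the cause, the file on disk is still UTF-8 encoded (the degree sign is stored as the bytes \xc2\xb0) without any per-value .encode("utf-8") call.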

This example was generated by running the following script with a test request:

/usr/local/miniconda/envs/ukcp18/bin/wps_runproc processes.supported.ls1_cdf_pdf_01.ls1_cdf_pdf_01#LS1_CDF_PDF_01 /usr/local/ukcp/ukcp18-wps/proc_outputs/2020-03-17/LS1_CDF_PDF_01/591c1786239cecbf09dfe6ed55cae7e1

Thanks