obspy / obspy

ObsPy: A Python Toolbox for seismology/seismological observatories.
https://www.obspy.org

Check decoding of files opened in text mode #1485

Open megies opened 8 years ago

megies commented 8 years ago

The following occurrences in the code base open files for reading in text mode without specifying an encoding. That means the data get decoded with the platform's default encoding (ASCII under a C/POSIX locale, for example), which will fail if the input data contain non-ASCII characters (as in #1483).
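To illustrate the failure mode, here is a minimal Python 3 sketch. It passes `encoding="ascii"` explicitly to stand in for a platform whose default encoding is ASCII; the file name and its contents are made up for the example and are not taken from the ObsPy data files.

```python
import os
import tempfile

# Hypothetical sample: a name containing a non-ASCII character,
# written to disk as UTF-8 bytes.
sample = u"Região Norte\n"
path = os.path.join(tempfile.mkdtemp(), "names.txt")
with open(path, "wb") as fh:
    fh.write(sample.encode("utf-8"))

# encoding="ascii" simulates an ASCII platform default (an assumption made
# for reproducibility; the real default depends on the locale).
try:
    with open(path, "rt", encoding="ascii") as fh:
        fh.read()
    failed = False
except UnicodeDecodeError:
    # The UTF-8 bytes of the non-ASCII character cannot be decoded as ASCII.
    failed = True
```

With an ASCII default, every `open(..., 'rt')` call listed in this issue can fail in the same way as soon as the file contains a non-ASCII byte.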

It probably would be a good idea to go through these cases and either explicitly specify the encoding or open as binary and explicitly decode.
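The two options could look roughly like the sketch below. The choice of UTF-8 is an assumption; the correct codec depends on each file format. The sample file is made up for the example.

```python
import io
import os
import tempfile

# Hypothetical sample file containing non-ASCII text, stored as UTF-8.
sample = u"Südinsel, Neuseeland\n"
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "wb") as fh:
    fh.write(sample.encode("utf-8"))

# Option 1: open in text mode and state the encoding explicitly.
with io.open(path, "rt", encoding="utf-8") as fh:
    text_a = fh.read()

# Option 2: open as binary and decode explicitly.
with open(path, "rb") as fh:
    text_b = fh.read().decode("utf-8")
```

Both variants give the same text here regardless of the locale. Option 2 also leaves line endings untouched, whereas text mode applies universal newline translation.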

geodetics/flinnengdahl.py
88-            self.lons[quad] = lons
89-            self.fenums[quad] = fenums
90-
91:        with open(self.numbers_file, 'rt') as csvfile:
92-            fe_csv = csv.reader(csvfile, delimiter=native_str(';'),
93-                                quotechar=native_str('#'),
94-                                skipinitialspace=True)

io/sh/core.py
133-    .TEST..BHE | 2009-10-01T12:46:01.000000Z - ... | 20.0 Hz, 801 samples
134-    .WET..HHZ  | 2010-01-01T01:01:05.999000Z - ... | 100.0 Hz, 4001 samples
135-    """
136:    fh = open(filename, 'rt')
137-    # read file and split text into channels
138-    channels = []
139-    headers = {}
--
387-            raise IOError(msg % data_file)
388-        fh_data = open(data_file, 'rb')
389-    # loop through read header file
390:    fh = open(filename, 'rt')
391-    line = fh.readline()
392-    cmtlines = int(line[5:7]) - 1
393-    # comment lines

io/gse2/paz.py
62-    zeros = []
63-
64-    if isinstance(paz_file, (str, native_str)):
65:        with open(paz_file, 'rt') as fh:
66-            paz = fh.readlines()
67-    else:
68-        paz = paz_file.readlines()

io/ndk/core.py
98-    if not hasattr(filename, "readline"):
99-        # Check if it exists, otherwise assume its a string.
100-        try:
101:            with open(filename, "rt") as fh:
102-                first_line = fh.readline()
103-        except:
104-            try:
--
156-    if not hasattr(filename, "read"):
157-        # Check if it exists, otherwise assume its a string.
158-        try:
159:            with open(filename, "rt") as fh:
160-                data = fh.read()
161-        except:
162-            try:

io/ascii/core.py
76-    True
77-    """
78-    try:
79:        with open(filename, 'rt') as f:
80-            temp = f.readline()
81-    except:
82-        return False
--
102-    True
103-    """
104-    try:
105:        with open(filename, 'rt') as f:
106-            temp = f.readline()
107-    except:
108-        return False
--
134-    >>> from obspy import read
135-    >>> st = read('/path/to/slist.ascii')
136-    """
137:    with open(filename, 'rt') as fh:
138-        # read file and split text into channels
139-        buf = []
140-        key = False
--
202-    >>> from obspy import read
203-    >>> st = read('/path/to/tspair.ascii')
204-    """
205:    with open(filename, 'rt') as fh:
206-        # read file and split text into channels
207-        buf = []
208-        key = False

clients/arclink/client.py
140-            dcid_key_file = DCID_KEY_FILE
141-        # parse dcid_key_file
142-        try:
143:            with open(dcid_key_file, 'rt') as fp:
144-                lines = fp.readlines()
145-        except:
146-            pass

jourdain commented 5 years ago

I ran into exactly this issue inside a Docker image using Python 2.7.

  File "/external/apps/py-env/lib/python2.7/site-packages/obspy/geodetics/flinnengdahl.py", line 41, in __init__
    with open(self.names_file, 'r') as fh:

Adding the encoding argument, i.e. open(..., encoding="utf-8"), to each open() call in that file solved the issue for me.
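One caveat: the built-in `open()` on Python 2.7 does not accept an `encoding` argument, so this fix only works where `open` is `io.open` (as it is on Python 3) or a compatible replacement. A version-independent sketch can call `io.open` directly. The helper name `read_text`, the file name, and the sample content below are hypothetical and only serve as illustration.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Hypothetical helper: read a whole file as text with an explicit encoding."""
    # io.open accepts `encoding` on both Python 2.7 and Python 3.
    with io.open(path, "rt", encoding=encoding) as fh:
        return fh.read()


# Made-up sample file containing a non-ASCII name, stored as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "names.asc")
with io.open(path, "wb") as fh:
    fh.write(u"CÔTE D'IVOIRE\n".encode("utf-8"))

names = read_text(path)
```

UTF-8 as the default is an assumption that matches the fix described above; other files may need a different codec.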

megies commented 4 years ago

This is probably obsolete on master.