Closed BebeSparkelSparkel closed 7 years ago
The stream object should produce universal newlines, as all text streams in Python do.
It seems that when the data is inserted into and then retrieved from the database, the \r is not removed as it would be when streaming from a file. For now I've resorted to file_string.replace('\r', ''), but that seems messy.
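A possible alternative to the manual replace, assuming the database returns the DXF content as a str: wrapping it in io.StringIO with newline=None turns on universal newlines translation while reading. The sample text in db_text is made up for illustration, and this sketch has not been tried against dxfgrabber itself.

```python
import io

# Hypothetical text as it might come back from a database, with Windows line endings.
db_text = "0\r\nSECTION\r\n2\r\nHEADER\r\n"

# newline=None enables universal newlines: '\r\n' and '\r' are translated to '\n'.
stream = io.StringIO(db_text, newline=None)

# Read the four lines the same way a readline()-based parser would.
lines = [stream.readline() for _ in range(4)]
print(lines)
```

With the default newline='\n', io.StringIO performs no translation, so the '\r' would still be present in every line.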
This is also the only solution I could offer, and it would degrade performance for all use cases. I think it is better if you solve the line-ending problem in your own data processing workflow :).
Every line has the \r. If we checked the first line, we could tell with little overhead whether the stream contains the character. If it does, it could then be removed from all following lines.
First, not all streams support seek(), so reading ahead requires additional work (buffering).
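To illustrate the buffering that read-ahead needs when seek() is unavailable, here is a minimal sketch. PeekFirstLine and peek_line are hypothetical names invented for this example and are not part of dxfgrabber. The wrapper only relies on readline() and replays the peeked line on the next read.

```python
import io

class PeekFirstLine:
    """Wrap a stream that only offers readline() and allow peeking at the next line."""

    def __init__(self, stream):
        self._stream = stream
        self._buffered = None  # holds a peeked line until readline() consumes it

    def peek_line(self):
        # Read the next line once and keep it for the following readline() call.
        if self._buffered is None:
            self._buffered = self._stream.readline()
        return self._buffered

    def readline(self):
        # Hand out the buffered line first, then fall back to the wrapped stream.
        if self._buffered is not None:
            line, self._buffered = self._buffered, None
            return line
        return self._stream.readline()

# The default io.StringIO does not translate newlines, so '\r\n' is preserved.
wrapped = PeekFirstLine(io.StringIO("0\r\nSECTION\r\n"))
has_cr = wrapped.peek_line().endswith('\r\n')
first = wrapped.readline()
second = wrapped.readline()
```

The extra `is not None` check on every readline() call is the kind of per-line overhead the comment above is concerned about.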
And this is a special case that can be handled outside of dxfgrabber, like this:
class FixLineEndingStream:
    def __init__(self, stream):
        self._stream = stream

    def readline(self):
        line = self._stream.readline()
        # Replace a trailing '\r\n' with '\n'; leave all other lines unchanged.
        return line if not line.endswith('\r\n') else line[:-2] + '\n'

fixed_stream = FixLineEndingStream(old_stream)
dxfgrabber only requires the readline() method.
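A more efficient solution, but not tested:
class FixLineEndingStream:
    def __init__(self, stream):
        self._stream = stream
        # The first call inspects the line ending, then rebinds readline.
        self.readline = self._init_readline

    def _init_readline(self):
        line = self._stream.readline()
        if line.endswith('\r\n'):
            # Windows line endings: bind the method itself, do not call it.
            self.readline = self._readline_win
            return line[:-2] + '\n'
        else:
            # Unix line endings: delegate directly to the wrapped stream.
            self.readline = self._stream.readline
            return line

    def _readline_win(self):
        line = self._stream.readline()
        # Leave EOF ('') and a final line without '\r\n' untouched.
        return line[:-2] + '\n' if line.endswith('\r\n') else line

fixed_stream = FixLineEndingStream(old_stream)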
No overhead if the line ending is '\n': readline then calls old_stream.readline() directly. This is a solution I can add to dxfgrabber, if it is tested.
Fixed this issue in v0.8.3 the easy way: .rstrip('\r\n')
My stress test suite now takes 104 sec instead of 101 sec for reading 80 files (+3%).
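For reference, a small sketch of how .rstrip('\r\n') treats the different line endings. The argument is a set of characters to strip, not a suffix, so any trailing mix of '\r' and '\n' is removed. strip_line_ending is a hypothetical helper written for this illustration and is not dxfgrabber's actual code.

```python
def strip_line_ending(line):
    # Hypothetical helper: remove every trailing '\r' and '\n' character.
    return line.rstrip('\r\n')

# Windows, Unix, old Mac, no line ending, and an empty Windows line.
samples = ["SECTION\r\n", "SECTION\n", "SECTION\r", "SECTION", "\r\n"]
cleaned = [strip_line_ending(s) for s in samples]
print(cleaned)
```

Note that an empty line and end-of-file both come out as '', so a reader using this approach has to test for end-of-file before stripping.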
That's awesome thanks a lot!
Hi mozman,
I'm having continued problems with line endings. Your tool imports from a file just fine, but if I use a stream from a database, it fails to import because the lines end with "\r\n" and not just "\n".
Is it possible for your tool to also accept that type of line ending for the dxfgrabber.read method?
Thanks