mozman / dxfgrabber

Outdated DXF reader, please switch to ezdxf (UNMAINTAINED)

Line endings #2

Closed BebeSparkelSparkel closed 7 years ago

BebeSparkelSparkel commented 7 years ago

Hi mozman,

I'm still having problems with line endings. Your tool imports from a file just fine, but when I read from a database stream the import fails because the lines end with "\r\n" rather than just "\n".

Is it possible for your tool to also accept that type of line ending for the dxfgrabber.read method?

Thanks

mozman commented 7 years ago

The stream object should produce universal newlines, which all text streams in Python do.

BebeSparkelSparkel commented 7 years ago

It seems that when the data is inserted into and then retrieved from the database, the \r is not removed as it would be when streaming from a file. For now I've resorted to file_string.replace('\r', ''), but that seems messy.
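
If the DXF text is already in memory as a `str`, one possible alternative to `replace('\r', '')` is to wrap it in `io.StringIO` with `newline=None`, which turns on the same universal-newline translation a text file gets. This is a minimal sketch; `dxf_text` is a made-up stand-in for whatever the database returns:

```python
import io

# Hypothetical stand-in for text fetched from the database (Windows line endings).
dxf_text = "0\r\nSECTION\r\n2\r\nHEADER\r\n"

# newline=None enables universal newlines: "\r\n" and "\r" are translated to "\n".
stream = io.StringIO(dxf_text, newline=None)

# Read the four lines back; each one now ends with a plain "\n".
lines = [stream.readline() for _ in range(4)]
```

A stream built this way offers `readline()` and could presumably be handed to `dxfgrabber.read` in place of the original one.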

mozman commented 7 years ago

That is also the only solution I could implement, and it would degrade performance for all use cases. I think it is better for you to fix the line-ending problem in your data processing workflow :).

BebeSparkelSparkel commented 7 years ago

Every line has the \r. If we checked the first line, we could tell with little overhead whether the stream contains the character. If it does, it could then be removed from all following lines.

mozman commented 7 years ago

First, not all streams support seek(), so reading ahead requires additional work (buffering). Second, this is a special case that can be handled outside of dxfgrabber, like this:

class FixLineEndingStream:
    def __init__(self, stream):
        self._stream = stream

    def readline(self):
        line = self._stream.readline()
        return line if not line.endswith('\r\n') else line[:-2] + '\n'

fixed_stream = FixLineEndingStream(old_stream)

dxfgrabber only requires the readline() method.
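
To see the wrapper in action without a database, the sketch below repeats the class and feeds it an `io.StringIO`, which keeps "\r\n" untouched by default and so mimics the problematic stream. The sample text is invented for illustration:

```python
import io

class FixLineEndingStream:
    # Wraps a text stream and turns a trailing "\r\n" into "\n" on each line.
    def __init__(self, stream):
        self._stream = stream

    def readline(self):
        line = self._stream.readline()
        return line if not line.endswith('\r\n') else line[:-2] + '\n'

# io.StringIO does not translate newlines by default, so "\r\n" survives here.
old_stream = io.StringIO("0\r\nSECTION\r\n")
fixed_stream = FixLineEndingStream(old_stream)

first = fixed_stream.readline()    # line with "\r\n" replaced by "\n"
second = fixed_stream.readline()
at_eof = fixed_stream.readline()   # empty string signals end of stream
```

Because the empty string returned at end of stream does not end with "\r\n", it passes through unchanged and a reader can still detect EOF.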

mozman commented 7 years ago

A more efficient solution but not tested:

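class FixLineEndingStream:
    def __init__(self, stream):
        self._stream = stream
        # The first call inspects the line ending, then rebinds self.readline.
        self.readline = self._init_readline

    def _init_readline(self):
        line = self._stream.readline()
        if line.endswith('\r\n'):
            # Bind the method itself; do not call it here.
            self.readline = self._readline_win
            return line[:-2] + '\n'
        else:
            self.readline = self._stream.readline
            return line

    def _readline_win(self):
        line = self._stream.readline()
        # Leave '' (end of stream) and a final line without '\r\n' untouched.
        return line[:-2] + '\n' if line.endswith('\r\n') else line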

fixed_stream = FixLineEndingStream(old_stream)

There is no overhead if the line ending is '\n', because readline then calls old_stream.readline() directly. This is a solution I could add to dxfgrabber once it is tested.

mozman commented 7 years ago

Fixed this issue in v0.8.3 the easy way: .rstrip('\r\n')
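
For reference, `str.rstrip('\r\n')` treats its argument as a set of characters and removes any trailing run of "\r" and "\n", so it handles every line-ending variant in one call. The snippet below only illustrates the string method; it is not the actual dxfgrabber code, and the sample values are invented:

```python
# The same content with different (or no) line endings.
samples = ["SECTION\r\n", "SECTION\n", "SECTION\r", "SECTION"]

# Every variant collapses to the bare content.
stripped = [s.rstrip('\r\n') for s in samples]

# Only "\r" and "\n" are removed; other trailing whitespace is kept.
kept_spaces = "  10 \r\n".rstrip('\r\n')
```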

My stress test suite now takes 104 sec instead of 101 sec for reading 80 files (+3%).

BebeSparkelSparkel commented 7 years ago

That's awesome thanks a lot!