Closed emjotde closed 6 years ago
I believe this fixes it. The problem is that for empty lines, `leading_whitespace` and `trailing_whitespace` overlap. The length check in the first `if` should fix that.
    def process_line(self, line):
        """segment line, dealing with leading and trailing whitespace"""

        out = ""

        leading_whitespace = len(line)-len(line.lstrip())
        # only copy leading whitespace if the line has non-whitespace content
        if leading_whitespace and len(line.lstrip()):
            out += line[:leading_whitespace]

        out += self.segment(line)

        trailing_whitespace = len(line)-len(line.rstrip())
        if trailing_whitespace:
            out += line[-trailing_whitespace:]

        return out
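To make the overlap concrete, here is a minimal standalone sketch. It assumes a hypothetical stand-in `segment` function that only strips surrounding whitespace (the real `self.segment` applies BPE), and the names `process_line_unguarded` and `process_line_guarded` are made up for illustration. For a line that is just `"\n"`, both whitespace counts are 1, so without the length check the same newline is written twice.

```python
def segment(line):
    # Hypothetical stand-in for self.segment: the real method applies BPE
    # merges; this one just strips surrounding whitespace.
    return line.strip()


def process_line_unguarded(line):
    """Without the length check: a whitespace-only line is emitted twice."""
    out = ""
    leading_whitespace = len(line) - len(line.lstrip())
    if leading_whitespace:
        out += line[:leading_whitespace]
    out += segment(line)
    trailing_whitespace = len(line) - len(line.rstrip())
    if trailing_whitespace:
        out += line[-trailing_whitespace:]
    return out


def process_line_guarded(line):
    """With the length check: leading whitespace is skipped for empty lines."""
    out = ""
    leading_whitespace = len(line) - len(line.lstrip())
    # Skip the leading part when nothing but whitespace is left,
    # because the trailing part below already covers the whole line.
    if leading_whitespace and len(line.lstrip()):
        out += line[:leading_whitespace]
    out += segment(line)
    trailing_whitespace = len(line) - len(line.rstrip())
    if trailing_whitespace:
        out += line[-trailing_whitespace:]
    return out


if __name__ == "__main__":
    print(repr(process_line_unguarded("\n")))
    print(repr(process_line_guarded("\n")))
    print(repr(process_line_guarded("  a b \n")))
```

Lines with actual content behave the same in both versions; only whitespace-only lines differ.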
thanks; fixed.
Hi, it seems `apply_bpe.py` duplicates empty lines, minimal example: and twice as many with the script.
Can you reproduce this?