tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

Quote and single quote are not handled correctly in vocab file where words are not wrapped in quotes #1862

Open · hepaajan opened this issue 4 years ago

hepaajan commented 4 years ago

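In particular, the following branch strips the surrounding quotes from a vocab entry. When the entry is itself a single quote character (`'` or `"`), that one-character string both starts and ends with a quote, so the branch reduces it to an empty string: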

https://github.com/tensorflow/tensor2tensor/blob/5f9dd2db6d7797162e53adf152310ed13e9fc711/tensor2tensor/data_generators/text_encoder.py#L929
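A minimal standalone sketch of the problem, assuming the linked branch strips one pair of matching quotes with `s[1:-1]` after `rstrip()`. The function name `strip_quotes_buggy` is hypothetical and is not part of the tensor2tensor API:

```python
def strip_quotes_buggy(line):
    """Hypothetical paraphrase of the vocab-loading branch (not the library API)."""
    s = line.rstrip()
    # Some vocab files wrap words in quotes, others do not.
    if ((s.startswith("'") and s.endswith("'")) or
            (s.startswith('"') and s.endswith('"'))):
        s = s[1:-1]  # drop the first and last character
    return s


# A one-character token is both its own first and last character,
# so the quote tokens themselves collapse to empty strings.
print(repr(strip_quotes_buggy("'\n")))        # ''
print(repr(strip_quotes_buggy('"\n')))        # ''
print(repr(strip_quotes_buggy("'hello'\n")))  # 'hello'
```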

An easy fix is to also check that `len(s) > 1` in both conditions.
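A sketch of the suggested fix, again as a hypothetical standalone function (`strip_quotes_fixed` is not a tensor2tensor name) that assumes the same strip-one-pair logic as the linked branch. The `len(s) > 1` guard is added to both conditions, so a lone `'` or `"` is kept as a token while genuinely wrapped words are still unwrapped:

```python
def strip_quotes_fixed(line):
    """Hypothetical patched version of the quote-stripping branch."""
    s = line.rstrip()
    # Require at least two characters so that a lone quote token,
    # which starts and ends with the same character, is not emptied.
    if ((len(s) > 1 and s.startswith("'") and s.endswith("'")) or
            (len(s) > 1 and s.startswith('"') and s.endswith('"'))):
        s = s[1:-1]
    return s


print(repr(strip_quotes_fixed("'\n")))        # "'"
print(repr(strip_quotes_fixed('"\n')))        # '"'
print(repr(strip_quotes_fixed("'hello'\n")))  # 'hello'
```

Putting the guard in each condition mirrors the suggestion literally; factoring it out into a single `len(s) > 1 and (...)` check would behave identically.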