openfoodfacts / taxonomy-editor

Taxonomies are at the heart of Open Food Facts data structure - this project provides an editor
https://wiki.openfoodfacts.org/Taxonomy_editor
GNU Affero General Public License v3.0

Handle \r\n or \n\r in files at parser level #386

Open alexgarel opened 9 months ago

alexgarel commented 9 months ago

In the parser, we should handle files that have \r\n or \n\r line endings and treat them as if they had \n endings only.

The output (unparse) should, however, always use \n only.
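A minimal sketch of what parser-level handling could look like, assuming the parser reads the whole file into one string. normalize_line_endings and read_taxonomy_lines are hypothetical names, not functions from the taxonomy-editor codebase. The file is opened with newline="" so that Python leaves the line endings untouched and the \n\r pair can be recognized as a single ending.

```python
import re

# Matches a Windows-style "\r\n" pair or a reversed "\n\r" pair.
# The regex engine scans left to right and never reuses consumed characters,
# so "\r\n\r\n" (a Windows blank line) becomes "\n\n", not "\n\r\n".
_CRLF_OR_LFCR = re.compile(r"\r\n|\n\r")


def normalize_line_endings(text: str) -> str:
    """Hypothetical helper: turn every '\\r\\n' and '\\n\\r' pair into '\\n'."""
    return _CRLF_OR_LFCR.sub("\n", text)


def read_taxonomy_lines(path: str) -> list[str]:
    """Hypothetical parser entry point.

    newline="" keeps universal newlines from rewriting "\\r" on its own,
    so normalize_line_endings() sees the raw pairs and treats "\\n\\r"
    as one line ending instead of two.
    """
    with open(path, "r", encoding="utf8", newline="") as f:
        return normalize_line_endings(f.read()).split("\n")
```

For example, normalize_line_endings("a\r\nb\n\rc") returns "a\nb\nc". A lone \r (old Mac-style ending) is deliberately left alone here, since the issue only mentions the two-character pairs.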

eric-nguyen-cs commented 9 months ago

I believe this is already the case (at least on staging or prod)? According to the documentation of Python's open function:

On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newline mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

On output, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
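To see these rules in action without touching the disk, the sketch below wraps in-memory bytes in io.TextIOWrapper, the same text layer that open() returns in text mode. The helper names decode and encode are illustrative only. Note what happens to \n\r: universal newlines treat the trailing lone \r as a line ending of its own, so \n\r comes out as two line breaks (an extra empty line) rather than one.

```python
import io
import os


def decode(raw: bytes, newline) -> str:
    """Decode bytes the way open(..., "r", encoding="utf8", newline=...) would."""
    return io.TextIOWrapper(io.BytesIO(raw), encoding="utf8", newline=newline).read()


def encode(text: str, newline) -> bytes:
    """Encode text the way open(..., "w", encoding="utf8", newline=...) would."""
    wrapper = io.TextIOWrapper(io.BytesIO(), encoding="utf8", newline=newline)
    wrapper.write(text)
    # detach() flushes the wrapper and hands back the underlying BytesIO.
    return wrapper.detach().getvalue()


sample = b"a\r\nb\n\rc\n"
print(repr(decode(sample, None)))  # 'a\nb\n\nc\n'  (\r\n -> \n, but \n\r -> \n\n)
print(repr(decode(sample, "")))    # 'a\r\nb\n\rc\n' (returned untranslated)
print(encode("x\n", "\n"))         # b'x\n' on every platform
print(encode("x\n", None) == ("x" + os.linesep).encode("utf8"))  # True
```

The last check depends on the platform only through os.linesep: on Linux encode("x\n", None) gives b'x\n', while on Windows it gives b'x\r\n'.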

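Thus, since we read the file with open(filename, "r", encoding="utf8"), \r\n is translated into \n on input, and the unparser writes \n back. We may want to update the unparser's open(filename, "w", encoding="utf8") call to handle the case where the code runs on a Windows machine in dev environments: with newline=None, every \n written is translated to os.linesep, which is \r\n on Windows, whereas passing newline="\n" disables that translation.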
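Following the suggestion about the unparser's open() call, here is a minimal sketch of a write step that pins the output to \n on every OS, assuming the unparser produces a list of lines. write_taxonomy_lines is a hypothetical name, not the project's actual unparser function.

```python
import os
import tempfile


def write_taxonomy_lines(path: str, lines: list[str]) -> None:
    """Hypothetical unparser output step.

    newline="\\n" turns off write-side translation, so each "\\n" reaches the
    file as a single LF byte, even on Windows where os.linesep is "\\r\\n".
    """
    with open(path, "w", encoding="utf8", newline="\n") as f:
        for line in lines:
            f.write(line + "\n")


if __name__ == "__main__":
    # Check on a throwaway file that no "\r" byte is written.
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "example.txt")
        write_taxonomy_lines(out, ["line one", "line two"])
        with open(out, "rb") as f:
            print(f.read())  # b'line one\nline two\n'
```

Pinning newline="\n" on the write side keeps the unparsed files byte-identical across developer machines, independently of how the parser reads them.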