Closed thoo1 closed 4 years ago
Comment: World.tsv ended up changing quite a bit. As I'm new to this project, I don't know whether that is expected. Could the reviewer run generate_data.py on their own machine to check? Thanks.
Thanks for reporting this problem. Although we recently changed things around the .tsv files, the code that opens and writes them did not really change.
I can't reproduce your issue, as I don't have a Windows system to test on. As far as I understand, our open() calls should tolerate every OS's newline characters and translate them to '\n', so I am not sure where your issue comes from.
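For reference, a minimal sketch of the reading behaviour described above. The helper name `read_lines` and the sample rows are made up for illustration and are not taken from the project. It only covers reading: on writing, the default text mode translates '\n' to the platform's line separator, which is where a Windows-only difference could plausibly come from.

```python
import os
import tempfile


def read_lines(raw):
    """Write raw bytes to a temporary file, then read them back in text mode.

    `read_lines` is a hypothetical helper for this demonstration only.
    """
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(raw)
        path = tmp.name
    try:
        # newline=None is the default: '\n', '\r' and '\r\n' are all
        # recognised as line endings and translated to '\n' on read.
        with open(path, "r", encoding="utf-8") as fh:
            return fh.readlines()
    finally:
        os.remove(path)


unix = read_lines(b"a\tb\nc\td\n")
windows = read_lines(b"a\tb\r\nc\td\r\n")
old_mac = read_lines(b"a\tb\rc\td\r")

# All three inputs produce the same list of lines.
print(unix == windows == old_mac)
```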
I agree that we might want to fix the newline character for the git repo via .gitattributes to avoid .csv files with mixed line endings, or apply your fix to write_tsv(). Your fix forces '\n' as newline for the new .csv files.
@rneher Do you have objections to, or experience with, globally enforcing a standard newline character in .gitattributes? If we only update the open() call in write_tsv, manual edits of .tsv files might still lead to inconsistent line-ending characters.
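Whichever route is chosen (a `.gitattributes` rule such as `*.tsv text eol=lf`, or a change in write_tsv), a small check for mixed line endings could help confirm the result. The sketch below is only an illustration; `count_line_endings` and `has_mixed_endings` are hypothetical names, not functions from this repository.

```python
from collections import Counter


def count_line_endings(data):
    """Count CRLF, bare LF and bare CR occurrences in raw file bytes."""
    crlf = data.count(b"\r\n")
    counts = Counter()
    counts["crlf"] = crlf
    # Subtract the CRLF pairs so only bare '\n' and bare '\r' remain.
    counts["lf"] = data.count(b"\n") - crlf
    counts["cr"] = data.count(b"\r") - crlf
    return counts


def has_mixed_endings(data):
    """Return True if more than one kind of line ending is present."""
    kinds_present = [k for k, n in count_line_endings(data).items() if n > 0]
    return len(kinds_present) > 1


# A file edited by hand on two platforms would be flagged.
print(has_mixed_endings(b"a\tb\nc\td\r\n"))
# A consistently LF-terminated file would not.
print(has_mixed_endings(b"a\tb\nc\td\n"))
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) so that no newline translation happens before counting.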
In any case, this PR cannot be merged at the moment due to our .tsv changes. As the change is minor, we can just directly commit it if we agree on how to proceed.
I don't have experience with the newline issue. I'll ask a colleague on Windows to take a look and maybe poke around a bit for recommendations.
We are no longer going to keep this as a submodule but rather just keep the data branch within the main repo. I invite you to resubmit this PR there.
I tried to clone the repo and run all parsers on Windows, and ran into an IndexError: list index out of range while parsing World.tsv. I noticed the run also added newlines. It turns out this is a known problem: https://stackoverflow.com/questions/16271236/python-3-3-csv-writer-writes-extra-blank-rows Adding the parameter newline='' fixes it.
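A minimal sketch of the fix, assuming the file is written with `csv.writer`. The helper `write_rows` and the sample rows are hypothetical, and the project's actual write_tsv() may look different. With `newline=''`, Python passes the writer's line terminator through unchanged instead of translating it a second time.

```python
import csv
import os
import tempfile


def write_rows(rows, lineterminator="\r\n", **open_kwargs):
    """Write rows as TSV and return the raw bytes that ended up on disk.

    `write_rows` is a hypothetical helper, not the project's write_tsv().
    """
    fd, path = tempfile.mkstemp(suffix=".tsv")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8", **open_kwargs) as fh:
            writer = csv.writer(fh, delimiter="\t", lineterminator=lineterminator)
            writer.writerows(rows)
        with open(path, "rb") as fh:
            return fh.read()
    finally:
        os.remove(path)


rows = [["location", "cases"], ["World", "1"]]

# newline='' disables translation: the csv module's default '\r\n'
# terminator is written as-is on every platform.
fixed = write_rows(rows, newline="")

# To get '\n' everywhere, set the writer's lineterminator as well.
unix_style = write_rows(rows, lineterminator="\n", newline="")

# Without newline='', the bytes on disk depend on the platform: on
# Windows each '\n' becomes '\r\n', so rows end in '\r\r\n'.
platform_dependent = write_rows(rows)
```

The stray '\r' in the last case is what shows up as extra blank rows when the file is read back.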