huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

CI fails on Windows for test_delete_from_hub and test_xgetsize_private due to new-line character #6856

Closed albertvillanova closed 1 month ago

albertvillanova commented 1 month ago

CI fails on Windows for test_delete_from_hub after the merge of:

This is weird because the CI was green in the PR branch before merging to main.

FAILED tests/test_hub.py::test_delete_from_hub - AssertionError: assert [CommitOperat...\r\n---\r\n')] == [CommitOperat...in/*\n---\n')]

  At index 1 diff: CommitOperationAdd(path_in_repo='README.md', path_or_fileobj=b'---\r\nconfigs:\r\n- config_name: cats\r\n  data_files:\r\n  - split: train\r\n    path: cats/train/*\r\n---\r\n') != CommitOperationAdd(path_in_repo='README.md', path_or_fileobj=b'---\nconfigs:\n- config_name: cats\n  data_files:\n  - split: train\n    path: cats/train/*\n---\n')

  Full diff:
    [
        CommitOperationDelete(
            path_in_repo='dogs/train/0000.csv',
            is_folder=False,
        ),
        CommitOperationAdd(
            path_in_repo='README.md',
  -         path_or_fileobj=b'---\nconfigs:\n- config_name: cats\n  data_files:\n '
  ?                                                                       --------
  +         path_or_fileobj=b'---\r\nconfigs:\r\n- config_name: cats\r\n  data_f'
  ?                               ++          ++                     ++
  -                         b' - split: train\n    path: cats/train/*\n---\n',
  ?                                                                   ^^^^^^ -
  +                         b'iles:\r\n  - split: train\r\n    path: cats/train/*\r'
  ?                           ++++++++++                ++                        ^
  +                         b'\n---\r\n',
        ),
    ]

albertvillanova commented 1 month ago

After investigation, I have found that when a local file is uploaded to the Hub, the newline character is no longer converted to "\n": on Windows machines it is now kept as "\r\n".

Any idea why this changed? CC: @lhoestq
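
For reference, here is a minimal sketch of how the same text can end up as different bytes on disk depending on the newline setting used when writing it. It is independent of the `datasets` and `huggingface_hub` code and does not claim to be the cause of this regression. The helper names `write_and_read_bytes` and `normalize_newlines` are hypothetical, and `newline="\r\n"` is used only to emulate the Windows text-mode default on any platform.

```python
import os
import tempfile

# Shortened README content, similar in shape to the one in the failing test.
README = "---\nconfigs:\n- config_name: cats\n---\n"


def write_and_read_bytes(text: str, newline: str) -> bytes:
    """Write `text` in text mode with the given newline setting,
    then return the raw bytes that ended up on disk."""
    fd, path = tempfile.mkstemp(suffix=".md")
    os.close(fd)
    try:
        # In text mode, every "\n" written is translated to `newline`.
        with open(path, "w", encoding="utf-8", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF line endings to LF (hypothetical helper)."""
    return data.replace(b"\r\n", b"\n")


# Emulates what a default text-mode write produces on Windows.
windows_like = write_and_read_bytes(README, newline="\r\n")
# Forces LF regardless of platform.
portable = write_and_read_bytes(README, newline="\n")

print(windows_like)
print(portable)
print(normalize_newlines(windows_like) == portable)
```

If the uploaded bytes are read straight from such a file, a test that compares against a literal `b'...\n...'` would pass on Linux and fail on Windows unless the file is written with an explicit `newline="\n"` or the bytes are normalized before comparison.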