jazzband / tablib

Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
https://tablib.readthedocs.io/
MIT License
4.58k stars 589 forks

Support Mac OS LF char for csv #517

Closed: matthewhegarty closed this issue 2 years ago

matthewhegarty commented 2 years ago

Hi, I'm testing some changes on django-import-export and I noticed an issue with CR ('\r') line endings in CSV data, which was the line-ending format on older Mac OS versions.

>>> import tablib
>>> sb = 'id,name,author_email\r1,Some book,test@example.com\r'
>>> tablib.import_set(sb, format='csv')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/matthew/.virtualenvs/django-import-export/lib/python3.9/site-packages/tablib/core.py", line 908, in import_set
    return Dataset().load(normalize_input(stream), format, **kwargs)
  File "/home/matthew/.virtualenvs/django-import-export/lib/python3.9/site-packages/tablib/core.py", line 414, in load
    fmt.import_set(self, stream, **kwargs)
  File "/home/matthew/.virtualenvs/django-import-export/lib/python3.9/site-packages/tablib/formats/_csv.py", line 44, in import_set
    for i, row in enumerate(rows):
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

This can be fixed by passing newline='' to StringIO() inside normalize_input():

from io import BytesIO, StringIO


def normalize_input(stream):
    """
    Accept either a str/bytes stream or a file-like object and always return a
    file-like object.
    """
    if isinstance(stream, str):
        # newline='' splits lines on '\r', '\n' and '\r\n' without translating them
        return StringIO(stream, newline='')
    elif isinstance(stream, bytes):
        return BytesIO(stream)
    return stream
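
As a rough illustration of why newline='' helps, here is a stdlib-only sketch. It does not call tablib, and the variable names are only illustrative. It compares how io.StringIO splits the CR-terminated sample from the report with its default newline setting and with newline='', then feeds the latter to csv.reader.

```python
import csv
from io import StringIO

# CR-only line endings, as in the report above.
data = 'id,name,author_email\r1,Some book,test@example.com\r'

# Default StringIO (newline='\n') only splits on '\n', so a consumer such as
# csv.reader receives the whole text as one line with bare '\r' inside it.
default_lines = list(StringIO(data))

# newline='' enables universal-newline splitting without translating the
# endings, so each record arrives as its own line.
universal_lines = list(StringIO(data, newline=''))

# csv.reader can then parse one record per line.
rows = list(csv.reader(StringIO(data, newline='')))

print(len(default_lines))    # 1
print(len(universal_lines))  # 2
print(rows)
```

With the default setting the reader sees a single line containing '\r' in an unquoted field, which is the situation the traceback above complains about.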

matthewhegarty commented 2 years ago

@claudep Is it possible we could get a new release with this fix in? We would need it for our forthcoming v3 release of django-import-export. I also notice that there was a build failure. Could you re-run the job to test whether it is something we need to fix?

hugovk commented 2 years ago

Yes, it's about time for a release. We could wait a day or two to see if we can get https://github.com/jazzband/tablib/pull/516 in?

What's your schedule for django-import-export?

I don't want to delay you if you're ready to go now. We can always release this now and make another release when that's ready; releasing is pretty easy with the automation.


The build is too old to restart, but here it is passing on my fork:

https://github.com/hugovk/tablib/actions/runs/2076613414

And here's a PR to add a button to allow us to manually trigger builds in the future: https://github.com/jazzband/tablib/pull/519.

matthewhegarty commented 2 years ago

> What's your schedule for django-import-export?

No rush, this is for our major release which has been pending for a while, so we can certainly wait until you are ready.

hugovk commented 2 years ago

#516 is still in progress, so I'll make a release today.

hugovk commented 2 years ago

Released in 3.2.1! 🚀