wbolster / jsonlines

python library to simplify working with jsonlines and ndjson data
https://jsonlines.readthedocs.org/

JSONLines from list of jsonable python objects? #66

Status: Open. azmeuk opened this issue 3 years ago

azmeuk commented 3 years ago

Hi. First of all thank you for your work on this library.

Reading the documentation, I understand that the usage this library suggests is to open a .jsonl file with jsonlines.open, or to start from a list of JSON-encoded strings, and then process the content with its utilities.

My use case is to take a list (or a generator) of arbitrary JSON-serializable Python objects and return a generator of JSON lines. This does not seem to be supported right now:

>>> import jsonlines
>>> objects = ({"foo": "bar"} for _ in range(100))
>>> reader = jsonlines.Reader(objects)
>>> list(reader)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/azmeuk/dev/yaal/lum1/local.virtualenv/lib/python3.9/site-packages/jsonlines/jsonlines.py", line 203, in __iter__
    yield self.read(
  File "/home/azmeuk/dev/yaal/lum1/local.virtualenv/lib/python3.9/site-packages/jsonlines/jsonlines.py", line 162, in read
    value = self._loads(line)
  File "/usr/lib/python3.9/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not dict


I would have imagined something like this:

>>> objects = ({"foo": "bar"} for _ in range(100))
>>> for line in jsonlines.somefunction(objects):
...     print(line)
{"foo": "bar"}
{"foo": "bar"}
...

This would be very convenient for streaming large amounts of JSON lines in a Flask response, for instance:

return flask.Response(jsonlines.somefunction(objects), mimetype="application/jsonl")
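The function imagined above can be approximated with the standard library alone. This is a minimal sketch, assuming each yielded line should be a str terminated by "\n"; the name `iter_jsonlines` is hypothetical and not part of the jsonlines API:

```python
import json
from typing import Any, Iterable, Iterator


def iter_jsonlines(objects: Iterable[Any]) -> Iterator[str]:
    """Lazily serialize each object to one JSON Lines record (hypothetical helper)."""
    for obj in objects:
        # With the default indent=None, json.dumps never emits a raw newline
        # (newlines inside strings are escaped), so each record is one line.
        yield json.dumps(obj) + "\n"


objects = ({"foo": "bar"} for _ in range(3))
for line in iter_jsonlines(objects):
    print(line, end="")  # the record already carries its trailing newline
```

Since it is a plain generator, its return value is the kind of iterable that `flask.Response` is given in the example above.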

I suggest implementing such a feature in this library. I volunteer to provide a patch if you are not interested in doing it yourself.

What do you think? Do you have preferences on how you would like this to be implemented?

wbolster commented 3 years ago

i see the use case for streaming output. let me think a bit about what would make sense.

wbolster commented 3 years ago

ok, let's first get the use case(s) clear before resorting to coding…

i don't think a streaming api makes sense as part of the current jsonlines.Writer class since that always wraps a file-like object, which is not applicable to the generator version.
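A file-oriented design does not fully rule out reuse, though: code that only knows how to write to a file-like object can still be driven from a generator by pointing it at an in-memory buffer and draining the buffer after each record. A stdlib-only sketch of that adapter pattern (the name `drain_writes` is hypothetical; whether the same trick is worth wrapping around `jsonlines.Writer` itself is an open question and is not tested here):

```python
import io
import json
from typing import Any, Iterable, Iterator


def drain_writes(objects: Iterable[Any]) -> Iterator[str]:
    """Adapt file-style writing to a generator by draining an in-memory buffer."""
    buffer = io.StringIO()
    for obj in objects:
        # Write exactly as one would to a real text file...
        json.dump(obj, buffer)
        buffer.write("\n")
        # ...then hand the buffered record to the caller and reset the buffer.
        yield buffer.getvalue()
        buffer.seek(0)
        buffer.truncate()
```

The extra copy through the buffer is the price of reusing file-based code; a dedicated function that calls `json.dumps` directly avoids it.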

another option would be a separate function that resembles the configuration options that Writer offers, e.g. something like iter_write(compact:bool=..., sort_keys:bool=...) -> Iterator[...]. should it return an Iterator[str] or Iterator[bytes]? or configurable via either an argument or a second function? :neutral_face:
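One possible shape for such a function, sketched with the standard library only. How `compact` and `sort_keys` carry over from Writer is an assumption here (compact output dropping the spaces after `,` and `:`), and the names `iter_write` and `iter_write_bytes` are hypothetical; the second function illustrates the "separate function for bytes" option:

```python
import json
from typing import Any, Iterable, Iterator


def iter_write(
    objects: Iterable[Any], *, compact: bool = False, sort_keys: bool = False
) -> Iterator[str]:
    """Yield one JSON Lines record (a str ending in a newline) per object."""
    # Compact output removes the spaces after item and key separators.
    separators = (",", ":") if compact else (", ", ": ")
    encoder = json.JSONEncoder(separators=separators, sort_keys=sort_keys)
    for obj in objects:
        yield encoder.encode(obj) + "\n"


def iter_write_bytes(objects: Iterable[Any], **options: Any) -> Iterator[bytes]:
    """Same records as iter_write, encoded as UTF-8 bytes."""
    for line in iter_write(objects, **options):
        yield line.encode("utf-8")


records = [{"b": 1, "a": [1, 2]}]
print(list(iter_write(records, compact=True, sort_keys=True)))
# ['{"a":[1,2],"b":1}\n']
```

Keeping two functions gives each one a fixed return type; a single function with a flag would need `typing.overload` to express that its return type depends on the flag's value.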

would the above make sense for the streaming http response use case?

do you have additional ideas? :thinking:

azmeuk commented 3 years ago

Thank you for your interest in the subject.

> another option would be a separate function that resembles the configuration options that Writer offers, e.g. something like iter_write(compact:bool=..., sort_keys:bool=...) -> Iterator[...]. should it return an Iterator[str] or Iterator[bytes]? or configurable via either an argument or a second function? :neutral_face:

I think we can stick with str. Werkzeug handles it very well, and str is easier to work with when debugging.
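For context on why str works well here: the WSGI spec (PEP 3333) requires the response body to be an iterable of bytes, so str lines have to be encoded somewhere before they reach the server. Werkzeug takes care of that for Flask responses; in a bare WSGI app the encoding step is explicit. A minimal stdlib sketch (the helper name `iter_jsonlines`, the sample objects and the app are illustrative only; the content type is the one from the Flask example):

```python
import json
from typing import Any, Callable, Iterable, Iterator


def iter_jsonlines(objects: Iterable[Any]) -> Iterator[str]:
    # Hypothetical helper: one JSON document per line, as str.
    for obj in objects:
        yield json.dumps(obj) + "\n"


def app(environ: dict, start_response: Callable) -> Iterator[bytes]:
    """Bare WSGI app streaming JSON Lines; the body must be bytes (PEP 3333)."""
    objects = ({"n": n} for n in range(3))
    start_response("200 OK", [("Content-Type", "application/jsonl")])
    # Encode lazily so records are still produced one at a time.
    return (line.encode("utf-8") for line in iter_jsonlines(objects))
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it with the standard library's reference server.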

> would the above make sense for the streaming http response use case?

Yep, totally. As long as the function accepts a generator as an argument and yields JSON lines one at a time, I think it would be perfect.
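The "take a generator, yield one line at a time" requirement is easy to check: a truly lazy function must cope with an unbounded input and consume only as many objects as the caller asks for. A stdlib sketch of such a check, using a hypothetical helper `iter_jsonlines` and an endless source that records how much of it was read:

```python
import itertools
import json
from typing import Any, Iterable, Iterator


def iter_jsonlines(objects: Iterable[Any]) -> Iterator[str]:
    # Hypothetical helper: serialize lazily, one object per yielded line.
    for obj in objects:
        yield json.dumps(obj) + "\n"


consumed = []


def endless_objects() -> Iterator[dict]:
    """An unbounded source that records every object pulled from it."""
    for n in itertools.count():
        consumed.append(n)
        yield {"n": n}


first_two = list(itertools.islice(iter_jsonlines(endless_objects()), 2))
print(first_two)      # ['{"n": 0}\n', '{"n": 1}\n']
print(len(consumed))  # 2: nothing beyond what was requested was read
```

An implementation that built a list first (for example `[json.dumps(o) for o in objects]`) would never return on this input, which is exactly the behaviour a streaming HTTP response needs to avoid.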

azmeuk commented 2 years ago

Hi @wbolster. Do you intend to tackle this patch or do you prefer someone in the community to do it?

wbolster commented 2 years ago

feel free to work on this, but please come up with a concrete API proposal first 🙏🏼. i am on board with adding this feature, but not yet convinced about the API details.