jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License
2.41k stars, 74 forks

CSV support #488

Status: Open (opened by provinzkraut 1 year ago)

provinzkraut commented 1 year ago

Description

How do you feel about adding CSV support, similar to what's provided in the yaml and toml modules?

It's easy enough to implement yourself, but I feel like it would be nice to have (=

jcrist commented 1 year ago

My gut reaction is that this feature doesn't make sense in msgspec. Unlike the other builtin formats, CSVs are not really standardized, which is why pandas.read_csv has a whopping 49 configuration parameters.

Also, since CSVs are tabular in nature, I'd recommend using one of Python's many existing tabular APIs (pandas, polars, pyarrow, ...). These native representations avoid creating a PyObject per element, and can be much more efficient than anything I'd implement in msgspec. I'm biased here though - my day job is doing Python data stuff, and I'm less familiar with what web engineers need.

I am curious though - what's your use case for mixing msgspec & csvs? If it's "easy enough to implement yourself", would you be open to contributing an example to the examples directory showing how this would work?

provinzkraut commented 1 year ago

> My gut reaction is I don't think this feature makes sense in msgspec. Unlike the other builtin formats, csvs are not really standardized, which is why pandas.read_csv has a whopping 49 configuration parameters.

My suggestion would have been to simply base this on what the standard library provides. Its csv module is solid, but not as configurable as others.

> Also, since CSVs are tabular in nature I'd recommend using one of python's many existing tabular apis (pandas, polars, pyarrow, ...). These native representations avoid creating a PyObject per element, and can be much more efficient than anything I'd implement in msgspec.

I imagined it to be more for convenience than performance, similar to how tomli is wrapped by the msgspec.toml module instead of building a TOML parser into msgspec (=

> I am curious though - what's your use case for mixing msgspec & csvs?

Twofold.

  1. I am dealing with a legacy API that returns CSV for some endpoints. Since the client for this API now uses msgspec.Struct and dataclasses (almost entirely), it would be nice if I could just call msgspec.csv.decode(<raw csv data>, type=list[RowModel]). Right now I've added a wrapper that achieves this using the standard library's csv module.
  2. I've got some data that already exists as msgspec.Struct instances and I need to turn it into CSV. Same as before, I've added a wrapper around csv, but having msgspec.csv.encode(<list of structs>) work would be nice.

> If it's "easy enough to implement yourself", would you be open to contributing an example to the examples directory showing how this would work?

Sure thing! If you say this isn't something that should be part of msgspec, I could contribute my stuff as an example. Otherwise I'd also be open to implementing support for it in a similar fashion to the yaml and toml submodules, in case you'd want to go for that (=

jcrist commented 1 year ago

Thanks for the extra info. I think for now I'd like to leave this out of msgspec proper. If another user asks for it we can always add it then, but it's harder to remove support for something later.

If you have the time, I'd love an example in the examples directory showing how to integrate msgspec with the csv module. It'd be a nice example of using msgspec.convert/msgspec.to_builtins to support a new protocol.

provinzkraut commented 1 year ago

> If you have the time, I'd love an example in the examples directory showing how to integrate msgspec with the csv module. It'd be a nice example of using msgspec.convert/msgspec.to_builtins to support a new protocol.

Will do!

clintval commented 1 week ago

I have a use case where I'd also like to write to CSV/TSV, and I have built a wrapper to do so. @provinzkraut did you end up building an example? I'd love to see one for comparison's sake!