roskakori / cutplace

validate data stored in CSV, PRN, ODS or Excel files
http://cutplace.readthedocs.org/
GNU Lesser General Public License v3.0

Add support for csv quoting option #117

Closed quapka closed 2 years ago

quapka commented 7 years ago

Recently I ran into trouble when validating files that contained double quotes. I wanted to simply turn off the quoting, but could not find an option for that.

I fixed it by adding another option for the definition field in the CID file. You can just specify the quoting and it propagates down to the underlying csv.reader object as kwargs (in the same way as the other options). Usage:

|   | Data format | Value |
|---|-------------|-------|
| D | Quoting     | None  |

For the value you can specify the same things that csv.reader allows for its quoting argument. Just drop the QUOTE_ prefix (for example, None stands for csv.QUOTE_NONE).
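
To illustrate the idea, here is a minimal sketch of how such a value could be mapped to a csv constant and handed to csv.reader. The helper name `quoting_from_name` is hypothetical and not taken from the cutplace source; only the `csv` module calls are real.

```python
import csv
import io


def quoting_from_name(name):
    """Map a CID value such as "None" or "minimal" to a csv.QUOTE_* constant.

    Hypothetical helper for illustration; cutplace may do this differently.
    """
    constant_name = "QUOTE_" + name.strip().upper()
    known = ("QUOTE_ALL", "QUOTE_MINIMAL", "QUOTE_NONE", "QUOTE_NONNUMERIC")
    if constant_name not in known:
        raise ValueError("unknown quoting: %r" % name)
    return getattr(csv, constant_name)


# A field that starts with a double quote but never closes it.
data = '1,"5 inch\n'

# With quoting turned off, the double quote is kept as ordinary data.
rows = list(csv.reader(io.StringIO(data), quoting=quoting_from_name("None")))
```

With the default quoting, the same line would be read as the start of a quoted field that never ends, which is the kind of trouble described above.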

I did not add tests (can add them), but I'm using it in production and it works nicely.

quapka commented 6 years ago

Hello, just a friendly reminder of this pull request, @roskakori. The build is failing, but it's not the first one in a row. I'm happy to help with that.

roskakori commented 2 years ago

A bit late, but thanks!

I just reworked the whole build process and will look into this PR now.

roskakori commented 2 years ago

For the record: The build fails because of some coveralls weirdness. I'm not gonna waste time on this yet and just merge it anyway.

quapka commented 2 years ago

No probs, I'm glad that it made it to master after all :slightly_smiling_face:

roskakori commented 1 year ago

@quapka The quoting option is included in version 0.9.1, which is available from PyPI as of now.

I added documentation and tests with #143, you can check the diffs at PR #144.

However, the current implementation only supports all and minimal.

Supporting none would also require supporting escape characters, because this is what the Python csv module falls back to when a value includes the delimiter.
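
A small sketch of that fallback, using only the standard csv module (the sample row and the backslash as escape character are assumptions for the demonstration, not cutplace settings):

```python
import csv
import io

row = ["a,b", "c"]

# Without an escape character, the writer cannot represent the embedded comma.
buffer = io.StringIO()
try:
    csv.writer(buffer, quoting=csv.QUOTE_NONE).writerow(row)
    write_failed = False
except csv.Error:
    write_failed = True

# With an escape character, the comma is escaped on write and restored on read.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_NONE, escapechar="\\").writerow(row)
escaped_text = buffer.getvalue()
round_trip = next(
    csv.reader(io.StringIO(escaped_text), quoting=csv.QUOTE_NONE, escapechar="\\")
)
```

So a none setting is only safe for such data if reader and writer also agree on an escape character.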

Supporting nonnumeric would not work as expected with decimal delimiter set to comma (,) as the Python csv module only supports a dot (.) as decimal delimiter. Also with nonnumeric the csv reader will return float values instead of strings, which might cause some unexpected behavior due to the general evilness of float. I did not look into this in detail, but I'm pretty sure that the nonnumeric features would have to be implemented on cutplace's side.
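
Both effects can be reproduced with the standard csv module alone. This is a sketch with made-up sample lines; the helper `read_nonnumeric` is hypothetical and only wraps csv.reader:

```python
import csv
import io


def read_nonnumeric(text, delimiter=","):
    """Read the first row of text using csv.QUOTE_NONNUMERIC."""
    reader = csv.reader(
        io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONNUMERIC
    )
    return next(reader)


# Unquoted fields are converted to float, quoted fields stay strings.
dot_row = read_nonnumeric('"abc",1.5\n')

# An unquoted value with a decimal comma cannot be converted to float.
try:
    read_nonnumeric('"abc";1,5\n', delimiter=";")
    comma_failed = False
except ValueError:
    comma_failed = True
```

Here `dot_row` contains a float rather than the string "1.5", and the line with the decimal comma is rejected by the csv module before cutplace would get a chance to validate it.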