jina-ai / jina

☁️ Build multimodal AI applications with cloud-native stack
https://docs.jina.ai
Apache License 2.0
20.56k stars, 2.21k forks

fix(sagemaker): read csv with escaped chars #6102

Closed deepankarm closed 8 months ago

deepankarm commented 8 months ago

Goals:

We hit an issue with batch transform on SageMaker where CSV parsing broke for text containing escaped characters (see the sample input below). This PR fixes how the CSV is parsed and adds tests for complex input.

1,abcd
2,efgh\, with comma
3,ijkl with \"quote\"
4,mn\\nop with newline
5,qrst with \\ backslash
6,uvwx with both\, comma and \"quote\"
7,yzab with newline\\nand comma\,
8,cde\"f with embedded quote
9,ghij with special char #
10,klmn with everything\, \"quote\" \\backslash and \\nnewline
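
For reference, a minimal sketch of how rows like these can be read with Python's standard `csv` module, treating backslash as the escape character. This is an illustration only and not necessarily the code in this PR; the helper name `parse_escaped_csv` is hypothetical, and only a few of the sample rows are included.

```python
import csv
import io
from typing import List

# A few rows from the sample input above; backslash is the escape character.
SAMPLE_ROWS = [
    r'1,abcd',
    r'2,efgh\, with comma',
    r'3,ijkl with \"quote\"',
    r'5,qrst with \\ backslash',
]


def parse_escaped_csv(text: str) -> List[List[str]]:
    """Parse CSV text whose fields escape special characters with a backslash.

    Hypothetical helper for illustration; not taken from the jina codebase.
    """
    reader = csv.reader(
        io.StringIO(text),
        delimiter=',',
        escapechar='\\',         # the character after a backslash is taken literally
        quoting=csv.QUOTE_NONE,  # quotes get no special treatment, only escapes do
    )
    return [row for row in reader]


rows = parse_escaped_csv('\n'.join(SAMPLE_ROWS))
for row in rows:
    print(row)
```

With `escapechar` set, an escaped comma stays inside its field instead of splitting it, and an escaped quote or backslash is kept as a literal character.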

codecov[bot] commented 8 months ago

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (ab2cc19) 76.04% compared to head (b91c622) 76.88%. Report is 2 commits behind head on master.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #6102      +/-   ##
==========================================
+ Coverage   76.04%   76.88%   +0.83%
==========================================
  Files         145      145
  Lines       14014    14015       +1
==========================================
+ Hits        10657    10775     +118
+ Misses       3357     3240     -117
```

| [Flag](https://app.codecov.io/gh/jina-ai/jina/pull/6102/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai) | Coverage Δ | |
|---|---|---|
| [jina](https://app.codecov.io/gh/jina-ai/jina/pull/6102/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai) | `76.88% <16.66%> (+0.83%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Files](https://app.codecov.io/gh/jina-ai/jina/pull/6102?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai) | Coverage Δ | |
|---|---|---|
| [jina/\_\_init\_\_.py](https://app.codecov.io/gh/jina-ai/jina/pull/6102?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai#diff-amluYS9fX2luaXRfXy5weQ==) | `56.00% <100.00%> (ø)` | |
| [jina/serve/runtimes/worker/http\_sagemaker\_app.py](https://app.codecov.io/gh/jina-ai/jina/pull/6102?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai#diff-amluYS9zZXJ2ZS9ydW50aW1lcy93b3JrZXIvaHR0cF9zYWdlbWFrZXJfYXBwLnB5) | `0.00% <0.00%> (ø)` | |

... and [18 files with indirect coverage changes](https://app.codecov.io/gh/jina-ai/jina/pull/6102/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai)
