tarql / tarql

SPARQL for Tables: Turn CSV into RDF using SPARQL syntax
http://tarql.github.io/
BSD 2-Clause "Simplified" License

Interpretation problem with a comma in a field #7

Closed pyvandenbussche closed 10 years ago

pyvandenbussche commented 11 years ago

For instance, the following line causes some trouble: "Kips Bay Medical, Inc.",0001460198,3841,AYOKZPWQTYDPHIZQ4N06,KIPS

cygri commented 11 years ago

What do you expect to happen and what do you see happening?

pyvandenbussche commented 11 years ago

Input line: "Kips Bay Medical, Inc.",0001460198,3841,AYOKZPWQTYDPHIZQ4N06,KIPS

Expected split: Kips Bay Medical, Inc. | 0001460198 | 3841 | AYOKZPWQTYDPHIZQ4N06 | KIPS

Current split: Kips Bay Medical Inc.",0001460198,3841,AYOKZPWQTYDPHIZQ4N06,KIPS [... and so on, until another opening quote is encountered ...]

I'm not sure I understand the RFC correctly: http://tools.ietf.org/html/rfc4180

I see three possible explanations:
1. Once you start using quotes to escape field content, you have to quote all fields systematically.
2. A field needs doubled quotes to be escaped properly.
3. The program doesn't handle quote escaping properly.
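For reference, RFC 4180 decides quoting field by field. Any field may be enclosed in double quotes, and a quoted field may contain commas. A double quote inside a quoted field is escaped by doubling it. Under those rules the input line is already valid as written, which would rule out explanations 1 and 2.

The Python sketch below applies only those rules to a single record. Note that `split_csv_line` is a hypothetical helper written for illustration, not the parser tarql uses. It also assumes there are no line breaks inside fields.

```python
def split_csv_line(line: str) -> list[str]:
    """Split a single CSV record using the RFC 4180 quoting rules.

    Hypothetical helper for illustration only; assumes the record has
    no embedded line breaks and no trailing newline.
    """
    fields: list[str] = []
    field: list[str] = []
    in_quotes = False
    at_field_start = True
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    # A doubled quote inside a quoted field is a literal '"'.
                    field.append('"')
                    i += 1
                else:
                    # Any other quote closes the quoted field.
                    in_quotes = False
            else:
                # Commas are ordinary characters while inside quotes.
                field.append(ch)
        elif ch == '"' and at_field_start:
            # Quoting is decided per field: only a leading quote opens one.
            in_quotes = True
            at_field_start = False
        elif ch == ',':
            # An unquoted comma ends the current field.
            fields.append("".join(field))
            field = []
            at_field_start = True
        else:
            field.append(ch)
            at_field_start = False
        i += 1
    fields.append("".join(field))
    return fields


line = '"Kips Bay Medical, Inc.",0001460198,3841,AYOKZPWQTYDPHIZQ4N06,KIPS'
print(split_csv_line(line))
# ['Kips Bay Medical, Inc.', '0001460198', '3841', 'AYOKZPWQTYDPHIZQ4N06', 'KIPS']
```

Only the first field is quoted, yet the line splits into the five expected values. The leading zeros in 0001460198 survive because every field stays a string.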

cygri commented 11 years ago

I agree with your expected split, so it seems to be your explanation number three. That is odd, since we use a mature off-the-shelf CSV parser, and I would expect it to handle this correctly.

I will dig into this when I'm back in the office (Friday).

cygri commented 11 years ago

I cannot reproduce this.

This is test.csv:

"Kips Bay Medical, Inc.",0001460198,3841,AYOKZPWQTYDPHIZQ4N06,KIPS

This is test.sparql:

CONSTRUCT {}
FROM <test.csv>
WHERE {}

This is the result of running bin/tarql --test test.sparql:


{ }
--------------------------------------------------------------------------------------
| a                        | b            | c      | d                      | e      |
======================================================================================
| "Kips Bay Medical, Inc." | "0001460198" | "3841" | "AYOKZPWQTYDPHIZQ4N06" | "KIPS" |
--------------------------------------------------------------------------------------
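The same reproduction can be cross-checked outside tarql. This sketch feeds the line to Python's standard `csv` module. Its default dialect uses a comma delimiter, `"` as the quote character, and doubled quotes as escapes. The columns are labelled a to e only to mirror the table above. This check says nothing about which parser tarql uses internally.

```python
import csv
import io
import string

# StringIO stands in for test.csv; with a real file, open it with newline="".
data = io.StringIO('"Kips Bay Medical, Inc.",0001460198,3841,AYOKZPWQTYDPHIZQ4N06,KIPS\n')

rows = list(csv.reader(data))

# Label columns a, b, c, ... to mirror the variable names in the tarql output.
records = [dict(zip(string.ascii_lowercase, row)) for row in rows]
print(records)
# [{'a': 'Kips Bay Medical, Inc.', 'b': '0001460198', 'c': '3841', 'd': 'AYOKZPWQTYDPHIZQ4N06', 'e': 'KIPS'}]
```

You can also point the reader at the original file instead of the StringIO. That is a quick way to check whether the file itself differs from the line pasted into the issue.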

Can you share the actual file and mapping with me (via email)?

cygri commented 10 years ago

Can't reproduce.