lidingpku / data-gov-wiki

Automatically exported from code.google.com/p/data-gov-wiki

a bug in csv2rdf4lod enhancement #8

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What is the expected output? What do you see instead?

I was checking
/work/data-gov/v2010/csv2rdf4lod/data/source/nci-nih-gov/tobacco-law-coverage/version/2010-Aug-25/automatic/us-state-policy.csv.e1.ttl

and found that the value for Alabama in 2009 is 0.0, but the raw xsl file
says it is "11%". Can you check why that happened?

tobacco-law-coverage-us-state-policy:thing_2_47 dcterms:isReferencedBy <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/version/2010-Aug-25> ;
       e1:state <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/us-state-policy/value-of/state/1> ;
       e1:cnt_staten "Alabama" ;
       e1:sum_cnty_p "0" ;
       e1:min_state "0" ;
       e1:min_state1 "0" ;
       e1:year "2009"^^xsd:gYear ;
       e1:venue "Restaurants" ;
       rdf:value "0.0"^^xsd:decimal ;
       ov:csvRow "2"^^xsd:integer ;
       ov:csvCol "47"^^xsd:integer ;
       ov:subjectDiscriminator <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/discriminator/us-state-policy> .

Original issue reported on code.google.com by liding...@gmail.com on 31 Aug 2010 at 5:27

GoogleCodeExporter commented 9 years ago

Original comment by tim...@gmail.com on 14 Sep 2010 at 7:18

GoogleCodeExporter commented 9 years ago
(running notes)

/work/data-gov/v2010/csv2rdf4lod/data/source/nci-nih-gov/tobacco-law-coverage/version/2010-Aug-25/manual/us-state-policy.csv:

$ cat manual/us-state-policy.csv | grep "STATE" | awk -F, '{print $38,$39,$40,$41,$42,$43,$44}'
REST04 REST05 REST06 REST07 REST08 REST09 BAR90

$ cat manual/us-state-policy.csv | grep "Alab" | awk -F, '{print $38,$39,$40,$41,$42,$43,$44}'
0 6 8 11 11 11 0

REST09 corresponds to the 11 at column 43 (NOT 47, as indicated in the
bug report)
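
For reference, the same header lookup can be done by name rather than by counting awk fields. This is only a sketch: find_columns is a hypothetical helper, and the sample rows are placeholders apart from columns 38-44, which copy the awk output above.

```python
import csv
import io


def find_columns(header, needle):
    """Return the 1-based positions of header cells that contain `needle`."""
    return [i for i, cell in enumerate(header, start=1) if needle in cell]


# Placeholder header: only columns 38-44 are taken from the awk output above;
# "STATE" in column 1 and the COLn names are made up for illustration.
header_cells = ["STATE"] + [f"COL{i}" for i in range(2, 38)] + [
    "REST04", "REST05", "REST06", "REST07", "REST08", "REST09", "BAR90",
]
# Placeholder Alabama row: empty everywhere except columns 38-44.
alabama_cells = [""] * 37 + ["0", "6", "8", "11", "11", "11", "0"]

csv_text = ",".join(header_cells) + "\n" + ",".join(alabama_cells) + "\n"
header, alabama = list(csv.reader(io.StringIO(csv_text)))

cols = find_columns(header, "REST09")
print(cols)                   # 1-based column(s) holding REST09
print(alabama[cols[0] - 1])   # Alabama's value in that column
```

Against the real file, the two rows would come from csv.reader over manual/us-state-policy.csv instead of the in-memory text.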

The 25 Aug version is using global params created from the 09 Aug version:
      conversion:enhance [
         ov:csvCol         47;
         ov:csvHeader     "Sum_REST09";
         a scovo:Item;                                               # :
         conversion:label  "Year";                                   # : was "Sum_REST09";   (this is $43 in 25 Aug version)
         conversion:object "2009"^^xsd:gYear;                            # :
         conversion:range  xsd:decimal;                              # : was todo:Literal;
      ];

looking a few columns back, we find our 11:

tobacco-law-coverage-us-state-policy:thing_2_43 dcterms:isReferencedBy <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/version/2010-Aug-25> ;
   e1:state <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/us-state-policy/value-of/state/1> ;
   e1:cnt_staten "Alabama" ;
   e1:sum_cnty_p "0" ;
   e1:min_state "0" ;
   e1:min_state1 "0" ;
   e1:year "2005"^^xsd:gYear ;
   e1:venue "Restaurants" ;
   rdf:value "11.0"^^xsd:decimal ;
   ov:csvRow "2"^^xsd:integer ;
   ov:csvCol "43"^^xsd:integer ;
   ov:subjectDiscriminator <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/discriminator/us-state-policy> .

The enhancement parameters apply the "2009" interpretation to column 47, when
in the 25 Aug version it belongs on column 43.

CSVtoRDF.java is not to blame. The assumption that the same parameters apply
to the subsequent version is to blame. That does not fully absolve the global
parameters apparatus, though. See
http://code.google.com/p/data-gov-wiki/issues/detail?id=14
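
A guard against this kind of drift could compare the ov:csvHeader recorded for each ov:csvCol with the header actually found in the new version before reusing the parameters. The sketch below is not part of csv2rdf4lod: check_params is a hypothetical helper, the parameters are reduced to a plain dict, and the header row is a placeholder in which only "Sum_REST09" at column 43 reflects the notes above.

```python
def check_params(header, expected_headers):
    """Compare the header expected at each 1-based column with the actual one.

    Returns a list of (column, expected, actual) tuples for every mismatch;
    `actual` is None when the column does not exist in this version.
    """
    problems = []
    for col, expected in sorted(expected_headers.items()):
        actual = header[col - 1] if 1 <= col <= len(header) else None
        if actual != expected:
            problems.append((col, expected, actual))
    return problems


# Placeholder header for the newer version: 47 made-up names, with the
# REST09 column placed at position 43 as found in the notes above.
header = [f"COL{i}" for i in range(1, 48)]
header[43 - 1] = "Sum_REST09"

# Parameters carried over from the older version (ov:csvCol -> ov:csvHeader).
old_params = {47: "Sum_REST09"}

for col, expected, actual in check_params(header, old_params):
    print(f"column {col}: expected {expected!r}, found {actual!r}")
```

An empty result would mean the old column numbers still line up; any mismatch means the parameters need to be re-derived for the new version.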

Original comment by tim...@gmail.com on 14 Sep 2010 at 9:39

GoogleCodeExporter commented 9 years ago
closing

Original comment by tim...@gmail.com on 14 Sep 2010 at 9:40