levelgraph / levelgraph

Graph database JS style for Node.js and the Browser. Built upon LevelUp and LevelDB.
MIT License

How to convert CSV data for levelgraph? #29

Closed by pe3 11 years ago

pe3 commented 11 years ago

I'm quite new to semantic technologies. I would like to test the database with a dataset from the Finnish tax authorities about taxes paid by companies. How should I go about transforming the data into triples? Is there a CSV2RDF tool I could use so that levelgraph-n3 would understand the result? So far I have only found Rasqal, but I haven't tested it.

mcollina commented 11 years ago

The problem is data modelling. Modelling the data as an RDF graph requires some user involvement. However, plenty has been written on the topic (hint: search Google for "csv to rdf"). Have a look here: http://www.w3.org/wiki/ConverterToRdf. Once you have the data in RDF, you can save it in N3/Turtle format and import it with levelgraph-n3.

However, you can simply parse the CSV with Node.js and store it in levelgraph as follows:

  1. get a unique identifier for each row (pick it from the fields; I can't say which one, as I don't understand Finnish).
  2. store a triple for each cell in each row, using the row identifier as subject, the column name as predicate and the value as object.
  3. query it with the normal levelgraph methods.
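
The steps above can be sketched roughly as follows. This is a minimal illustration, not a tested importer: the `csvToTriples` helper, the sample data and the column names (`businessId`, `name`, `taxPaid`) are all made up for the example, and the parser assumes a simple CSV with a header row and no quoted or escaped commas.

```javascript
// Minimal sketch: turn CSV text into levelgraph-style triples.
// Assumption: header row present, no quoted fields or embedded commas;
// use a real CSV parser for anything more complex.
function csvToTriples(csvText, idColumn) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== "");
  const header = lines[0].split(",").map((name) => name.trim());
  const triples = [];

  for (const line of lines.slice(1)) {
    const cells = line.split(",").map((cell) => cell.trim());
    const row = {};
    header.forEach((name, i) => { row[name] = cells[i]; });

    // Step 1: the unique identifier of the row becomes the subject.
    const subject = row[idColumn];

    // Step 2: one triple per remaining non-empty cell;
    // column name -> predicate, cell value -> object.
    for (const name of header) {
      if (name === idColumn || row[name] === undefined || row[name] === "") continue;
      triples.push({ subject, predicate: name, object: row[name] });
    }
  }
  return triples;
}

// Hypothetical sample data; the column names are invented for illustration.
const sample = [
  "businessId,name,taxPaid",
  "1234567-8,Example Oy,1500",
  "7654321-0,Another Oy,320",
].join("\n");

const triples = csvToTriples(sample, "businessId");
console.log(triples);
```

With a levelgraph instance `db`, the resulting array can then be stored with `db.put(triples, callback)` and read back with something like `db.get({ subject: "1234567-8" }, callback)` (step 3).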

If in the process you add a Wiki page, I will be extremely grateful :).

mcollina commented 11 years ago

BTW, storing it as RDF can be simplified by using URIs as identifiers for subjects, predicates and objects (where appropriate, i.e. when the objects are not values but links to other entities).
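
A rough sketch of that idea, under stated assumptions: the base URIs, the `toUriTriple` helper and the `parent` column are hypothetical, and the caller is assumed to know which columns link to other entities rather than hold plain values.

```javascript
// Hypothetical base URIs, chosen only for this example.
const ENTITY_BASE = "http://example.org/company/";
const VOCAB_BASE = "http://example.org/vocab#";

// Build one triple for a cell, using URIs for the subject and predicate.
// The object becomes a URI only when the column links to another entity;
// otherwise it stays a plain value.
function toUriTriple(row, idColumn, column, linkColumns) {
  const value = row[column];
  return {
    subject: ENTITY_BASE + encodeURIComponent(row[idColumn]),
    predicate: VOCAB_BASE + encodeURIComponent(column),
    object: linkColumns.has(column)
      ? ENTITY_BASE + encodeURIComponent(value)
      : value,
  };
}

// Made-up row: "parent" refers to another company, "name" is a plain value.
const row = { businessId: "1234567-8", name: "Example Oy", parent: "7654321-0" };
const links = new Set(["parent"]);

const nameTriple = toUriTriple(row, "businessId", "name", links);
const parentTriple = toUriTriple(row, "businessId", "parent", links);
console.log(nameTriple, parentTriple);
```

Because the `parent` object is the same URI that the other company uses as its subject, the two entities end up connected in the graph.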

pe3 commented 11 years ago

Thanks. Will do at some point. For the weeks to come I will be traveling and enjoying summer. But when I do I will also have a look at OpenRefine and LODRefine.

mcollina commented 11 years ago

Ok, I'm closing this! Ping me if you need more help!

pietercolpaert commented 10 years ago

Hi Petri @pe3,

What did you end up using? Did you find a mapping system for node?

We've been working on tools like Karma, The DataTank and LODRefine. You can also read a paper about extending R2RML to RML over here