herminiogg / ShExML

A heterogeneous data mapping language based on Shape Expressions
http://shexml.herminiogarcia.com
MIT License
15 stars 2 forks source link

Using xls files as SOURCE #146

Open andrawaag opened 12 months ago

andrawaag commented 12 months ago

Currently, when converting Excel sheets, I convert the individual sheets to CSV files before declaring them as SOURCES. I suspect I can use JDBC instead to source an Excel file in a ShExML directly. Is this an accurate assumption, and if so, would you have an example of how to link an Excel file?

herminiogg commented 11 months ago

Hi Andra,

I noticed I left this issue unanswered. This is something that I have already thought about in the past and that other similar languages seem to have similar issues with it. As I see it, the problem resides in the fact that an Excel file is not exactly equal to a CSV/TSV file because the Excel file can have a much more complex structure. Therefore, implementing a support for it would mean providing a mechanism to traverse such complex structure with enough flexibility. As many times people just use Excel files to host tables then the easiest solution is to convert them to CSV which I notice is a small hiccup but not too troublesome in my opinion. If you want to use JDBC for the connection you need to have in mind two things. Firstly, only these drivers are integrated in the library due to license restrictions: MySQL, SQLite, PostgreSQL, MariaDB and SQLServer. So an example of consulting one of these databases will be as follows:

PREFIX : <http://example.com/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX schema: <http://schema.org/>
PREFIX xs: <http://www.w3.org/2001/XMLSchema#>
SOURCE films_database <jdbc:mysql://localhost:3306/films>
ITERATOR films_iterator <sql: SELECT * FROM films;> {
    FIELD id <id>
    FIELD name <name>
    FIELD year <year>
    FIELD country <country>
    FIELD director <director>
}

EXPRESSION films <films_database.films_iterator>

:Films :[films.id] {
    :type :Film ;
    :name [films.name] @en ;
    :year [films.year] xs:gYear ;
    :country [films.country] ;
    :director [films.director] ;
}

Then, if you need to use a JDBC driver which is not included in the library you will need to add the dependencies to the classpath during the library invocation. For example I have used it to query Access databases with the UCanAccess driver and the call should start like this:

$ java -cp "ShExML-v0.3.3.jar;ucanaccess\ucanaccess-5.0.1.jar;ucanaccess\lib\*" com.herminiogarcia.shexml.Main -d jdbc:ucanaccess%net.ucanaccess.jdbc.UcanaccessDriver

Best, Herminio