semsol / arc2

ARC RDF Classes for PHP

Direct RDF import into database #146

Open treb93 opened 4 years ago

treb93 commented 4 years ago

Hello everyone,

I'm looking for a function that imports RDF data directly into the database, without going through a SPARQL request.

So far I have managed to do it by using ARC2_StoreLoadQueryHandler like this:

require(__DIR__."/config.php");

$store = ARC2::getStore($config);

if (!$store->isSetUp()) {
    $store->setUp();
}

$data = file_get_contents("../../dambri.rdf"); // Should also be received from some RESTful API

$loadHandler = new ARC2_StoreLoadQueryHandler($config, $store);

$loadHandler->runQuery([], $data);

Is there a built-in function that does the same thing? If not, would it be a good idea to include one in the project? (I could take on the development, given some guidelines.)

Best regards

k00ni commented 4 years ago

What is the reason to avoid SPARUL queries? Performance?

Although it might be difficult, is it an option to use PDO and directly write into the database?

CC @semsol

treb93 commented 4 years ago

The idea is to be able to post RDF data through an endpoint and save it directly to the database. Using PDO might be an option; the code in ARC2_StoreRDFXMLLoader might also be a good starting point.
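To make the endpoint idea concrete, here is a minimal sketch. It assumes the store object offers an `insert($doc, $graphUri)` method, which ARC2_Store does in the versions I have seen (as far as I can tell it wraps the same load handler internally), so please verify it against your release. The helper name `importRdf` and the graph URI in the usage comment are hypothetical.

```php
<?php
// Hypothetical helper: stores a raw RDF payload in the given graph.
// Assumption: $store provides insert($doc, $graphUri), as ARC2_Store appears to.
function importRdf($store, string $rdf, string $graphUri): array
{
    if (trim($rdf) === '') {
        throw new InvalidArgumentException('Empty RDF payload');
    }
    if (trim($graphUri) === '') {
        throw new InvalidArgumentException('Empty graph URI');
    }

    $result = $store->insert($rdf, $graphUri);

    // ARC2 components collect problems in an error list instead of throwing,
    // so return that list alongside the result when it is available.
    $errors = method_exists($store, 'getErrors') ? $store->getErrors() : [];

    return ['result' => $result, 'errors' => $errors];
}

// Usage inside an endpoint script (assumes ARC2 is loaded and $config is defined):
// $store = ARC2::getStore($config);
// if (!$store->isSetUp()) { $store->setUp(); }
// $out = importRdf($store, file_get_contents('php://input'), 'http://example.org/graph/import');
```

Passing the store in as a parameter keeps the helper independent of how the store is configured, and makes it easy to exercise with a stand-in object.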

k00ni commented 4 years ago

I am not that familiar with ARC2_StoreRDFXMLLoader, but its method addT seems promising.

treb93 commented 4 years ago

Alright, do you want me to propose a class name / structure to include in the project? Or should I develop it as a plugin?

Btw, have you thought about adding comments on the database fields? For now the single-letter names are quite esoteric for a newbie like me ;) I can also help with a documentation effort in this direction.

k00ni commented 4 years ago

Hi, sorry for the late response.

> Btw, have you thought about adding comments on the database fields?

Not yet, but you can suggest something as PR and we can discuss details there.

> For now the single-letter names are quite esoteric for a newbie like me ;) I can also help with a documentation effort in this direction.

Yeah, that sometimes makes it hard to understand what is going on. Documentation should generally live in files in the repository rather than in the Wiki, if possible; that makes it easier to keep track of changes.

> Alright, do you want me to propose a class name / structure to include in the project? Or should I develop it as a plugin?

I am not sure what you want to achieve for your project. Code contributions are welcome in general, but new code must have test coverage to some extent, as well as sound documentation.

From a performance perspective: ARC2 doesn't scale very well, which means you will eventually run into problems. When that happens depends on your data; there is no definite point. AFAIK it doesn't matter how fast you can load data into the store: at some point querying it will be slow no matter what (also dependent on your data). When I wrote the adapter layer to allow mysqli and PDO, I saw a lot of MySQL 4 related "optimizations". Also, the MyISAM table type was used for a long time, which is not fully ACID compliant (good Stack Overflow post about MyISAM vs. InnoDB: https://stackoverflow.com/a/15678615/5301527). Using a standalone solution like Virtuoso might be worth a shot.

If you describe your use case in a little more detail, I may be able to help. But in general I would say it isn't worth the effort. Thank you for suggesting it though!