Modular storage layout - Githubissues

GoogleCodeExporter commented 9 years ago

We should make CumulusRDF as storage-unaware as possible. 
This would allow us 

- to use different kinds of (presumably NoSQL) storage
- to use different versions of the same storage (e.g. Cassandra 1.2.x or 
Cassandra 2.0) in the same branch

The first thing to do is to split the data access layer into interfaces 
(which will remain in the core module) and implementations (which will move 
to separate modules). 
At the end of this first phase we will have two additional modules:

- cumulusrdf-cassandra-1.2.x 
- cumulusrdf-cassandra-2.x

After that, the project will be extensible from that point of view: supporting 
a new storage will just be a matter of adding another module (e.g. 
cumulusrdf-hbase, cumulusrdf-accumulo, cumulusrdf-solr).
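
To make the interface/implementation split concrete, here is a minimal sketch. All names in it (TripleStore, InMemoryTripleStore, addTriple, query) are hypothetical and are not taken from the CumulusRDF code base; an in-memory list stands in for a real storage such as Cassandra.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical data access interface: this is the part that would stay in the core module.
interface TripleStore {
    void addTriple(String subject, String predicate, String object);

    // A null argument acts as a wildcard for that position.
    List<String[]> query(String subject, String predicate, String object);
}

// Hypothetical implementation: this is the part that would live in a storage module.
// A plain list is used here only as a stand-in for a real backend.
final class InMemoryTripleStore implements TripleStore {
    private final List<String[]> triples = new ArrayList<>();

    @Override
    public void addTriple(String subject, String predicate, String object) {
        triples.add(new String[] {subject, predicate, object});
    }

    @Override
    public List<String[]> query(String subject, String predicate, String object) {
        List<String[]> result = new ArrayList<>();
        for (String[] t : triples) {
            boolean matches = (subject == null || subject.equals(t[0]))
                    && (predicate == null || predicate.equals(t[1]))
                    && (object == null || object.equals(t[2]));
            if (matches) {
                result.add(t);
            }
        }
        return result;
    }
}
```

Code in the core module would depend only on the interface, so swapping the storage would mean swapping the module that provides the implementation.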

Original issue reported on code.google.com by a.gazzarini@gmail.com on 19 Apr 2014 at 5:33

GoogleCodeExporter commented 9 years ago

I really like this idea, mostly because it 

(a) opens our project up to a much larger range of users
(b) allows other developers to "plug in" their storage implementations and 
"have a complete SesameSail" on top of their favorite storage ...

I think we should have a look at the SesameSail interface package [1]. These 
guys did an amazing job of making an "easy-to-extend" storage backend. We 
should have a very close look and try to do it in a similar fashion. Maybe even 
extend their interfaces/classes?

The key thing is that our cumulusRDF-core module becomes:
(1) a well-documented, easy-to-understand collection of interfaces and abstract 
classes
(2) a set of generic interfaces/abstract classes that can be implemented by 
many different storage solutions.

What do you guys think?

Kind regards
Andreas

[1] http://openrdf.callimachus.net/sesame/2.7/apidocs/index.html

Original comment by andreas.josef.wagner on 20 Apr 2014 at 2:40

GoogleCodeExporter commented 9 years ago

Original comment by a.gazzarini@gmail.com on 2 May 2014 at 10:30

Changed state: Started

GoogleCodeExporter commented 9 years ago

Hi guys,
I have a lot of good news. Modular storage is basically complete; only a minor 
point about configurability is left, and I'm working on that.

In short, the 1.1.0 branch can build, test and install CumulusRDF with a 
pluggable storage. It uses Maven profiles for that, so for example

        mvn clean install -Pcassandra12x-hector-full-tp-index: build, test and install CumulusRDF with Cassandra 1.2.x (1.2.16) with Hector support
        mvn clean install -Pcassandra12x-cql-full-tp-index: build, test and install CumulusRDF with Cassandra 1.2.x (1.2.16) with CQL support (not yet implemented, of course)
        mvn clean install -Pcassandra2x-cql-full-tp-index: build, test and install CumulusRDF with Cassandra 2.x with CQL support (not yet implemented, of course)

I abstracted the concept of an embedded storage, so the test stuff (specifically 
what I called the "test framework" plus the test suite) expects to find a class that

    starts the storage server before all tests and
    stops the storage server after all tests

At the moment there's only one implementation of such a class, which uses 
Farsandra to have a running Cassandra during the tests. The class the tests 
depend on is actually an interface (StorageRunner), and the concrete 
implementation is defined in the specific Maven profile. That lets us run the 
appropriate storage server according to a given profile. Example

    Run Farsandra 1.2.16 with profile cassandra12x-hector-full-tp-index
    Run MiniCluster (HBase) X.Y.Z with profile hbase-xyz-full-tp-index
    ...
    Run Farsandra 2.x with profile cassandra2x-cql-full-tp-index
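
A rough sketch of that contract follows. Only the name StorageRunner comes from the actual code; the method names (start, stop), the RecordingStorageRunner stand-in and the SuiteLifecycle helper are assumptions made up for illustration, and the real interface may look different.

```java
import java.util.ArrayList;
import java.util.List;

// The interface name is from the branch; the method names are assumed.
interface StorageRunner {
    void start();

    void stop();
}

// Hypothetical stand-in for a real runner (e.g. one backed by Farsandra):
// it only records the lifecycle calls instead of starting a server.
final class RecordingStorageRunner implements StorageRunner {
    final List<String> events = new ArrayList<>();

    @Override
    public void start() {
        events.add("start");
    }

    @Override
    public void stop() {
        events.add("stop");
    }
}

// Hypothetical helper showing the suite-level lifecycle: start the storage once,
// run every test, and always stop the storage afterwards, even if a test fails.
final class SuiteLifecycle {
    static void runSuite(StorageRunner runner, Runnable tests) {
        runner.start();
        try {
            tests.run();
        } finally {
            runner.stop();
        }
    }
}
```

With this shape, a profile only has to decide which StorageRunner implementation is handed to the suite; the suite itself stays the same for every storage.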

If you want, under 
/home/agazzarini/workspaces/cumulus-1.1.0/cumulusrdf/src/site/dev I created 
three Eclipse launchers; they are simple XML files, but if you right-click on 
them, Eclipse will recognize them as "Maven Run configurations".

*** CHANGES TO PROJECT LAYOUT ***

The current 1.1.0 is organized like this (BTW, I strongly suggest checking out 
the whole 1.1.0 again; otherwise you will have to import all the missing modules)

    CumulusRDF: Top Level Project
        CumulusRDF: Framework
        CumulusRDF: Test Framework
        CumulusRDF: Pluggable Storages
            CumulusRDF: Cassandra 1.2.x (using Hector) based full triple pattern index
            CumulusRDF: Cassandra 1.2.x (using CQL) based full triple pattern index
            CumulusRDF: Cassandra 2.x (using CQL) based full triple pattern index
        CumulusRDF: Core module
        CumulusRDF: Web module
        CumulusRDF: Standalone c/s module
        CumulusRDF: Integration tests module
        CumulusRDF: Benchmarks module

I created a framework module for two reasons. First, the classes in it have to 
be considered at a different level from the core classes. Second, with 
everything in the core module I had problems with circular dependencies (core 
--> modular-storage --> core --> ...). So at the moment we have the following 
dependency chain: 
cumulusrdf-core --> cumulusrdf-framework <-- cumulus-pluggable-storages

In addition, I created a test framework module that defines an interface plus 
concrete implementations that run the embedded server we need (according to the 
chosen profile). So, as I said above, at the moment these classes take care 
of starting and stopping Farsandra before and after the whole test suite.

*** CHANGES IN TEST SUITE ***

In order to have a suite-level setup and teardown (where we can start and stop 
Farsandra or another server), there's some bad news: I had to create our own 
test suites (CumulusTestSuite and CumulusWebTestSuite). That means three 
important things for all of us (devs):

    you cannot use Run As --> JUnit test, because that won't start Farsandra (or the embedded server of your profile)
    you must use Run As --> Maven test on the whole suite, and you need to specify a valid profile (which indicates what kind of storage you want to use)
    the suites list their test classes in an annotation (SuiteTest), so each time we add a new test class we have to add that class to the suite.

NOTE: this is not strictly true. There is a way to run a single test within 
Eclipse and JUnit, but it requires a few (easy) manual steps. I will send you 
all the details.

Well, I don't want to bore you further, so please let me know what you think. 
There is only one todo left: the configuration of each specific storage module. 
It's not blocking and I'm working on it.

@Andreas: could you please re-activate the automatic build on 1.1.0?

@Sebastian: can I start to integrate what you did in 2.x into 
cumulusrdf-pluggable-storage-cassandra2x-cql-full-tp-index? Is that more or less 
stable? I mean, if you still have to make some minor changes, we can do an 
svn merge later.

Best,

Andrea

Original comment by a.gazzarini@gmail.com on 4 May 2014 at 4:13

GoogleCodeExporter commented 9 years ago

Guys, I think this issue can be considered fixed. 
Just a brief recap:

- we have a CumulusRDF core that expects a modular storage to be plugged in 
at runtime.
- at the moment (1.1.0) we support Cassandra 1.2.x (1.2.16) via Hector 
and Cassandra 2.x (2.0.6) via CQL
- there are two "framework" projects: the first is mainly used for extending 
CumulusRDF with additional storages, the second provides a centralized 
infrastructure for tests (e.g. the Farsandra runner) 

AG

Original comment by a.gazzarini@gmail.com on 12 May 2014 at 4:56

Changed state: Fixed

hich9n / cumulusrdf

Modular storage layout #57