Hi Andreas,
what do you think about using JMX for those (and other similar) purposes?
Andrea
Original comment by a.gazzarini@gmail.com
on 25 Jan 2014 at 8:59
Hi Andrea,
thanks for looking into this.
I probably should have made this issue much more specific. In our current
implementation we only have heuristic-based selectivity estimation [1]. This
implementation is mainly based on [2] and takes some ideas from the paper in [3].
Unfortunately, our SPARQL performance is not "too good", as our recent
benchmark [4] pointed out. So one way to improve it would be to create better
query plans through more accurate selectivity estimation.
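To make heuristic-based selectivity estimation concrete, here is a minimal Java sketch. It is not the code of [1] or [2]. The `TriplePattern` record, the `HeuristicSelectivity` class and the weights are all hypothetical. They assume only the common rule of thumb that a bound subject is more selective than a bound object, which in turn is more selective than a bound predicate. Triple patterns are then ordered so the most selective one is evaluated first. A more accurate estimator would improve exactly this kind of query-plan decision.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical triple pattern; a null component stands for an unbound variable. */
record TriplePattern(String s, String p, String o) {}

/** Hypothetical heuristic estimator: a smaller value means fewer expected matches. */
final class HeuristicSelectivity {
    // Assumed, illustrative weights: bound subject < bound object < bound predicate.
    private static final double SUBJECT_WEIGHT = 0.01;
    private static final double OBJECT_WEIGHT = 0.1;
    private static final double PREDICATE_WEIGHT = 0.5;

    private HeuristicSelectivity() {}

    /** Multiplies the weights of all bound positions; an all-variable pattern scores 1.0. */
    static double estimate(TriplePattern tp) {
        double sel = 1.0;
        if (tp.s() != null) sel *= SUBJECT_WEIGHT;
        if (tp.o() != null) sel *= OBJECT_WEIGHT;
        if (tp.p() != null) sel *= PREDICATE_WEIGHT;
        return sel;
    }

    /** Returns a copy of the patterns, most selective first (a simple static join order). */
    static List<TriplePattern> order(List<TriplePattern> patterns) {
        List<TriplePattern> sorted = new ArrayList<>(patterns);
        sorted.sort(Comparator.comparingDouble(HeuristicSelectivity::estimate));
        return sorted;
    }

    public static void main(String[] args) {
        List<TriplePattern> query = List.of(
            new TriplePattern(null, "rdf:type", null),
            new TriplePattern(null, "foaf:name", "\"Alice\""),
            new TriplePattern("ex:alice", "foaf:knows", null));
        for (TriplePattern tp : order(query)) {
            System.out.println(tp + " -> " + estimate(tp));
        }
    }
}
```

Because `List.sort` is stable, patterns with equal estimates keep their original order. Replacing `estimate` with a statistics-backed value is where a more accurate estimator would plug in.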
In fact, a colleague of mine supervised a master's thesis on this topic, in which
the student implemented a much better estimation for cumulusRDF. However, this
code is completely untested and was written by a student ;) So one would have to
spend some time on it.
The actual problem is how to efficiently create meaningful triple
pattern (or even join pattern) statistics with Cassandra. There have also been
some posts on the Cassandra mailing list about this, e.g., [5].
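One straightforward way to get triple pattern statistics is to maintain a counter for every combination of bound positions as triples are inserted. The sketch below keeps these counters in a `HashMap` so it runs standalone. In Cassandra they might map onto counter columns, but the key layout and the `PatternStatistics` class are hypothetical and not part of cumulusRDF. Join pattern statistics are not covered.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-pattern statistics: one counter per combination of bound
 * positions. The HashMap stands in for what could be Cassandra counter columns.
 */
final class PatternStatistics {
    private static final char SEP = '\0'; // separates the three positions in a key
    private final Map<String, Long> counts = new HashMap<>();
    private long total = 0;

    /** Builds a counter key; "?" marks an unbound (variable) position. */
    private static String key(String s, String p, String o) {
        return (s == null ? "?" : "s=" + s) + SEP
             + (p == null ? "?" : "p=" + p) + SEP
             + (o == null ? "?" : "o=" + o);
    }

    /**
     * Records one inserted triple (all components non-null) by bumping the
     * counters of all 7 patterns with at least one bound position that it matches.
     * Duplicate triples are counted again.
     */
    void add(String s, String p, String o) {
        total++;
        for (int mask = 1; mask < 8; mask++) {
            String k = key((mask & 1) != 0 ? s : null,
                           (mask & 2) != 0 ? p : null,
                           (mask & 4) != 0 ? o : null);
            counts.merge(k, 1L, Long::sum);
        }
    }

    /** Number of recorded triples matching the pattern (null = variable). */
    long cardinality(String s, String p, String o) {
        if (s == null && p == null && o == null) return total;
        return counts.getOrDefault(key(s, p, o), 0L);
    }

    /** Fraction of all recorded triples that match the pattern. */
    double selectivity(String s, String p, String o) {
        return total == 0 ? 0.0 : (double) cardinality(s, p, o) / total;
    }

    public static void main(String[] args) {
        PatternStatistics stats = new PatternStatistics();
        stats.add("ex:a", "rdf:type", "ex:Person");
        stats.add("ex:b", "rdf:type", "ex:Person");
        stats.add("ex:a", "foaf:knows", "ex:b");
        System.out.println(stats.cardinality(null, "rdf:type", null));
        System.out.println(stats.selectivity("ex:a", null, null));
    }
}
```

The sketch also hints at why the problem is not trivial. Every insert touches seven counters, and Cassandra counter increments are not idempotent, so a retried write can over-count.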
Overall, this is not a trivial problem; however, I think we should target it
as a long-term goal/issue.
Kind regards
Andreas
[1] edu.kit.aifb.cumulus.store.sel.HeuristicsBasedSelectivityEstimator
[2] org.openrdf.query.algebra.evaluation.impl.EvaluationStatistics
[3] Heuristics-based Query Optimisation for SPARQL
[4] NoSQL Databases for RDF: An Empirical Evaluation
[5] http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Tracking-word-frequencies-td7592285.html
Original comment by andreas.josef.wagner
on 26 Jan 2014 at 1:33
Original issue reported on code.google.com by
andreas.josef.wagner
on 22 Nov 2013 at 12:21