jjmontesl / cubesviewer

Explore and visualize analytical datasets
http://www.cubesviewer.com

Consider support for Babbage? #70

Closed pwalsh closed 8 years ago

pwalsh commented 8 years ago

The new work on CubesViewer is super impressive. I'm wondering if you would be at all interested in adding support for Babbage: https://github.com/openspending/babbage

Babbage is a port of the Cubes API, with some small changes, and less complexity as it does not support multiple backends (Babbage only runs on an SQL backend).

Babbage was originally written by @pudo to power fiscal data analysis. Since then, it has been used by SpenDB in various locations throughout Germany and by Municipal Data in South Africa, and it also forms the core analytical API of OpenSpending, a generic platform for fiscal data. We have several regular contributors ( @pudo @akariv @longhotsummer @jbothma ) and some more occasional contributors.

We'd be delighted if this would be of interest.

jjmontesl commented 8 years ago

I have been reading through these projects and testing http://labs.openspending.org/babbage.ui and https://spendb-dev.herokuapp.com.

Supporting more backends would definitely be interesting, although I'm currently focused on client-side features. I'd prefer that other backends be reached through Cubes or, at worst, that we move to MDX to support a wider range of tools, as opposed to maintaining support for two different proprietary backend protocols.

In any case, abstracting from the backend is in my interest, but CubesViewer hasn't got there yet. Its internals are (as of today) tightly coupled to the Cubes model, and I could not tackle this in the short term anyway.

I don't quite understand how Babbage is different from Cubes (why the fork). Seems to me that anything that can be modelled and mapped in Babbage could be modelled in Cubes?

After my first approach to this, I'd favour any of these paths:

  1. drop Babbage and move Babbage servers to Cubes
  2. if that is not possible, create a Babbage backend for Cubes
  3. support some open analytical query standard like MDX in all Babbage, Cubes and CubesViewer

What do you think about these options? I'm sure you have other pov on the topic.

On the other hand, CubesViewer use cases (including view sharing, embedding and other topics) overlap in many areas with what you are building, so I feel it'd be nice to discuss or learn more and see how we are aligned and how to not duplicate efforts.

I have similar projects: ETL tools and stuff built to import Spanish and Eurostat data. It'd be nice to talk about that at some point.

Stiivi commented 8 years ago

I agree with @jjmontesl. The abstraction of cubing engines should be done through Cubes, otherwise CubesViewer will be replicating the functionality on the presentation layer. The best solution with the least effort would be to create a Babbage backend for Cubes.

pwalsh commented 8 years ago

@jjmontesl @Stiivi thanks for your thoughts. From each of your comments, it comes down to two things: why the fork, and, why not create a Babbage backend for Cubes. @pudo is the best placed person to answer the first, and also probably the second.

pudo commented 8 years ago

ha! So this is coming back to haunt me :) When I started building babbage.ui (initially: angular-cubes) it turned out that cubes was full of bugs (unquoted SQL parameters are fun!) and limitations that prevented me from generating the types of reports I wanted to present.

Trying to hack it proved difficult due to the multi-backend support and due to the fact that the project (and 1.1 in particular) was pretty much dead at the time. The whole thing just seemed like an abandoned building site.

So I decided it would actually be easier to make a small rip-off that implemented the web API against SQL only, but do so with high test coverage and clean code. This ended up as babbage, in which I tried to "progressively enhance" the API...

jjmontesl commented 8 years ago

Ouch... Granted, Cubes master needs stability, and Stefan has kindly started working on it. From there on we shall maintain a stable versioning policy.

And how far is it from Cubes master as of today? Is a "join" still possible? What kind of queries or features did you have to add? (Perhaps those are of use to Cubes too.)

Otherwise, it seems a Cubes->Babbage backend adaptor should be quite straightforward (that is, for those who know the ins and outs of Cubes backends).

To me it would be a relief, and it'd be nice for the ecosystem.

@pudo regarding Angular, your library is well designed in that its components are reusable. I'd have liked to do the same, but the migration was large and I needed to have something working first. Providing directives for CubesViewer components would be great, though, and I guess you could reuse CubesViewer much better if directives were (more) isolated from views?

If you are still interested I could push such a refactor to the top of my to-do list. Then you could perhaps, for a start, contribute back your own viewmodes (treemap, flow...).

pudo commented 8 years ago

Yeah, that came out harsh; I didn't mean to insult. If I did, my apologies.

In any case, babbage is not based on cubes beyond adhering to the API specification; it's not actually a fork. As for a join, I think it's up to the current active babbage developers (@pwalsh, @akariv, @jbothma) to decide if they want to migrate their apps over to the cubes API.

As for the components: that sounds absolutely great. Still, I'd like to see what actually happens if you point cubesviewer as-is against a babbage API endpoint. Might try that this weekend :)

jjmontesl commented 8 years ago

Me too. Is there any public babbage endpoint?

And who would build that backend if it is finally needed? I'd love to see CubesViewer in use on your side, but my knowledge of Babbage and Cubes backends is poor and I'd rather keep working on the client.

jbothma commented 8 years ago

Here's one public endpoint: http://data.municipalmoney.org.za/api (note that this isn't officially released yet, but it's fine for testing purposes). An explanation of things is at https://data.municipalmoney.org.za/

Our reasoning for going with babbage is that @pudo's reasoning seemed sound and accurate at the time. I'm not really familiar with cubes - has maintenance picked up again there? Part of that was that we didn't know at the start of the project how much we might need to customise whatever platform we build on to build what we need. That would have been much harder with cubes. So far we've mainly tightened up error feedback and added basic star table support. Luckily this wasn't too difficult with babbage's simplicity but ofc cubes has it already.

The main incompatibility I can think of is the quoting of string cut values, which babbage seems to have added and which we needed for string codes that look like integers. I'm not sure whether cubes magically handles that without the quotes. We'd be reluctant to change from the quotes after launching in a few weeks, but if need be, if we move to cubes, a translation layer back to cubes could be done.
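
To illustrate what such a translation layer might look like, here is a minimal sketch. It is hypothetical: it assumes cuts are written as `dimension:value` pairs joined by `|`, and that a quoted string value is wrapped in double quotes. Neither project's actual parser is reproduced here, and the function names (`split_cuts`, `unquote_cut`, `quote_cut`) and the example dimension names are made up for illustration.

```python
def split_cuts(cut: str) -> list[str]:
    """Split a cut string on '|', ignoring any '|' inside double quotes."""
    parts, buf, in_quotes = [], [], False
    for ch in cut:
        if ch == '"':
            in_quotes = not in_quotes
            buf.append(ch)
        elif ch == "|" and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    if buf:
        parts.append("".join(buf))
    return parts


def unquote_cut(cut: str) -> str:
    """Quoted style -> unquoted style (assumed formats, see lead-in)."""
    out = []
    for part in split_cuts(cut):
        dim, sep, value = part.partition(":")
        # Strip one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        out.append(dim + sep + value)
    return "|".join(out)


def quote_cut(cut: str, string_dims: set[str]) -> str:
    """Unquoted style -> quoted style, for dimensions known to hold strings."""
    out = []
    for part in split_cuts(cut):
        dim, sep, value = part.partition(":")
        if dim in string_dims and not value.startswith('"'):
            value = f'"{value}"'
        out.append(dim + sep + value)
    return "|".join(out)


if __name__ == "__main__":
    print(unquote_cut('item.code:"0100"|year:2015'))
    print(quote_cut("item.code:0100|year:2015", {"item.code"}))
```

Note that the caller has to say which dimensions hold strings (`string_dims`), since a code like `0100` cannot be told apart from an integer by looking at the value alone. That is the same ambiguity the quotes were introduced to resolve.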

In the long run we want a stable, maintainable and secure platform and what's best for the community. If moving to cubes isn't very difficult and that's where ongoing development is happening, that's where we'll want to be. Right now we're satisfied with babbage, so moving only makes sense if cubes is alive and kicking and we don't need to make important changes in the short term after launching our API.

jjmontesl commented 8 years ago

Regarding Cubes, that's a difficult decision which only you can make. Depending more or less on another project has its pros and cons, context...

On the other hand, I have tested CubesViewer on that endpoint and it does not work. Even after hacking the initial connection (Babbage has no /info method), the format of the cube list is pretty different.
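
As a rough illustration of the connection problem, a client could probe for `/info` before deciding which API it is talking to. This is only a sketch: `detect_flavor` and the `get_status` callable (which takes a path and returns an HTTP status code) are hypothetical names, and the only facts it relies on are that Babbage has no `/info` method and that both servers expose a cube list under `/cubes`.

```python
from typing import Callable


def detect_flavor(get_status: Callable[[str], int]) -> str:
    """Guess the server type from which paths answer with HTTP 200.

    get_status is a caller-supplied function (hypothetical) that performs
    the request for a path relative to the API root and returns the status.
    """
    if get_status("/info") == 200:
        return "cubes"
    if get_status("/cubes") == 200:
        # No /info, but a cube list exists: assume a Babbage-style server.
        return "babbage"
    return "unknown"


if __name__ == "__main__":
    # Stand-ins for real servers, so the sketch runs without network access.
    fake_cubes = {"/info": 200, "/cubes": 200}
    fake_babbage = {"/cubes": 200}
    print(detect_flavor(lambda path: fake_cubes.get(path, 404)))
    print(detect_flavor(lambda path: fake_babbage.get(path, 404)))
```

Detecting the server is the easy part. The differing cube list and model formats would still need their own mapping.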

If needed I'd be up for a live meeting.

jjmontesl commented 8 years ago

Another option is to alter cubes.js and cubes-cvextensions.js in CubesViewer to map the Cubes client API to Babbage. They are not expected to change anytime soon. It doesn't seem too elegant, but maybe as a quick hack...

jjmontesl commented 8 years ago

Well, I managed to get the cube list, but the model definition doesn't work and it would require a serious amount of work. Still, I don't think this is the way to go.

[Screenshot: seleccion_030]

jjmontesl commented 8 years ago

For reference:

http://cubesdemo.cubesviewer.com/cubes https://data.municipalmoney.org.za/api/cubes

http://cubesdemo.cubesviewer.com/cube/webshop_sales/aggregate https://data.municipalmoney.org.za/api/cubes/bsheet/aggregate
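
The URL pairs above differ only in the path segment before the cube name (`cube` vs `cubes`) for aggregation. A small sketch of how a client could build either form follows. The function name `aggregate_url` and the flavor labels are made up for illustration, and only the two URL shapes listed above are assumed.

```python
def aggregate_url(base: str, cube: str, flavor: str) -> str:
    """Build the aggregate URL for a cube on either kind of server.

    flavor is "cubes" or "babbage" (labels invented for this sketch).
    """
    # Path segment that precedes the cube name, per the URLs listed above.
    prefix = {"cubes": "cube", "babbage": "cubes"}[flavor]
    return f"{base.rstrip('/')}/{prefix}/{cube}/aggregate"


if __name__ == "__main__":
    print(aggregate_url("http://cubesdemo.cubesviewer.com", "webshop_sales", "cubes"))
    print(aggregate_url("https://data.municipalmoney.org.za/api", "bsheet", "babbage"))
```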

pwalsh commented 8 years ago

@jjmontesl @Stiivi from the perspective of OpenSpending, we took on Babbage because it worked, had great coverage and is a fairly small/simple codebase. Of course, we trust @pudo's judgement, which was also a major factor.

Our team has at most read parts of the Cubes codebase, and it has several degrees of additional complexity due to the flexible backend support, which is great for a well-maintained codebase, but a bit more risky without an active maintainer.

Surely we'd consider ways we could move the analytical portions of the new OpenSpending API, supported by Babbage, to Cubes, but right now, we probably do not have the capacity to do it while we are in quite heavy feature development.

It is also unclear to me right now how much "progressive enhancement" that was added moves away from strict OLAP patterns in Cubes (and therefore might not be applicable for the Cubes core).

@jjmontesl I've commented on #73 about how we approached componentizing the Babbage.UI codebase: very happy to talk with you at any time if it helps.

Stiivi commented 8 years ago

Here is a little bit of history of Cubes and Open Spending…

There used to be a project called Where Does My Money Go, which was about to become the much larger and more generalised Open Spending, or at least be used as inspiration for it. The project used a MongoDB storage where some kind of relational model was unintentionally “reinvented” within a few collections. (Btw, the very first Mongo backend for Cubes was just for that purpose.) The data was in a very bad shape for analytics in so many ways, but it was a huge advancement just to have it and be able to publish – which was the point.

At that time I was helping with education about OLAP and data warehousing in general, as this knowledge was close to non-existent in the open-data communities back then. We had endless discussions about the concept of cubes, about how Cubes (the library) solves this problem and about how to apply it to the Open Spending project.

If I recall correctly, it was during the Open Knowledge conference in Berlin in 2011 and I was still in some sort of cooperation with OKFN. We had a meeting where I proposed either adding a step that loads the system's data into a relational model:

[Image: wdmmg-mongo-relational]

or to replace the whole ETL process:

[Image: openspending-etl]

There is even a remnant of those times in the Cubes documentation if you look carefully at the to_entityindustrycompany – it reflects the original objects in the Mongo collection.

For some reason, there was huge resistance to both proposals – they were turned down. The suggested ETL was deemed mostly unnecessary and an over-complication, the original structures were mostly meant to be preserved, and the only way to provide data to the end application was somehow directly, or something very similar to that architecture. Data warehousing knowledge was later considered very low priority, as the goal was to “deliver features”, no matter how. Given that approach and the state of the system, it was impossible to put any OLAP layer on top of it, even though Cubes worked pretty well and was already powering one Open-Data/Open Government project – Open Procurements of Slovakia.

That was a major contributor to my decision to listen to the open-data community less and follow actual business needs for the project.

Now back to the future...

I don’t feel happy about the complaint “a bit more risky without an active maintainer” in this particular case. While I sympathise with the concern and would have similar myself, I don't think it has a place given the history.

Cubes is an open-source project that never got funding, and the only significant corporate contributor was Squarespace and its developers, mostly Robin Thomas (@robin900) with great ideas and lots of code. It got contributions from various developers using the project here and there; however, the only developer was me and I was "paying it" from the consulting work I was doing. It has bugs, it does not have proper coverage and I would definitely write it differently these days. It was the work of a single person, not a team, doing it in his free time.

Currently I have to focus on the business needs of the company I work with, which are not directed towards Cubes at this time. Despite that, I will resume work on the project as I still have lots of plans all over my notebooks. CubesViewer is another big reason for me to revive the project, as it has huge potential as a unique open-source, metadata-driven OLAP visualisation and data exploration tool.

CC: @rgrp

pwalsh commented 8 years ago

Hi @Stiivi you seem to have taken offence at my phrase "risky without an active maintainer", and it looks like that is related to a much longer history that I cannot change: while I do know of the relationship, I did not know the specific details as you outline above. Most of us here are a generation (or two) after that time.

This thread is now going in quite a different direction to the original intention. For this new direction, I'd be more than happy to talk with the relevant parties, but perhaps not here. A first step for that would be to get a pretty exact understanding of how Babbage diverges from Cubes ( @pudo is there any chance you can help a little there, over the coming month, if I can take an hour or so of your time? ).

longhotsummer commented 8 years ago

From the sidelines: using Babbage with @jbothma has shown me how important an open source OLAP implementation that is easy to work with (ie. not XML) is. It has saved us an enormous amount of time and we'll probably use it again on future projects. It's an important addition to the open data ecosystem. Whether it's Babbage or Cubes, keeping this going is important :)

jjmontesl commented 8 years ago

Stefan's sentence "while I sympathise with the concern and would have similar myself" summarizes the thing quite well: making decisions about a technology stack involving not fully mature projects is a real challenge, with no best solution. In an ideal world we wouldn't have divergence in these projects, as we all want to see our projects united and growing faster, but... :)

Above all, I'm very keen on seeing CubesViewer working with your OS data. Please count on me for making this happen, but we need a plan. Feel free to engage me through Hangouts (jjmontes@gmail.com) or Skype.