openspending / fiscal-data-package

MOVED TO https://github.com/frictionlessdata/specs/issues?q=is%3Aopen+is%3Aissue+label%3A%22Fiscal+Data+Package%22

Remove Fact as a dimension #110

Closed pwalsh closed 8 years ago

pwalsh commented 8 years ago

Seeing Fact tables mentioned here reminded me that the spec currently has this (vague to me) idea of a "fact dimension".

We are not using this dimension in any of our existing Fiscal Data Packages, and fact does not make sense to me as a dimension.

Why do we have it? What problem does it solve?

The dimensions and measures should make up the "fact table" in some database that takes data from FDP. The only case I can see for why it has been added is to capture all the fields in a dataset that are not captured by a measure or a(nother) dimension. To me, this does not make a lot of sense. If a column in source data cannot be mapped to a specific concept, then that is fine - data sources might have all sorts of stuff that is not mappable to our fiscal cube.
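To illustrate the point above, here is a minimal sketch of a mapping in which some source columns are simply left unmapped. The column names and the `mapping` layout are hypothetical and are not the actual FDP descriptor syntax.

```python
# Hypothetical source columns from a budget CSV (illustrative only).
source_columns = ["year", "ministry", "programme", "approved", "executed",
                  "notes", "internal_ref"]

# Hypothetical mapping: each dimension or measure names the source columns it uses.
mapping = {
    "dimensions": {
        "date": ["year"],
        "administrator": ["ministry"],
        "activity": ["programme"],
    },
    "measures": {
        "approved": ["approved"],
        "executed": ["executed"],
    },
}


def mapped_columns(mapping):
    """Collect every source column referenced by a dimension or a measure."""
    used = set()
    for group in ("dimensions", "measures"):
        for columns in mapping[group].values():
            used.update(columns)
    return used


used = mapped_columns(mapping)

# Columns that feed the fact table, in source order.
fact_table_columns = [c for c in source_columns if c in used]

# Columns that map to no concept; they stay in the physical data, unmodelled.
unmapped = [c for c in source_columns if c not in used]
```

In this sketch `notes` and `internal_ref` are left out of the model, and no catch-all "fact" dimension is needed to hold them.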

danfowler commented 8 years ago

I am in favor of removing fact as well as other.

@rgrp

pwalsh commented 8 years ago

@danfowler great to hear, I think removing it from the spec is best.

About openspending_id: these are auto-generated UUIDs. I'd say they have no value whatsoever in the migration forward, unless they are exposed as meaningful via the aggregate APIs ( @akariv ? ). If we do have to migrate them forward, it opens up issues related to https://github.com/openspending/fiscal-data-package/issues/109: while your focus there is on PKs for dimensions, the same logic also applies to the budget line itself. I won't dive into details here, but current OpenSpending has no way to update a budget line, and a deterministic way to identify a line (which a UUID does not give) is crucial to allowing updates.
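As a rough sketch of the difference, assuming a budget line can be identified by a known set of key fields (the field names and the hashing scheme here are hypothetical, not something OpenSpending or the spec defines):

```python
import hashlib
import uuid


def deterministic_line_id(row, key_fields):
    """Derive a stable ID by hashing the values of the identifying fields."""
    parts = [f"{field}={row[field]}" for field in key_fields]
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()


# Hypothetical budget line and the fields assumed to identify it.
row = {"year": 2015, "admin": "Ministry of Health", "func": "0721", "amount": 1200.0}
key_fields = ["year", "admin", "func"]

original_id = deterministic_line_id(row, key_fields)

# The same line arrives again with a revised amount: the ID is unchanged,
# so the existing record can be found and updated.
updated_row = dict(row, amount=1350.0)
updated_id = deterministic_line_id(updated_row, key_fields)

# Random UUIDs carry no link back to the line's content, so a re-import
# cannot tell which stored line an incoming row corresponds to.
random_a = str(uuid.uuid4())
random_b = str(uuid.uuid4())
```

The choice of key fields is the hard part in practice, which is the same question as the dimension PKs raised in the linked issue.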

rufuspollock commented 8 years ago

@pwalsh how do you express where attributes that do not belong in a full dimension go? The idea was that a fact dimension held the non-measure attributes that went directly into the fact table.

I'm happy to see it go for the present and we bring it back when we see the need.

pudo commented 8 years ago

@rgrp that's a weird special-casing we had in OS, but I've never ever seen it anywhere else. Doesn't it just mean we're being sloppy in modelling the data?

pwalsh commented 8 years ago

@rgrp what is an example of such an attribute? As far as I see, if some data doesn't fit into some known dimension, then it doesn't get modelled, and that is ok. We could have a physical model with all sorts of columns that are not necessarily "mappable" in our mapping.

rufuspollock commented 8 years ago

@pwalsh it is fairly common to have "degenerate" dimensions that do not naturally become a full dimension in their own right and end up on the fact table (aside: have you read the Kimball and Ross book on OLAP? I found it invaluable).

We may be going a bit too OLAP here and I also should try and find a proper example -- for the time being i'm happy to see this go :-)
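For illustration only, a degenerate dimension in the general star-schema sense might look like the sketch below. The table and column names (including `invoice_number`) are hypothetical and not taken from any Fiscal Data Package.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- A full dimension: its own table with descriptive attributes.
    CREATE TABLE dim_admin (
        admin_id INTEGER PRIMARY KEY,
        name     TEXT,
        region   TEXT
    );

    -- The fact table: a foreign key to the full dimension, a measure,
    -- and invoice_number, which has no attributes of its own and so
    -- lives directly on the fact table (a degenerate dimension).
    CREATE TABLE fact_spending (
        admin_id       INTEGER REFERENCES dim_admin(admin_id),
        invoice_number TEXT,
        amount         REAL
    );
""")

conn.executemany(
    "INSERT INTO dim_admin VALUES (?, ?, ?)",
    [(1, "Ministry of Health", "National")],
)
conn.executemany(
    "INSERT INTO fact_spending VALUES (?, ?, ?)",
    [(1, "INV-001", 100.0), (1, "INV-001", 50.0), (1, "INV-002", 25.0)],
)

# The degenerate dimension can still be used for grouping, like any other.
totals = conn.execute(
    "SELECT invoice_number, SUM(amount) FROM fact_spending "
    "GROUP BY invoice_number ORDER BY invoice_number"
).fetchall()
```

Whether fiscal data has attributes of this kind is exactly the open question in this thread; the sketch only shows the general pattern.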

pwalsh commented 8 years ago

@rgrp yes I've read (parts of) the book :). Still, let's remove this then, but some examples would be good to see how/why we need this in Fiscal Data Package.

pwalsh commented 8 years ago

@danfowler so can you update the spec based on this discussion?

rufuspollock commented 8 years ago

AGREED. Remove and we can re-add if/when we have a clear example.

danfowler commented 8 years ago

Fixed in #130 (though should have been broken out into separate PR)