larray-project / larray

N-dimensional labelled arrays in Python
https://larray.readthedocs.io/
GNU General Public License v3.0

Only use data from Eurostat in the tutorial? #770

Open alixdamman opened 5 years ago

alixdamman commented 5 years ago

Using data from Eurostat has some advantages:

Currently, the tutorial uses data both from an internal source and from Eurostat. The idea is to homogenize the tutorial by using data from a single official source.

gdementen commented 5 years ago

I am -0 on this. Why not, but I don't see any inherent value in doing so. The goal is to have a dataset that a) is understandable and b) users can play with. What we provide satisfies both criteria, so I don't think changing what we currently have is the best use of our time. Eurostat datasets are usually good datasets but not inherently better than the real dataset we ship.

Also, do you mean to use eurostat_get in the tutorial or to use data stored in the package but which was initially from eurostat? I am -1 on systematically using eurostat_get because it takes some boring code to get a usable dataset (stripping useless dimensions, etc.).

alixdamman commented 5 years ago

do you mean to use eurostat_get in the tutorial or to use data stored in the package but which was initially from eurostat?

I wasn't explicit about this in my first comment, sorry. My idea is to use data stored in the package that originally came from Eurostat, not to call eurostat_get directly in the tutorial.

Eurostat datasets are usually good datasets but not inherently better than the real dataset we ship.

I disagree. Currently, the "real dataset" we ship comes from only one team and contains demographic data. As for datasets from Eurostat, we have to extract subsets of them to make them usable in the tutorial. If, and I say if, we one day get access to datasets from other teams, I am pretty sure they will not be consistent with each other (one team will use a time axis with years written as Y2015 ... Y2020 and another the same time axis with years written as 2015 ... 2020).

Datasets from Eurostat have the great advantage of using common labels and definitions. If we have to rename axes and/or labels, it will be easier with data from Eurostat.
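To illustrate the labelling mismatch described above (one team writing years as Y2015 ... Y2020, another as 2015 ... 2020), here is a minimal plain-Python sketch of the kind of harmonization step that would be needed before combining such datasets. It deliberately does not use larray. The helper names `normalize_year_label` and `harmonize_time_labels` are hypothetical, and the values are made-up placeholders, not real data.

```python
import re


def normalize_year_label(label):
    """Hypothetical helper: map 'Y2015', '2015' or 2015 to the integer 2015."""
    # Accept an optional leading 'Y' followed by exactly four digits.
    match = re.fullmatch(r"Y?(\d{4})", str(label).strip())
    if match is None:
        raise ValueError(f"unrecognized year label: {label!r}")
    return int(match.group(1))


def harmonize_time_labels(series):
    """Hypothetical helper: re-key a {year_label: value} mapping by integer years."""
    result = {}
    for label, value in series.items():
        year = normalize_year_label(label)
        # Two different labels collapsing to the same year would silently lose data.
        if year in result:
            raise ValueError(f"duplicate year after normalization: {year}")
        result[year] = value
    return result


# Placeholder values, one mapping per (hypothetical) team convention.
team_a = {"Y2015": 1.0, "Y2016": 2.0}
team_b = {"2015": 3.0, "2016": 4.0}

a = harmonize_time_labels(team_a)
b = harmonize_time_labels(team_b)
print(sorted(a) == sorted(b))  # True: both now use the years 2015 and 2016
```

With data that already share common labels, as Eurostat datasets do, this step would simply not be needed.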

One other issue with the "real data" we ship: I never know which datasets are public and which are not. With data from Eurostat, you don't have to worry about that.

gdementen commented 5 years ago

I fail to see why this is urgent enough to warrant the 0.31 milestone...

alixdamman commented 5 years ago

OK, I will change it to 0.33.