Implement CSV export - Githubissues

philboeselager commented 9 years ago

task for #35

philboeselager commented 9 years ago

Starting implementation, some detail questions appear:

1. Should all fields be exported, including @type and @context?
2. Deep structures should be exported with full path column names, shouldn’t they?
3. If "yes" on 2., then a path separator is necessary for deep structures. The more unusual this separator is, the lower the risk that it collides with a property name in the exported data. According to the widely accepted Google JSON naming conventions http://google-styleguide.googlecode.com/svn/trunk/jsoncstyleguide.xml?showone=Property_Name_Format#Property_Name_Format , special characters such as the following (amongst others) should not be used in JSON property names: §%/*~#-|> Personally, I'd go for the intuitive >, or something similar (>>, -> etc.).
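
Whether a candidate separator actually collides with a property name can be checked against the data before exporting. The following is only a minimal sketch in Python; the function names `property_names` and `separator_is_safe` are made up for illustration and are not part of the project.

```python
def property_names(node):
    """Yield every property name found anywhere in a JSON-like structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key
            yield from property_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from property_names(item)


def separator_is_safe(records, separator=">"):
    """Return True if no property name in the records contains the separator."""
    return all(
        separator not in name
        for record in records
        for name in property_names(record)
    )
```

If the check fails for the chosen separator, a longer one such as >> could be tried in the same way.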

acka47 commented 9 years ago

Should all fields be exported, including @type and @context?

I'd say @type is relevant, @context not for csv.

BTW, the W3C candidate recommendations for tabular data on the web might be of interest here...

philboeselager commented 9 years ago

A bigger problem arises from retrofitting columns onto previously exported entries. Given:

Person 1:

{
  "name" : [ {
    "@value" : "Annie Doe"
  } ]
}

... Person n:

{
  "name" : [ {
    "@value" : "John Mae"
  } ,
  {
    "@value" : "John Doe"
  } ]
}

Assuming we export full path column names, these could be: name>0>@value and name>1>@value. The longer name array of Person n would imply that all previously exported lines have to be retrofitted with the additional column.

Now this could end up as a very badly scaling algorithm. My suggestion is to "scan" the complete export data before actually exporting it, so that missing or empty columns can be "retrofitted" immediately (by inserting a ;).
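
The "scan first, then export" idea can be sketched roughly as below. This is an illustrative Python sketch, not the project's implementation: the names `flatten` and `export_csv` are hypothetical, > is assumed as the path separator and ; as the cell delimiter.

```python
import csv
import io

SEPARATOR = ">"  # assumed path separator, as proposed earlier in this thread


def flatten(node, prefix=""):
    """Map a nested JSON-like value to {full path: leaf value}."""
    if isinstance(node, dict):
        children = node.items()
    elif isinstance(node, list):
        children = enumerate(node)
    else:
        return {prefix: node}
    flat = {}
    for key, value in children:
        path = f"{prefix}{SEPARATOR}{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def export_csv(records):
    """Two-pass export: collect all column names first, then write every row."""
    rows = [flatten(record) for record in records]
    columns = {}  # a dict keeps first-seen order and drops duplicates
    for row in rows:  # pass 1: scan for column names
        for path in row:
            columns.setdefault(path, None)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(columns), delimiter=";",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # pass 2: cells missing from a row are left empty
    return out.getvalue()
```

With the two persons above, the header would be name>0>@value;name>1>@value, and the row for Person 1 simply ends with an empty cell. The cost is that all flattened rows are held in memory before anything is written.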

literarymachine commented 9 years ago

I'd say @type is relevant, @context not for csv.

+1

literarymachine commented 9 years ago

Deep structures should be exported with full path column names, shouldn’t they?

Maybe we should go with something like this, then we would have a fixed number of columns:

ID, type, name, address, provides
{urn:uuid}, "Name A, Name B", "Some street, some locality, some country", {urn:uuid}

philboeselager commented 9 years ago

Looks like there are 3 possible strategies:

1. Just boxing deep information into one column.

Advantages:

fixed column number -> easy column handling
only 1 export necessary

Disadvantages:

becomes complicated with more complex and diverse data (see example 1) -> very different cell content is possible within one column
redundant data

Example [1] :

{
  "authorOf" : [ {
    "name" : "My Book Nr. 1"
  },
  {
    "name" : "My Book Nr. 2",
    "mentions" : [ {
      "name" : "Mentioned Item 1"
    },
    {
      "name" : "Mentioned Item 2"
    } ]
  } ]
}
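
One possible reading of strategy 1, assuming that "boxing" means serialising each nested value into a single cell as JSON text, is sketched below in Python. That reading is an assumption, and the helper names `box` and `export_boxed` are hypothetical.

```python
import csv
import io
import json


def box(value):
    """Return scalars unchanged; serialise nested values into one JSON string."""
    if isinstance(value, (dict, list)):
        return json.dumps(value, ensure_ascii=False, sort_keys=True)
    return value


def export_boxed(records, columns):
    """Write one row per record with a fixed, caller-supplied column list."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(columns)
    for record in records:
        writer.writerow([box(record.get(column, "")) for column in columns])
    return out.getvalue()
```

This also illustrates the disadvantage listed above: the authorOf cell of Example [1] would hold a whole nested JSON document, while the same column in another row might hold a single short entry.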

2. Only export IDs of nested objects. Gather multiple IDs in one cell for arrays. (as described by @literarymachine directly above)

Advantages:

fixed column number -> easy column handling
redundant data avoidance
quite homogeneous cell content

Disadvantages:

multiple exports necessary to get nested information (Exporting multiple types in one go makes no sense due to different columns per type.)

It is to be defined here:

should nested data exports automatically be triggered? (Needs a possibility to save multiple CSV files in one folder.)
and if so, what is the maximum number of sub levels to be exported? (1 full sublevel export seems natural, in parallel to the embedded and linked Resource view)
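
Strategy 2 could look roughly like the following Python sketch. It assumes that nested objects carry their identifier in an "@id" property (the usual JSON-LD convention, not confirmed in this thread) and that multiple values are joined with ", " as in the example row above; `to_cell` and `to_row` are hypothetical names.

```python
def to_cell(value, joiner=", "):
    """Reduce a property value to a single cell.

    Nested objects are replaced by their "@id" (assumed to exist),
    arrays are joined into one string, scalars are passed through.
    """
    if isinstance(value, dict):
        return value.get("@id", "")
    if isinstance(value, list):
        return joiner.join(str(to_cell(item, joiner)) for item in value)
    return value


def to_row(record, columns):
    """Build one fixed-width row for the given column list."""
    return [to_cell(record.get(column, "")) for column in columns]
```

Everything below the first level is reduced to IDs, so the nested information itself has to come from a separate export of the referenced resources.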

3. Export one column for each sub-field, as described in my comment dated Oct. 7th, 10:36 above.

Advantages:

homogeneous cell content
only one export necessary to get nested information

Disadvantages:

redundant data
variable column number -> complicated column handling

Though I would probably agree to go for Felix' suggestion, we should very precisely think about, what's the use case of data exports:

How do users create exports? My idea is that they submit a query and export the provided query result. Anyway, data to be exported is a filtered subset of our entire data.
Do users want to run statistical analyses and/or create graphics? That would possibly favour the third strategy, because the data is provided in more detail.
Do users want to have a simple survey of OER World Map data? That would favour strategies 1 and 2.
Do people want to feed their own databases? That would presumably favour strategy number 2.

Just let us make sure we don't build anything that the community is not going to need.

literarymachine commented 9 years ago

(Exporting multiple types in one go makes no sense due to different columns per type.)

I was thinking about this too and would argue that we actually could add all columns and only populate those that are appropriate. At least for a first take. Or we could say that we only supply exports by type for now.
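
A rough Python sketch of "add all columns and only populate those that are appropriate" follows. The per-type column sets in `COLUMNS_BY_TYPE` and the function name `export_mixed` are invented for illustration; the real columns would depend on the data model.

```python
import csv
import io

# Hypothetical per-type column sets, for illustration only.
COLUMNS_BY_TYPE = {
    "Person": ["@id", "@type", "name"],
    "Service": ["@id", "@type", "name", "provider"],
}


def export_mixed(records):
    """Export records of several types into one sheet with the union of columns."""
    columns = {}  # union of all columns, in first-seen order
    for type_columns in COLUMNS_BY_TYPE.values():
        for column in type_columns:
            columns.setdefault(column, None)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(columns), delimiter=";",
                            restval="", lineterminator="\n")
    writer.writeheader()
    for record in records:
        allowed = COLUMNS_BY_TYPE.get(record.get("@type"), [])
        # Only columns appropriate for the record's type are populated.
        writer.writerow({c: record[c] for c in allowed if c in record})
    return out.getvalue()
```

Restricting the export to a single type is then just the special case where `COLUMNS_BY_TYPE` has one entry.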

we should very precisely think about, what's the use case of data exports

ping @trugwaldsaenger!

philboeselager commented 9 years ago

Talked to @trugwaldsaenger today. Since we lack knowledge about the use cases, we lean towards offering several CSV export variants. So, having implemented a "variant 3", I would next try to set up a "variant 2" exporter.

literarymachine commented 9 years ago

Sounds good. But even better would be to specify some use cases for the exports, no?

philboeselager commented 9 years ago

For sure.

philboeselager commented 8 years ago

The two versions implemented so far are branched in https://github.com/hbz/oerworldmap/tree/task/%23398_variant2 and https://github.com/hbz/oerworldmap/tree/task/%23398_variant3 . @literarymachine : I'm going to rename one of the classes and merge the two branches, OK?

literarymachine commented 8 years ago

I'm going to rename one of the classes and merge the two branches, OK?

Yes, please!

hbz / oerworldmap

Implement CSV export #398