Open djvanderlaan opened 3 weeks ago
Thanks for these excellent points! I've updated the table-based category examples to the new pattern we discussed in the meeting today, that is, a resource with the following additional properties:
{
type: "category-table",
categoriesOrdered?: boolean (default = undefined),
valueField?: string (default = "value"),
labelField?: string (default = "label")
}
Then in the categorical field being defined,
{
categories: {
package?: string,
resource: string
}
}
(Where the optional property package uses the same rules as external foreign keys: https://datapackage.org/recipes/external-foreign-keys/)
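A minimal Python sketch of how a reader might resolve this pattern, assuming an in-memory descriptor with inline data. The resource name `fruittypes`, the field name `fruit`, the `code` column and the function `resolve_categories` are hypothetical illustrations, not part of the proposal or of any library.

```python
# Hypothetical data package descriptor with one category-table resource.
package = {
    "resources": [
        {
            "name": "fruittypes",
            "type": "category-table",
            "valueField": "code",  # overrides the default "value"
            # labelField is omitted, so the default "label" applies
            "data": [
                {"code": 1, "label": "apple"},
                {"code": 2, "label": "pear"},
            ],
        }
    ]
}

# Hypothetical categorical field that points at the category-table.
field = {
    "name": "fruit",
    "type": "integer",
    "categories": {"resource": "fruittypes"},
}


def resolve_categories(package, field):
    """Return [(value, label), ...] for a field referencing a category-table."""
    ref = field["categories"]
    if "package" in ref:
        # External packages are out of scope for this sketch.
        raise NotImplementedError("external package references not handled")
    resource = next(
        r for r in package["resources"] if r["name"] == ref["resource"]
    )
    if resource.get("type") != "category-table":
        raise ValueError("referenced resource is not a category-table")
    # Defaults as listed in the property table above.
    value_field = resource.get("valueField", "value")
    label_field = resource.get("labelField", "label")
    return [(row[value_field], row[label_field]) for row in resource["data"]]


categories = resolve_categories(package, field)
```

Because the resource itself names its value and label columns, the field descriptor only needs the resource name.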
For references within a package, do we want to simplify the categories field definition to just take a string? Then we could have fields like this:
{
"name": "fieldname",
"type": "string",
"categories": "category-table-resource-name"
}
An object would only be necessary when we wanted to specify a category-table in an external package...
That would be shorter, and most of the time the resource will be in the same package as the dataset; however, the categories property might be getting a bit overloaded. It can then be
1. "categories": ["apple", "pear", "orange"]
2. "categories": [{"value": 1, "label": "apple"}, ...]
3. "categories": {"resource": "fruittypes"}
4. "categories": "fruittypes"
And as someone working in R: it is tricky to distinguish options 1 and 4. I would personally prefer sticking with just option 3 (besides 1 and 2); option 4 saves us just 14 characters and adds to the complexity.
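To make the overloading concrete, here is a hedged Python sketch of the branching a reader would need if all four forms were allowed. The function name `classify_categories` and the returned labels are hypothetical. In Python a JSON string and a JSON array stay distinct types, so the check is easy; the difficulty mentioned for R presumably arises when a parser simplifies an array of strings into a plain vector, so that the two become indistinguishable.

```python
def classify_categories(categories):
    """Tell apart the four possible shapes of the `categories` property."""
    if isinstance(categories, str):
        return "resource-name"        # option 4: "fruittypes"
    if isinstance(categories, dict):
        return "resource-object"      # option 3: {"resource": "fruittypes"}
    if isinstance(categories, list):
        if categories and all(isinstance(c, dict) for c in categories):
            return "value-label-list"  # option 2: [{"value": 1, "label": "apple"}]
        return "value-list"            # option 1: ["apple", "pear", "orange"]
    raise TypeError("unsupported categories value")


kinds = [
    classify_categories(["apple", "pear", "orange"]),
    classify_categories([{"value": 1, "label": "apple"}]),
    classify_categories({"resource": "fruittypes"}),
    classify_categories("fruittypes"),
]
```

Dropping option 4 removes the first branch and leaves every resource reference as an object.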
https://github.com/khusmann/frictionless-categorical-examples/tree/main/table-based-categoricals#some-theoretical-questions
I believe the default for 'valueField' should be 'value' as it is in the current categories specification.
Points 5 and 6
I also ran into this. I already had an implementation based on an earlier proposal. There the fields used for the categories were "code" and "label", while the current implementation uses "value" and "label". When changing the code lists to use "value" instead of "code", I had to track down all fields referencing these code lists and change the "valueLabel" field there. It would have been easier and more self-contained if the data resource for the code lists itself indicated which field contains the values and which the labels. However, this would mean extending not just the 'categories' property of the field descriptor but also the definition of Data Resource. Personally, I am in favour of this.
Additional point: referencing external data packages
It would be nice if we could not just reference a Data Resource in the current Data Package but also a Data Resource in another Data Package. So:
This would allow referencing central catalogues of category lists. The disadvantage is that this makes the data package less self-contained and might lead to problems with link rot.
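A rough Python sketch of how such a lookup could work, assuming the reference is an object with an optional `package` and a `resource` name. The function `find_category_table`, the `load_package` callable and the catalogue URL are hypothetical; a real loader would fetch the descriptor over HTTP or from disk, which is exactly where link rot would show up.

```python
def find_category_table(ref, current_package, load_package):
    """Locate the resource named by a `categories` reference.

    ref             -- dict with "resource" and optionally "package"
    current_package -- descriptor of the package containing the field
    load_package    -- hypothetical callable: package location -> descriptor
    """
    if "package" in ref:
        package = load_package(ref["package"])  # may fail if the link is dead
    else:
        package = current_package
    for resource in package["resources"]:
        if resource["name"] == ref["resource"]:
            return resource
    raise KeyError(f"resource {ref['resource']!r} not found")


# Stand-in for a central catalogue; the URL is illustrative only.
CATALOGUE_URL = "https://example.org/catalogue/datapackage.json"
fake_remote = {
    CATALOGUE_URL: {"resources": [{"name": "fruittypes", "type": "category-table"}]}
}
local_package = {"resources": [{"name": "measurements"}]}

found = find_category_table(
    {"package": CATALOGUE_URL, "resource": "fruittypes"},
    local_package,
    fake_remote.__getitem__,  # dictionary lookup in place of a network fetch
)
```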