Open Nithyavasudevan opened 10 years ago
Difficulties with CTPEC on Name/Title/Descriptions all being used for names/labels, probably involved here.
It looks like this is we just have some bad test data here, it looks like the dataset wasn't cleared out before this new one was collected.
If you look at the dataset name (in the second screenshot) it ends in ba92 which would be the final four characters of the dataset id.
In the first screenshot it there is not 'ba92' anywhere in the dataset id, this means the two screenshots are completly unrelated. Now that we have repointed to UAT it will be worth retesting if this is still a problem
@Nithyavasudevan can you re-test this, once we have clean data (post MS deploy, post our reset)
will do. what was the actual fix done please? So u understand what to test and also impact?
we will discuss this on the call
Note. Not bad data. Note the file name in first and second screenshot
OK, this was more complex than expected and the potential fix might not be entirely satisfying. The way custom fields in resources are handled in CKAN, any field not part of the main schema will be added to the main object before creating / updating, eg:
{
"Title": "Some file",
"Quality": 1,
"Metadata": {
"custom_field": "custom value"
}
...
}
Gets translated into this CKAN equivalent:
{
"name": "Some file",
"quality": 1,
"custom_field": "custom value",
...
}
Note that the platform field Title
is named name
in CKAN. If a custom field is using this key, it will incorrectly replace the Title
one:
{
"Title": "Some file",
"Quality": 1,
"Metadata": {
"name": "custom value"
}
...
}
will turn into:
{
"name": "custom value",
"quality": 1,
...
}
We can prevent custom fields with reserved keywords from being imported (eg ignore any custom field with a name
key). This is a good idea anyway to prevent changing fields like the url, type, etc via custom fields, but it will mean that right now on some edge cases datasets in the platform and on CKAN have different custom fields. This is the only compromise we can apply at this point.
@amercader when you say prevent it from being imported, do you mean stop user from creating these on ckan?
So this happens at ALL levels for custom metadata. Everything in the hierarchy (as it may be a serialized lightweight object rather than a single key/value pair) is normalized to a scalar key/value dictionary list??
Even with a basic approach to using namespaces "custommetadata.name" which is a 5 minute job, it seems this will not work as everything is normalized to a single key/value list. Very limiting indeed if the case. You'd need to convert to "level.custommetadata.name" as the key to get near to this working well.
My question is, you can't use "name" in the custom metadata as it clashes with the core metadata - which i get.
However, within the metadata, if i use the key "testkey" for two different objects serialized in the JSON, will i only get the value for one of them? e.g.
{ "title":"steven", "Metadata": {
"source": {
"email":"a@b.com",
"phone":"999"
},
"approver": {
"email":"c@d.com",
"phone":"111"
}
}
}
So this happens at ALL levels for custom metadata. Everything in the hierarchy (as it may be a serialized lightweight object rather than a single key/value pair) is normalized to a scalar key/value dictionary list??
However, within the metadata, if i use the key "testkey" for two different objects serialized in the JSON, will i only get the value for one of them? e.g.
I'm afraid I don't know what you mean in both your comments
Custom field values are serialized as a string so if you input a JSON object it will get serialized as such:
"title": "steven",
"approver": "{\"email\":\"a@b.com\", \"phone\":\"999\"}",
"source": "{\"email\":\"a@b.com\", \"phone\":\"999\"}",
...
See eg:
https://ckanfrontend.cloudapp.net/dataset/dataset-uat-9-9
Even with a basic approach to using namespaces "custommetadata.name" which is a 5 minute job, it seems this will not work as everything is normalized to a single key/value list. Very limiting indeed if the case. You'd need to convert to "level.custommetadata.name" as the key to get near to this working well.
As I mentioned before this can be done, but it is not a 5 minute job. I don't think we would need the level.custommetadata.name
approach at all, because as I mentioned all values get serialized as strings.
Glad to hear it is serialized as a string - this is what I was hoping.
I was actually using your example where you said "any field not part of the main schema will be added to the main object before creating / updating" which doesn't sound technically true given your statement above says "Custom field values are serialized as a string".
You said: { "Title": "Some file", "Quality": 1, "Metadata": { "name": "custom value" } ... }
goes to
{ "name": "custom value", "quality": 1, ... }
Which could then mean by extension, if everything is stored as a single dictionary that :
{ "Title": "Some file", "Quality": 1, "Metadata": { "name": "custom value", "moredata":{ "name":"other custom value" } } ... }
would go to something like:
{ "name": "custom value", <- or "other custom value" ?? "quality": 1, ... }
On your example:
{
"Title": "Some file",
"Quality": 1,
"Metadata": {
"name": "custom value",
"moredata":{
"name":"other custom value"
}
}
will get transformed into:
{
"name": "custom value",
"quality": 1,
"moredata": "{\"name\": \"other custom value\"}"
...
}
We obviously need to prevent "name": "custom value",
(which is what this issue was originally about), but the name
keys inside nested dicts in values should not be a problem.
Lovely - that looks great Adria. Thanks. @Nithyavasudevan - will we need to just check this scenario but is what i was hoping :+1:
@Nithyavasudevan, @stevenlivz As agreed, these two features are ready for testing:
name
or quality
) (see image below):star2:
Reserved words are here https://gist.github.com/amercader/ef1d24fc63ad4e308277
Based on our agreement that this list is ok, we can close this. This list will need to be documented in tfs for future reference..
The error messages still need checked. I created a dataset with custom metadata fields save, before, _before and private. But the error message i got was Extras: There is a schema field with the same name
Can we tidy up the error message that is presented to the user in this situation pls? Something like:
Sorry you cannot use key
.....whatever fits in with the standard lingo used for other user errors in CKAN as long as it's a bit more meaningful than current error.
IS anyone looking at tidying up the error message on this to make it a bit more user friendly?
A value given for a custom metadata key field is displaying as the title for a file. This file was created directly on the api. Title is stored correctly in the title field and "name" is stored as a key with value of "bond"
Below is the screenshot of the data created using test tool directly on platform
Note the file getting listed incorrectly within dataset
Note that no custom metadata is listed. Checked all these values in the db on the platform. They seem to be stored correctly.