okfn / ckanext-glasgow

CKAN Extensions specific to Open Glasgow
GNU Affero General Public License v3.0
0 stars 1 forks source link

custom metadata value being displayed instead of title for file #156

Open Nithyavasudevan opened 10 years ago

Nithyavasudevan commented 10 years ago

A value given for a custom metadata key field is displaying as the title for a file. This file was created directly on the api. Title is stored correctly in the title field and "name" is stored as a key with value of "bond"

Below is the screenshot of the data created using test tool directly on platform image

Note the file getting listed incorrectly within dataset

image

Note that no custom metadata is listed. Checked all these values in the db on the platform. They seem to be stored correctly.

adamamyl commented 10 years ago

Difficulties with CTPEC on Name/Title/Descriptions all being used for names/labels, probably involved here.

joetsoi commented 10 years ago

It looks like this is we just have some bad test data here, it looks like the dataset wasn't cleared out before this new one was collected.

If you look at the dataset name (in the second screenshot) it ends in ba92 which would be the final four characters of the dataset id.

In the first screenshot it there is not 'ba92' anywhere in the dataset id, this means the two screenshots are completly unrelated. Now that we have repointed to UAT it will be worth retesting if this is still a problem

adamamyl commented 10 years ago

@Nithyavasudevan can you re-test this, once we have clean data (post MS deploy, post our reset)

Nithyavasudevan commented 10 years ago

will do. what was the actual fix done please? So u understand what to test and also impact?

adamamyl commented 10 years ago

We think it was a case of dirty data.

Nithyavasudevan commented 10 years ago

we will discuss this on the call

Nithyavasudevan commented 10 years ago

Note. Not bad data. Note the file name in first and second screenshot image image

amercader commented 10 years ago

OK, this was more complex than expected and the potential fix might not be entirely satisfying. The way custom fields in resources are handled in CKAN, any field not part of the main schema will be added to the main object before creating / updating, eg:

{
    "Title": "Some file",
    "Quality": 1,
    "Metadata": {
        "custom_field": "custom value"
    }
   ...
}

Gets translated into this CKAN equivalent:

{
"name": "Some file",
"quality": 1,
"custom_field": "custom value",
...
}

Note that the platform field Title is named name in CKAN. If a custom field is using this key, it will incorrectly replace the Title one:

{
    "Title": "Some file",
    "Quality": 1,
    "Metadata": {
        "name": "custom value"
    }
   ...
}

will turn into:

{
"name": "custom value",
"quality": 1,
...
}

We can prevent custom fields with reserved keywords from being imported (eg ignore any custom field with a name key). This is a good idea anyway to prevent changing fields like the url, type, etc via custom fields, but it will mean that right now on some edge cases datasets in the platform and on CKAN have different custom fields. This is the only compromise we can apply at this point.

Nithyavasudevan commented 10 years ago

@amercader when you say prevent it from being imported, do you mean stop user from creating these on ckan?

stevenlivz commented 10 years ago

So this happens at ALL levels for custom metadata. Everything in the hierarchy (as it may be a serialized lightweight object rather than a single key/value pair) is normalized to a scalar key/value dictionary list??

Even with a basic approach to using namespaces "custommetadata.name" which is a 5 minute job, it seems this will not work as everything is normalized to a single key/value list. Very limiting indeed if the case. You'd need to convert to "level.custommetadata.name" as the key to get near to this working well.

stevenlivz commented 10 years ago

My question is, you can't use "name" in the custom metadata as it clashes with the core metadata - which i get.

However, within the metadata, if i use the key "testkey" for two different objects serialized in the JSON, will i only get the value for one of them? e.g.

{ "title":"steven", "Metadata": {

   "source": {
    "email":"a@b.com",
    "phone":"999"
    },
   "approver": {
    "email":"c@d.com",
    "phone":"111"
    }
}

}

amercader commented 10 years ago

So this happens at ALL levels for custom metadata. Everything in the hierarchy (as it may be a serialized lightweight object rather than a single key/value pair) is normalized to a scalar key/value dictionary list??

However, within the metadata, if i use the key "testkey" for two different objects serialized in the JSON, will i only get the value for one of them? e.g.

I'm afraid I don't know what you mean in both your comments

Custom field values are serialized as a string so if you input a JSON object it will get serialized as such:

"title": "steven",
"approver": "{\"email\":\"a@b.com\", \"phone\":\"999\"}",
"source": "{\"email\":\"a@b.com\", \"phone\":\"999\"}",
...

See eg:

https://ckanfrontend.cloudapp.net/dataset/dataset-uat-9-9

https://ckanfrontend.cloudapp.net/dataset/dataset-uat-9-9/resource/82e0f207-3dc0-4600-8bdf-33c9157fcf79

Even with a basic approach to using namespaces "custommetadata.name" which is a 5 minute job, it seems this will not work as everything is normalized to a single key/value list. Very limiting indeed if the case. You'd need to convert to "level.custommetadata.name" as the key to get near to this working well.

As I mentioned before this can be done, but it is not a 5 minute job. I don't think we would need the level.custommetadata.name approach at all, because as I mentioned all values get serialized as strings.

stevenlivz commented 10 years ago

Glad to hear it is serialized as a string - this is what I was hoping.

I was actually using your example where you said "any field not part of the main schema will be added to the main object before creating / updating" which doesn't sound technically true given your statement above says "Custom field values are serialized as a string".

You said: { "Title": "Some file", "Quality": 1, "Metadata": { "name": "custom value" } ... }

goes to

{ "name": "custom value", "quality": 1, ... }

Which could then mean by extension, if everything is stored as a single dictionary that :

{ "Title": "Some file", "Quality": 1, "Metadata": { "name": "custom value", "moredata":{ "name":"other custom value" } } ... }

would go to something like:

{ "name": "custom value", <- or "other custom value" ?? "quality": 1, ... }

amercader commented 10 years ago

On your example:

{
"Title": "Some file",
"Quality": 1,
"Metadata": {
    "name": "custom value",
    "moredata":{
        "name":"other custom value"
    }
}

will get transformed into:

{
"name": "custom value",
"quality": 1,
"moredata": "{\"name\": \"other custom value\"}"
...
}

We obviously need to prevent "name": "custom value", (which is what this issue was originally about), but the name keys inside nested dicts in values should not be a problem.

stevenlivz commented 10 years ago

Lovely - that looks great Adria. Thanks. @Nithyavasudevan - will we need to just check this scenario but is what i was hoping :+1:

amercader commented 10 years ago

@Nithyavasudevan, @stevenlivz As agreed, these two features are ready for testing:

  1. On the dataset and resource forms a validation error is shown if one of the keys in the custom fields exists in the core schema (eg name or quality) (see image below)
  2. When importing datasets and resources via changelog harvesting, any custom field which has a key that exists in the core schema is ignored.

s4dzqjn

adamamyl commented 10 years ago

:star2:

stevenlivz commented 10 years ago

Reserved words are here https://gist.github.com/amercader/ef1d24fc63ad4e308277

Nithyavasudevan commented 10 years ago

Based on our agreement that this list is ok, we can close this. This list will need to be documented in tfs for future reference..

Nithyavasudevan commented 10 years ago

The error messages still need checked. I created a dataset with custom metadata fields save, before, _before and private. But the error message i got was Extras: There is a schema field with the same name image

image

mcginns commented 10 years ago

Can we tidy up the error message that is presented to the user in this situation pls? Something like:

Sorry you cannot use key for Custom Field. Please choose an alternative.

.....whatever fits in with the standard lingo used for other user errors in CKAN as long as it's a bit more meaningful than current error.

mcginns commented 10 years ago

IS anyone looking at tidying up the error message on this to make it a bit more user friendly?