opendatatrentino / OpenDataRise

Tool to cleanse and semantify datasets from CKAN repositories. Based on OpenRefine.

Cannot edit cells of types "array of Entity" #185

Open gbella opened 8 years ago

gbella commented 8 years ago

In step 3 the "Edit string" button does not appear.

DavidLeoni commented 8 years ago

This is a 'feature'. An Entity [ ] column might have long entity names. In that case, the text in the column is considered long text, and ODR marks the column as an 'original column'. For such columns there is no 'edit cell' button available. In step 4, ODR will then ask you to extract entities from the original column to create a new 'target column' with the entities.

Note that, if you really want, you can still apply a transform to a whole 'original' column. Admittedly, all of this is confusing, and we should decide on an editing policy once and for all; see also #114.

gbella commented 8 years ago

What do you suggest then? The current situation just doesn't let me fix the entities. Do we need to discuss?


DavidLeoni commented 8 years ago

I don't understand what you need to fix. Relational entity names should only help NLP find the ID for entities; if you fix the name string manually, you defeat the purpose of NLP. Fixed relational entity names might be needed if we imported them as new entities, but we're not going to do that any time soon. And if you want to fix the entire column with a transformation, say to remove a common problem, you can still do it.

gbella commented 8 years ago

NLP is not the goal, it is only a tool. Sometimes it fails. When it fails, a human needs to intervene to fix the mapping. Currently there is no way to add entities manually to the newly created column. This is a serious shortcoming that should be fixed soon if possible. I understand that this would be complex to implement and requires a new UI feature: we need to insert a new entity reference into an array item in the newly created column. Tell me if you see any reasonably straightforward solution to this. This is really a feature that we need.

My idea was (I think we discussed it) to create, in each empty array value, some kind of placeholder entity, e.g., with ID=0 and a name like "not found". This would make it a clickable link that can be edited with the disambiguator popup. You would just need a little post-processing at the end of step 4 to replace the placeholders where ID=0 with NULLs (or whatever you are currently using).
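To make the placeholder idea concrete, here is a rough sketch in Python. It is not ODR code: `EntityRef`, `fill_placeholders`, and `strip_placeholders` are hypothetical names, and it assumes that an "empty array value" is a `None` item inside a cell's array and that NULL maps to `None`.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed sentinel values, taken from the suggestion above (ID=0, "not found").
PLACEHOLDER_ID = 0
PLACEHOLDER_NAME = "not found"


@dataclass(frozen=True)
class EntityRef:
    """Hypothetical stand-in for an entity reference stored in an array item."""
    id: int
    name: str


Cell = List[Optional[EntityRef]]


def fill_placeholders(cells: List[Cell]) -> List[Cell]:
    """Replace every empty (None) array item with a placeholder entity,
    so that the UI has something clickable to open the disambiguator on."""
    placeholder = EntityRef(PLACEHOLDER_ID, PLACEHOLDER_NAME)
    return [
        [item if item is not None else placeholder for item in cell]
        for cell in cells
    ]


def strip_placeholders(cells: List[Cell]) -> List[Cell]:
    """Post-processing at the end of step 4: any placeholder the user did not
    replace goes back to None (standing in for NULL)."""
    return [
        [None if item is None or item.id == PLACEHOLDER_ID else item
         for item in cell]
        for cell in cells
    ]


# Example: NLP resolved one entity and missed two.
column = [[EntityRef(12, "Trento"), None], [None]]
editable = fill_placeholders(column)

# The user fixes one missing item by hand (e.g. via the disambiguator popup).
editable[0][1] = EntityRef(34, "Rovereto")

final = strip_placeholders(editable)
```

The only point of the sketch is that the placeholder exists just between NLP extraction and the end of step 4, so downstream code never sees ID=0.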

What do you think?


DavidLeoni commented 8 years ago

Solving the issue in this bug is more subtle, as it requires fiddling with ODR automations, which is quite a pain.

So we can do as you say, just in another bug: #194

DavidLeoni commented 8 years ago

Note that you can now do some array editing even in step 3 by using GREL with the new commands described in #194.