silknow / converter

SILKNOW converter that harmonizes all museum metadata records into the common SILKNOW ontology model (based on CIDOC-CRM)
Apache License 2.0

Paris as a date #43

Closed: pasqLisena closed this issue 4 years ago

pasqLisena commented 4 years ago

There are cases in which the name of a place appears as a TimeSpan label. Example: http://data.silknow.org/production/3a85dbe6-0b05-3faf-88de-21cbfdedcc75/time/2

This is clearly the production place. We could probably run GeoNames matching to minimize these cases.

rtroncy commented 4 years ago

Good catch! This is record ID 849029 in the MAD source. @tschleider, can you check and post the original source here, so we can see whether this is a bug in our mappings/code or whether the original source contains misleading information?

In general, I'm hesitant to fix potential mistakes in the source.

pasqLisena commented 4 years ago

Keep in mind that there are many cases like this.

rtroncy commented 4 years ago

You mean that there are a lot of cases where a probable place name is used to label a time span? Interesting! Does anyone dare to write the query that highlights this situation per museum source?

pasqLisena commented 4 years ago

It is hard to write such a query, because you would need to disambiguate the labels.

Have a look at the Paris situation (browse through the pages): http://data.silknow.org/fct/facet.vsp?sid=554&cmd=prev&offset=&limit=20
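Disambiguation is indeed the hard part. As a rough first pass that avoids it, one could count, per museum source, the TimeSpan labels that contain no year-like number at all. The Java sketch below (Java 16+ for records) assumes the labels have already been exported as (source, label) pairs; the `Label` record, the sample values, and "ExampleMuseum" are hypothetical, and in practice the same check would more naturally be written as a SPARQL query against the SILKNOW endpoint.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

final class TimeSpanAudit {
    // Hypothetical (source, label) pair; the real export format is not specified.
    record Label(String source, String text) {}

    // Heuristic: a plausible TimeSpan label contains a 3- or 4-digit year.
    // Legitimate labels such as "18th century" are also flagged, so the
    // counts are candidates for review, not confirmed errors.
    private static final Pattern YEAR = Pattern.compile("\\b\\d{3,4}\\b");

    // Counts, per source, the labels that contain no year-like number.
    static Map<String, Integer> suspiciousPerSource(List<Label> labels) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by source name
        for (Label l : labels) {
            if (!YEAR.matcher(l.text()).find()) {
                counts.merge(l.source(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input for illustration only.
        List<Label> labels = List.of(
                new Label("MAD", "Paris"),
                new Label("MAD", "1920-1929"),
                new Label("ExampleMuseum", "ca. 1750"),
                new Label("ExampleMuseum", "France"));
        System.out.println(suspiciousPerSource(labels)); // {ExampleMuseum=1, MAD=1}
    }
}
```

This only flags labels; it cannot tell whether "France" is a misplaced production place or something else, which is exactly the disambiguation problem.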

tschleider commented 4 years ago

@rtroncy and @pasqLisena: thanks for the info. This is related to #5, an "old" and complicated bug that is hard to solve completely. However, right now it's probably not handled optimally either way. Since this one field could contain either a time span or a location, I create a path for both in all cases, to catch at least some true positives:

s.getMulti("Création:").forEach(prod::addTimeAppellation);
s.getMulti("Création:").forEach(prod::addPlace);

The original field for 849029.json looks like this:

{"label":"Création:","values":["Rébé"," ","Paris","1920-1929","Paul Poiret"," "]}
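Any interpretation of this field would first have to drop the whitespace-only entries. A minimal Java sketch of that step (class and method names are hypothetical, not from the converter code), applied to the values of record 849029:

```java
import java.util.List;
import java.util.stream.Collectors;

final class CreationValues {
    // Removes whitespace-only entries (such as " ") from the "Création:"
    // values, trimming the rest and keeping their original order.
    static List<String> nonBlank(List<String> values) {
        return values.stream()
                .map(String::strip)
                .filter(v -> !v.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Values copied from the "Création:" field of record 849029.
        List<String> raw = List.of("Rébé", " ", "Paris", "1920-1929", "Paul Poiret", " ");
        System.out.println(nonBlank(raw)); // [Rébé, Paris, 1920-1929, Paul Poiret]
    }
}
```

Of the six array entries, four are meaningful: a name, a city, a date range, and a person.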

Should I rather not create any paths here? Or, for the place, try to match the string against GeoNames and only create a node if it matches something?

rtroncy commented 4 years ago

So this is a case where a single field contains multiple pieces of information that need to be split first and then interpreted before deciding which path should be instantiated. Systematically creating two paths is certainly wrong.

Can you explain how we should interpret the Création values? Do you always find an array with 4 values? What do those 4 values correspond to? Is the production place always the 2nd value and the production time the 3rd?

tschleider commented 4 years ago

No, it's tricky. There is an arbitrary number of values, and the positions are not fixed. Sometimes the first value is the city, sometimes it's the country (and there is no city). Sometimes the array contains empty strings as placeholders, sometimes not. From what I have seen, the array has between 2 and 5 values. The last value is often a year or time span, but sometimes it is the name of a person (probably the artist/creator).
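Since the positions are not fixed, one option is to classify each value by its content rather than by its index. The Java sketch below rests on two hypothetical rules: a value containing a 3- or 4-digit year is a time span, and a value found in a gazetteer is a place. The hard-coded `KNOWN_PLACES` set only stands in for a real GeoNames lookup, which is not shown; everything else falls into `OTHER`, where person names such as "Paul Poiret" would end up.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

final class CreationClassifier {
    enum Kind { TIME, PLACE, OTHER }

    // Hypothetical rule: a time span contains a 3- or 4-digit year.
    private static final Pattern YEAR = Pattern.compile("\\b\\d{3,4}\\b");

    // Hypothetical stand-in for GeoNames matching; a real implementation
    // would query a GeoNames dump or service instead of a fixed set.
    private static final Set<String> KNOWN_PLACES = Set.of("Paris", "Lyon", "France");

    static Kind classify(String value) {
        if (YEAR.matcher(value).find()) return Kind.TIME;
        if (KNOWN_PLACES.contains(value)) return Kind.PLACE;
        return Kind.OTHER; // e.g. an artist, a brand, or an unmatched place
    }

    // Groups the non-blank values by kind, ignoring their position in the array.
    static Map<Kind, List<String>> split(List<String> values) {
        Map<Kind, List<String>> result = new EnumMap<>(Kind.class);
        for (Kind k : Kind.values()) result.put(k, new ArrayList<>());
        for (String raw : values) {
            String v = raw.strip();
            if (!v.isEmpty()) result.get(classify(v)).add(v);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("Rébé", " ", "Paris", "1920-1929", "Paul Poiret", " ");
        System.out.println(split(raw));
        // {TIME=[1920-1929], PLACE=[Paris], OTHER=[Rébé, Paul Poiret]}
    }
}
```

With such a split, `prod::addTimeAppellation` would only receive the TIME values and `prod::addPlace` only the PLACE values, instead of both being applied to every value. The year rule still misses labels like "18th century", and a short list is no substitute for real GeoNames matching with disambiguation.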

tschleider commented 4 years ago

I deactivated the "Creation" field in MAD until we have a proper solution, so "Paris" and other wrong strings will no longer appear as time spans. Since #5 already covers this, I suggest closing this issue.

pasqLisena commented 4 years ago

"until we have a proper solution"

IMHO, this means that we should not close the issue but keep it open, at least as a reminder.

tschleider commented 4 years ago

@pasqLisena I don't mind it being open, but with #5 it's redundant, isn't it?

pasqLisena commented 4 years ago

I missed this. Closing as a duplicate of #5.