pasqLisena closed this issue 4 years ago.
Good catch! This is the record with ID 849029 in the MAD source. @tschleider, can you check it and post the original source record here? Then we can see whether this is a bug in our mappings/code or whether the original source contains misleading information.
In general, I'm hesitant to fix potential mistakes in the source.
Keep in mind that there are many cases like this.
You mean that there are many cases where a probable place name is used to label a time span? Interesting! Does anyone dare to write a query that highlights the situation per museum source?
It is hard to write such a query, because you would need to disambiguate the labels.
Have a look at the Paris situation (page through the results): http://data.silknow.org/fct/facet.vsp?sid=554&cmd=prev&offset=&limit=20
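A full query would indeed need disambiguation, but a rough per-source count is possible if some gazetteer is available. The sketch below assumes the TimeSpan labels have already been exported as (source, label) pairs; the `TimeSpanAudit` class, the `TimeSpanLabel` type and the in-memory place set are hypothetical stand-ins, not part of the SILKNOW data model or of GeoNames. It only flags labels that exactly match a known place name and counts them per museum source, so ambiguous names would still need a manual check.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Rough audit (hypothetical): count TimeSpan labels that look like place names, per source.
final class TimeSpanAudit {
    // Hypothetical export format: one row per TimeSpan label.
    static final class TimeSpanLabel {
        final String source;
        final String label;

        TimeSpanLabel(String source, String label) {
            this.source = source;
            this.label = label;
        }
    }

    // knownPlaces is assumed to hold lower-cased place names (stand-in for GeoNames).
    static Map<String, Integer> suspiciousPerSource(
            List<TimeSpanLabel> rows, Set<String> knownPlaces) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by source name
        for (TimeSpanLabel row : rows) {
            String normalized = row.label.trim().toLowerCase(Locale.ROOT);
            if (knownPlaces.contains(normalized)) {
                counts.merge(row.source, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Exact matching keeps false positives low but will miss variants such as "Paris (France)"; a real GeoNames lookup would be needed for those.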
@rtroncy and @pasqLisena: thanks for the info. This is related to #5, an "old" and complicated bug that is hard to solve completely. However, right now it's probably not handled optimally either way. Since this one field can contain either a time span or a location, I create a path for both in all cases in order to catch some true positives:
```java
// Every "Création:" value is added both as a time appellation and as a place.
s.getMulti("Création:").forEach(prod::addTimeAppellation);
s.getMulti("Création:").forEach(prod::addPlace);
```
The original field for 849029.json looks like this:
```json
{"label":"Création:","values":["Rébé"," ","Paris","1920-1929","Paul Poiret"," "]}
```
Should I rather not create paths at all here? Or, for the place, try to match it against GeoNames and only create a node if the string matches something?
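A minimal sketch of the second option, assuming some gazetteer lookup is available: the hypothetical `PlaceGate` class and its `placeFor` helper are not part of the existing converter, and the in-memory set of lower-cased names only stands in for a real GeoNames query. A place is returned, and therefore a node created, only when the trimmed string matches a known name case-insensitively.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: gate place creation on a gazetteer match.
final class PlaceGate {
    // Stand-in for a GeoNames lookup (assumption: names stored lower-cased).
    private final Set<String> knownPlaces;

    PlaceGate(Set<String> knownPlaces) {
        this.knownPlaces = knownPlaces;
    }

    // Returns the trimmed value only if it matches a known place name;
    // otherwise returns empty, so that no place node is created.
    Optional<String> placeFor(String rawValue) {
        String value = rawValue.trim();
        if (value.isEmpty()) {
            return Optional.empty();
        }
        return knownPlaces.contains(value.toLowerCase(Locale.ROOT))
                ? Optional.of(value)
                : Optional.empty();
    }
}
```

Assuming `addPlace` takes the string value (as the method reference above suggests), the place line could then become `s.getMulti("Création:").forEach(v -> gate.placeFor(v).ifPresent(prod::addPlace));`.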
So this is a case where a single field contains multiple pieces of information that need to be split first and then interpreted before deciding which path should be instantiated. Systematically creating both paths is certainly wrong.
Can you explain how we should interpret the Création values? Do you always find an array with 4 non-empty values? What do those 4 values correspond to? Is the production place always the 2nd value and the production time always the 3rd?
No, it's tricky. The number of values is arbitrary and the positions are not fixed. Sometimes the first value is the city, sometimes it's the country (and there is no city). Sometimes empty strings are present in the array, sometimes not. From what I have seen, the array has between 2 and 5 values. The last value is often a year or a time span, but sometimes it is the name of a person (probably the artist/creator).
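Given that description, one possible approach (a heuristic sketch, not a tested mapping) is to drop blank entries and classify each remaining value on its own rather than by position. The `CreationClassifier` class, the `Kind` labels, the year/year-range regular expression and the in-memory place set standing in for GeoNames are all assumptions. Anything that is neither a date nor a known place is left as `OTHER` (for example a person name) instead of being forced into a path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical, position-independent classification of "Création:" values.
final class CreationClassifier {
    enum Kind { TIME_SPAN, PLACE, OTHER }

    static final class Classified {
        final String value;
        final Kind kind;

        Classified(String value, Kind kind) {
            this.value = value;
            this.kind = kind;
        }
    }

    // Assumption: dates appear as a year ("1920") or a year range ("1920-1929").
    private static final Pattern YEAR_OR_RANGE =
            Pattern.compile("\\d{3,4}(\\s*-\\s*\\d{3,4})?");

    // knownPlaces is assumed to hold lower-cased place names.
    static List<Classified> classify(List<String> values, Set<String> knownPlaces) {
        List<Classified> result = new ArrayList<>();
        for (String raw : values) {
            String value = raw.trim();
            if (value.isEmpty()) {
                continue; // blank entries such as " " carry no information
            }
            Kind kind;
            if (YEAR_OR_RANGE.matcher(value).matches()) {
                kind = Kind.TIME_SPAN;
            } else if (knownPlaces.contains(value.toLowerCase(Locale.ROOT))) {
                kind = Kind.PLACE;
            } else {
                kind = Kind.OTHER; // e.g. a person name; needs separate handling
            }
            result.add(new Classified(value, kind));
        }
        return result;
    }
}
```

On the values of record 849029, and assuming the place set contains "paris", this keeps "Paris" as a place and "1920-1929" as a time span, while "Rébé" and "Paul Poiret" end up as `OTHER`. Textual dates such as century labels would need additional patterns.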
I deactivated the "Création" field for MAD until we have a proper solution. As a result, "Paris" and other wrong strings will no longer appear as time spans. Since #5 already covers this, I suggest closing this issue.
> until we have a proper solution

IMHO, this means that we should not close the issue but keep it open, at least as a reminder.
@pasqLisena I don't mind it being open, but with #5 it's redundant, isn't it?
I had lost track of this. Closing as a duplicate of #5.
There are cases in which the name of a place appears as a TimeSpan label. Example: http://data.silknow.org/production/3a85dbe6-0b05-3faf-88de-21cbfdedcc75/time/2
This is clearly the production place. We could probably run GeoNames matching to minimize these cases.