hochschule-darmstadt / openartbrowser

Exploring the world of arts using open data
http://openartbrowser.org/
MIT License

Swedish, 1.3 rebase #581

Closed AutomCoding closed 1 year ago

AutomCoding commented 1 year ago

Replacement for #580 as I could not figure out how to change the target branch of a pull request.

EeveesEyes commented 1 year ago

Just a quick status update: we are discussing how to handle the Swedish translation, since we use a separate Elasticsearch index for each language. This means, for example, that the French version of the openartbrowser only contains artworks that Wikidata provides in French. To add a translation, we would either have to run another crawl for the Swedish Wikidata data or change our index translation strategy by decoupling it from the available data. We are not sure yet which would be best.

AutomCoding commented 1 year ago

There are 106607 instances of [subclass of artwork] located in Sweden, and most of them have a label in English. I assume the language overlap is greater outside Sweden, but those queries are too heavy to run. So a quick interim solution could be to reuse that list, as that would cover about 60 %.

Language | Without label | Percentage
-- | -- | --
Swedish | 131 | 0.12 %
English | 41911 | 39 %
German | 103161 | 97 %
Italian | 106481 | 99 %
Spanish | 105909 | 99 %
French | 105963 | 99 %
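The percentages in the table follow directly from the counts. A minimal Python sketch of that arithmetic, using only the figures quoted in this comment (the function name `share_without_label` is made up for illustration):

```python
TOTAL = 106607  # instances located in Sweden, as quoted above

# Items without a label per language, copied from the table.
WITHOUT_LABEL = {
    "Swedish": 131,
    "English": 41911,
    "German": 103161,
    "Italian": 106481,
    "Spanish": 105909,
    "French": 105963,
}


def share_without_label(missing: int, total: int = TOTAL) -> float:
    """Percentage of items that lack a label in a given language."""
    return 100.0 * missing / total


for language, missing in WITHOUT_LABEL.items():
    without = share_without_label(missing)
    print(f"{language}: {without:.2f} % without label, {100 - without:.2f} % with label")
```

For English this gives roughly 39 % without and 61 % with a label, which matches the rough 60 % coverage estimate.
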

EeveesEyes commented 1 year ago

Sorry, that was a bit misleading, plus I was wrong :laughing:. The location of the artwork doesn't matter. The crawl actually fills unavailable translations with the highest-ranked language available. For example: if Swedish is not available for a particular artwork, English is chosen. If that is unavailable too, French is next. This behaviour is described in our developer guide. So it would be good to know how many Swedish labels are available, but I'm finding it hard to get this query working. For a list of all crawled types, see constants.py.
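The fallback described here can be sketched in Python as follows. This is only an illustration and not the crawler's actual code: the function name `pick_label` and the `LANGUAGE_RANKING` list are assumptions, and only the order "English before French" is taken from the comment.

```python
from typing import Dict, Optional, Sequence

# Hypothetical ranking; only "English before French" comes from the discussion.
LANGUAGE_RANKING = ["en", "fr"]


def pick_label(labels: Dict[str, str], wanted: str,
               ranking: Sequence[str] = LANGUAGE_RANKING) -> Optional[str]:
    """Return the label in `wanted`, else the first available ranked language."""
    for lang in [wanted, *ranking]:
        label = labels.get(lang)
        if label:
            return label
    return None  # no label in any supported language


# No Swedish label here, so the English one is used.
print(pick_label({"en": "Self-portrait", "fr": "Autoportrait"}, "sv"))
```

With this kind of fallback, an artwork is not dropped from a language's index just because its label is missing in that language.
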

AutomCoding commented 1 year ago

After some research I found a way to get a list of all 18412 subclasses of artwork in just about one second. There are 3481176 items that are instances of one of them (runtime one minute). I was not able to look up their labels and filter, but this gets through one tenth of them before crashing:

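# Breadth-first traversal over "subclass of" (P279) links, starting at
# "work of art" (Q838948), using the Blazegraph GAS service.
SELECT DISTINCT ?artwork WHERE {
  SERVICE gas:service {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.BFS";
      gas:in wd:Q838948;
      gas:linkType wdt:P279;
      gas:traversalDirection "Reverse";
      gas:out ?type.
  }
  # Items that are an instance of (P31) any of the collected types ...
  ?artwork wdt:P31 ?type.
  # ... and that carry a Swedish label.
  ?artwork rdfs:label ?label.
  FILTER(LANG(?label) = "sv")
}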
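One way to sidestep the timeout might be to count labels one type at a time instead of in a single query. The Python sketch below only builds such a query string and does not send it. It assumes the prefixes predefined by the Wikidata Query Service (`wd:`, `wdt:`, `rdfs:`); the helper name `build_count_query` is made up, and Q3305213 (painting) is used purely as an example type.

```python
import re


def build_count_query(type_qid: str, language: str) -> str:
    """Build a SPARQL query counting instances of one type that have a label in `language`."""
    # Validate inputs so that nothing unexpected is spliced into the query text.
    if not re.fullmatch(r"Q[1-9][0-9]*", type_qid):
        raise ValueError(f"not a Wikidata item ID: {type_qid!r}")
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", language):
        raise ValueError(f"not a language code: {language!r}")
    return (
        "SELECT (COUNT(DISTINCT ?artwork) AS ?count) WHERE {\n"
        f"  ?artwork wdt:P31 wd:{type_qid}.\n"
        "  ?artwork rdfs:label ?label.\n"
        f'  FILTER(LANG(?label) = "{language}")\n'
        "}"
    )


print(build_count_query("Q3305213", "sv"))
```

Note that this counts direct instances of the given type only, not instances of its subclasses, so the per-type numbers would have to be collected for every type of interest.
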

EeveesEyes commented 1 year ago

I ran our crawler (without actually crawling anything) to get some statistics without running into timeouts. Based on our current type selection, there are approx. 94k items with labels in Swedish. The remaining ~670k artworks would appear in another language, or without a label if they have none in any of the supported languages.


Type | Items with Swedish label
-- | --
paintings | 26572
oil_sketches | 1
watercolor_paintings | 76
gouache_paintings | 0
illuminations | 5
frescoes | 41
cycles_of frescoes | 2
murals | 67
graffiti | 5
drawings | 45847
prints | 258
intaglio_printings | 2
engravings | 73
copper_engraving prints | 22
mezzotint_prints | 0
stipple_engravings | 0
steel_engraving prints | 0
etching_prints | 17
soft-ground etchings | 0
aquatint_prints | 1
drypoint_prints | 3
monoprintings | 0
planographic_printings | 0
lithographs | 15
zincographs | 0
monotype_prints | 0
offset_prints | 0
relief_printings | 0
linocut_prints | 1
woodcut_prints | 25
xylographies | 0
xylographers | 0
chiaroscuro_woodcuts | 0
wood_engraving prints | 1
Screen_prints | 1
photogravure_techniques | 0
rotogravures | 0
heliogravure_prints | 0
photolithographies | 0
collotype_techniques | 0
laser_prints | 0
blueprints | 0
photographs | 4439
platinum_prints | 0
pastels | 14
calligraphic_works | 2
collages | 8
mosaics | 5
pietra_duras | 0
stained_glasses | 4
tapestries | 6
carpets | 14
posters | 8
embroideries | 5
sculptures | 2824
relief_sculptures | 0
medals | 669
commemorative_plaques | 28
installations | 90
video_installations | 9
sound_installations | 10
light_installations | 8
kinetic_objects | 3
found_objects | 1
assemblages | 21
light_sculptures | 2
textile_designs | 0
textile_artworks | 19
ceramic | 0
glass_arts | 1
jewelries | 5
environmental_artworks | 40
masks | 2
performance_artworks | 0
happenings | 0
video_artworks | 219
video_sculptures | 0
ensembles_of works of art | 105
sculptural_groups | 0
polyptyches | 27
diptyches | 1
triptyches | 16
quadriptyches | 0
pentaptyches | 0
hexaptyches | 0
heptaptyches | 0
octaptyches | 0
mixed_medias | 0
illuminated_manuscripts | 31
palaces | 430
pyramids | 43
church_buildings | 10936
temples | 244
monasteries | 550
Total | 93874