linkedtv / wp2


UEP: crawling the right sites the right way? #5

Closed lyndonnixon closed 10 years ago

lyndonnixon commented 10 years ago

Difficulty getting good results with Nelleke van der Krogt (0) or Hessel Martena (0).

Is the content linked to from http://avro.nl/tussenkunstenkitsch/Gemist/ being crawled? (91 video, 41 article for Nelleke van der Krogt, but just 18 - 10 videos - if "episodes" are ignored)

The same question applies to the main AVRO site, where http://avro.nl/zoeken/#search:nelleke van der krogt lists 410 items (355 video), although these might be difficult to filter. For Museum Martena (just 2 cropped images from the UEP service), AVRO's web site returns 4 videos.

Likewise, museummartena.nl is on the crawl list, but the results are not as good as those from Google Images: the query "Hessel Martena" site:www.museummartena.nl returns 49 images, and the first one is indeed the portrait of Hessel van Martena.

kliegr commented 10 years ago

1) "Difficulty getting good results with Nelleke van der Krogt (0) or Hessel Martena (0)." The websites are being re-crawled after the new release.

2) Is the content linked to from http://avro.nl/tussenkunstenkitsch/Gemist/ being crawled? (91 video, 41 article for Nelleke van der Krogt, but just 18 - 10 videos - if "episodes" are ignored)

The links to the videos are embedded in a custom way that our extractor does not currently support. Support for this has been added to the todo list.

rtroncy commented 10 years ago

Duplicate of #11