The links_images field is very useful for reverse image search and for showing thumbnails as part of a search result. Similar links_videos and maybe links_sounds fields would have equal benefits.
Unfortunately, video is messy to extract, as embedding has historically been hacked in different ways. Using iframe was popular at one point.
The problem here is that the only indication that the iframe contains a video, and not an image, an HTML page or something else, is the URL for the video, and that is in no way guaranteed to have a usable extension. Some ideas:
1. Only populate links_videos with "guaranteed" videos, i.e. those with known video extensions.
2. Index all iframe#src and move the resolve logic to the GUI: first extract all the URLs, then request their content_type_norm field.
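A rough sketch of how the two ideas could split the work, using only the Python standard library. The extension list and the names IframeSrcCollector and split_iframe_sources are assumptions made for illustration and are not part of the indexer. URLs with a known video extension would go to links_videos; the rest are left unresolved for a later content_type_norm lookup.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath

# Assumed, non-exhaustive set of extensions treated as "guaranteed" video.
VIDEO_EXTENSIONS = {".mp4", ".webm", ".ogv", ".mov", ".avi", ".flv", ".mkv"}


class IframeSrcCollector(HTMLParser):
    """Collects the src attribute of every iframe in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def split_iframe_sources(html):
    """Return (videos, unresolved) lists of iframe src URLs.

    videos:     URLs whose path ends in a known video extension (idea 1).
    unresolved: everything else, to be resolved later by content type (idea 2).
    """
    collector = IframeSrcCollector()
    collector.feed(html)
    videos, unresolved = [], []
    for src in collector.sources:
        # Look only at the URL path so query strings do not hide the extension.
        ext = posixpath.splitext(urlparse(src).path)[1].lower()
        if ext in VIDEO_EXTENSIONS:
            videos.append(src)
        else:
            unresolved.append(src)
    return videos, unresolved
```

The unresolved list is exactly the set of URLs for which the extension gives no answer, which is the case the second idea is meant to cover.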
If method 2 is used, it might be better to have a field links_resources with all inlined resources (except images). That would also catch sounds and make it possible to e.g. check if a page was iframed from somewhere.
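As a hedged illustration of what could go into links_resources, the sketch below collects the URLs of inlined non-image resources from a page. The tag-to-attribute mapping and the name extract_links_resources are assumptions chosen for this example; a real extractor would likely need more cases.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed mapping of inlining tags to the attribute holding the resource URL.
# img is deliberately absent, since images already go to links_images.
# source is included for its src attribute only (used by video/audio).
RESOURCE_ATTRIBUTES = {
    "iframe": "src",
    "video": "src",
    "audio": "src",
    "source": "src",
    "embed": "src",
    "object": "data",
}


class ResourceCollector(HTMLParser):
    """Collects URLs of inlined resources other than images."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attribute = RESOURCE_ATTRIBUTES.get(tag)
        if attribute is None:
            return
        url = dict(attrs).get(attribute)
        if url:
            # Resolve relative URLs against the page URL.
            self.resources.append(urljoin(self.base_url, url))


def extract_links_resources(html, base_url):
    """Return absolute URLs of all inlined non-image resources, in page order."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources
```

With the URLs stored as absolute values, finding out whether a page was iframed from somewhere becomes a plain lookup of its URL in links_resources.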