Closed joncameron closed 2 months ago
Note that the lack of this feature can be somewhat confusing. For instance, a hit may be returned in search results due to a match in the .txt transcript, but the hit cannot be found using the search within feature on the media object page.
For example, search 'basketball' in the repository. One of the hits is: https://avalon-dev.dlib.indiana.edu/media_objects/x920fw86z
It has 3 transcripts. The first 2 WebVTT transcripts are searchable, but do not contain the word 'basketball'. The last .txt transcript does contain it, but no hit count is shown next to it and there is no search within feature when you select that transcript.
This is potentially confusing for users.
Another example: if you search 'indianapolis', this item comes up as a search result because of transcript hits, but you can't actually search either of the 2 transcripts available: https://avalon-dev.dlib.indiana.edu/media_objects/9k41zd49s
For QA, we should test a variety of formatting:
Check that count and navigation are working properly.
This can be tested on the Ramp demo site.
@Dananji I took a first pass on the demo site.
When I type in more than one search term, the hits are italicized, but they are not bolded or given a color change in the transcript text.
I also found one rather nasty .txt transcript that is not working well. See manifest: https://avalon-dev.dlib.indiana.edu/media_objects/g158bh28p/manifest.json. Select the 5th section (the .mp3). I've uploaded several non-timed transcripts. The third one, "transcript (1) (1).txt", is one big chunk of text without line breaks. A search for 'whitaker' claims 188 hits next to the transcript name in the drop-down, but then also says 'no results found'.
Presumably the issue is parsing through hits for a solid chunk of text?
Otherwise, I tested functionality across Android, iOS, Chrome, and Safari, and it seems to work. This change should be tested as a new Ramp build in Avalon as well.
For the following search, the content search response gives only 188 hits, while there are 192 hits in the transcript text (from a browser search using Cmd+F).
Search 'whitaker' in the 5th section of https://avalon-dev.dlib.indiana.edu/media_objects/g158bh28p/
This could be the Solr query in Avalon reaching a limit. We may need to increase a threshold, adjust our query, or index differently. Can you write up a ticket in Avalon for this?
Dananji put in a new PR for this; the latest changes are on the Ramp demo site, not in Avalon yet.
👍
Description
Untimed text files should be able to be searched just like timed text transcripts, but the search component developed by Third Wave doesn't account for untimed text material. Ramp should be able to support search and highlighting for these transcripts as well. Currently, results returned from the search service for untimed text aren't set up to be highlighted and loaded into the search results for navigation.
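To make the highlighting gap concrete, here is a minimal sketch of turning a highlighted snippet into renderable segments. It assumes the search service marks hits with `<em>` tags, which this thread does not confirm; `Segment` and `splitHighlights` are hypothetical names, not part of Ramp or Avalon.

```typescript
// Hypothetical shape: one run of text, either a hit or plain text.
interface Segment {
  text: string;
  highlighted: boolean;
}

// Split a snippet into segments, assuming hits are wrapped in <em>…</em>.
// The <em> marker is an assumption about the search response format.
function splitHighlights(snippet: string): Segment[] {
  const segments: Segment[] = [];
  const pattern = /<em>([\s\S]*?)<\/em>/g;
  let cursor = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(snippet)) !== null) {
    // Plain text between the previous hit and this one.
    if (match.index > cursor) {
      segments.push({ text: snippet.slice(cursor, match.index), highlighted: false });
    }
    segments.push({ text: match[1] ?? "", highlighted: true });
    cursor = match.index + match[0].length;
  }
  // Trailing plain text after the last hit.
  if (cursor < snippet.length) {
    segments.push({ text: snippet.slice(cursor), highlighted: false });
  }
  return segments;
}
```

A transcript view could then render each highlighted segment with the same styling used for timed transcripts, rather than injecting the raw markup.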
The current JS implementation by Third Wave may also not be able to index and query Word Docs (not designed for untimed text).
The tricky part of this implementation is figuring out how to make the search hits in non-timed text work with the previous/next buttons in the results navigator, since they are not indexed in the JS code. Highlighting the search hits in the transcript display shouldn't be that hard, as we could reuse the highlights in the search response to do this.
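One possible way to give untimed hits a navigable index is to locate them by character offset on the client. The sketch below is illustrative only and is not how Ramp currently works; `UntimedHit`, `findHits`, and `stepHit` are hypothetical names, and matching is assumed to be case-insensitive and literal.

```typescript
// Hypothetical shape for one hit in an untimed transcript.
interface UntimedHit {
  index: number; // position of the hit in navigation order
  start: number; // character offset where the match begins
  end: number;   // character offset just past the match
}

// Escape regex metacharacters so the query is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Find every case-insensitive occurrence of `query` in `text`.
// Works on a single block of text with no line breaks.
function findHits(text: string, query: string): UntimedHit[] {
  const hits: UntimedHit[] = [];
  const trimmed = query.trim();
  if (trimmed === "") return hits;
  const pattern = new RegExp(escapeRegExp(trimmed), "gi");
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    hits.push({
      index: hits.length,
      start: match.index,
      end: match.index + match[0].length,
    });
  }
  return hits;
}

// Move to the next (1) or previous (-1) hit, wrapping at both ends.
// Returns -1 when there are no hits; `current` < 0 means nothing is selected.
function stepHit(current: number, total: number, direction: 1 | -1): number {
  if (total === 0) return -1;
  if (current < 0) return direction === 1 ? 0 : total - 1;
  return (current + direction + total) % total;
}
```

With offsets in hand, the previous/next buttons only need to track the current hit index and scroll the matching range into view. Because the offsets are computed from the displayed text itself, the count shown would match what the user sees, independent of any limit in the search response.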
This can most likely wait until after 7.8 release.
Done Looks Like