Closed schoerg closed 5 years ago
Interesting idea - not sure how well it would work though, we'd need to import all the subtitle files in order to be able to search them efficiently.
How do you see this feature being used? What kind of phrases would you search - words, sentences?
It's probably only feasible for text based subtitles, not image based ones, as you would have to OCR those.
Can be words or sentences.
Example: You could search for "I'll be back" and you get a list of Schwarzenegger movies with timestamps where he said that phrase.
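To make the use case concrete, here is a minimal sketch of a phrase search over the text of a single .srt file that returns the timestamps of matching cues. It is written in Python purely for illustration; the function name `search_srt` and the case-insensitive matching are assumptions, not anything the app provides.

```python
import re

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:05,000".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def search_srt(srt_text, phrase):
    """Return (start, end, text) for every cue whose text contains phrase."""
    hits = []
    needle = phrase.lower()
    # Cues in an SRT file are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                # Everything after the timing line is the cue text.
                text = " ".join(lines[i + 1:])
                if needle in text.lower():
                    hits.append((match.group(1), match.group(2), text))
                break
    return hits
```

Running this over each video's subtitle text and keeping the file name next to each hit would give the "list of movies with timestamps" described above.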
The way the app is set up at the moment, all the metadata is stored in the .vha file. For the app to search all the text, we'll need to either store all the text in the .vha file (silly thing to do, unacceptable), or have the .vha file point to the .srt (or whatever subtitle file format they are in) and then open each one and search through it on every search. It would be even more complicated in cases where the subtitles are baked in (as they sometimes are in an .mkv file).
Technically it seems doable, but quite a lot of work.
Does either of you have an estimate of how quickly a search across the subtitle files for 1,000 videos would run? I suspect under 5 seconds - so entirely usable if someone wanted to explore their videos this way.
Would depend on the location of the files - locally I assume very fast, but over a network... not so much!
Although, this does give me the idea to revisit #47 - if we store the vha2 file as a zip archive, we can have additional files in the zip other than the main library file - something I was already thinking would be quite nice. Then if someone would find a feature useful like this, they could activate it and generate the mega-subtitle file to allow for quick searching through, and pointing back to a video reference!
If someone doesn't want it, we just don't generate the file. Also, zip compression would help a lot in this case!
Thoughts?
I'd rather keep the glob of all the subtitles out of the .vha file, but we can just generate an additional .vhasubs file that would be a .zip (but renamed).
Though at this point it would become an extra hassle to deal with (update it when new videos are added, old videos renamed / removed, etc).
Since this is a non-central feature, it can be slow-ish on first search; subsequent searches could be near instant as we can keep a cache of all the subtitles put together on first search in RAM.
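A rough sketch of that "slow first search, fast afterwards" idea, in Python for illustration only. The class name `SubtitleCache` and the choice to key results by file path are hypothetical; it assumes the subtitles already exist as text files on disk.

```python
from pathlib import Path


class SubtitleCache:
    """Loads every subtitle file on the first search and keeps the text in RAM."""

    def __init__(self, subtitle_paths):
        self._paths = list(subtitle_paths)
        self._texts = None  # filled lazily by the first call to search()

    def _load(self):
        texts = {}
        for path in self._paths:
            # Lower-case once at load time so each search is a plain substring test.
            raw = Path(path).read_text(encoding="utf-8", errors="replace")
            texts[str(path)] = raw.lower()
        return texts

    def search(self, phrase):
        """Return the paths of all subtitle files that contain phrase."""
        if self._texts is None:
            self._texts = self._load()  # the slow part, paid only once
        needle = phrase.lower()
        return sorted(p for p, text in self._texts.items() if needle in text)
```

The cache would need to be thrown away whenever the library changes, which is the same update hassle mentioned above, only in memory instead of on disk.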
If subtitles are muxed, extracting and storing them is essential. Otherwise you'll have to read every media file, which of course takes some time.
Thank you for pointing that out.
I know basically nothing about how subtitles are stored in video files -- I imagined the .mkv (and other container formats) simply have a small contiguous section, but if they can be / are sometimes / are often muxed then it's harder.
I'm not opposed to merging a PR that adds this feature. If someone volunteers to handle all the complexity, sounds great, just let's iron out all the thorny details first.
> I'd rather keep the glob of all the subtitles out of the .vha file, but we can just generate an additional .vhasubs file that would be a .zip (but renamed).

Could you elaborate on this?
I'm thinking that if the .vha file starts including all the subtitles from all the movies, it can grow to be very large. So if the app is going to store the subtitle data somewhere, I'd rather have all that data stored in a dedicated location separate from the .vha file.
E.g. if you have cartoons.vha the app will create an additional file cartoons.vhasubs that will have all the subtitle data.
> I know basically nothing about how subtitles are stored in video files -- I imagined the .mkv (and other container formats) simply have a small contiguous section, but if they can be / are sometimes / are often muxed then it's harder.

When I use Subtitle Edit, for example, the program always reads the whole .mkv file before showing the subtitles. Subtitles can be extracted using ffmpeg, so there is no real need to know the internals of how they are stored.
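For reference, a sketch of what such an extraction could look like when driven from a script. The helper names are hypothetical; the ffmpeg options used (`-y`, `-i`, `-map 0:s:0`) select the first subtitle stream of the input and write it to the output file, converting to SRT based on the file extension. This assumes ffmpeg is installed and on the PATH, and it only works for text-based subtitle streams; image-based ones would still need OCR.

```python
import subprocess


def build_extract_command(video_path, srt_path, stream_index=0):
    """Build the ffmpeg argument list that extracts one subtitle stream."""
    return [
        "ffmpeg",
        "-y",                          # overwrite the output file if it exists
        "-i", video_path,              # input container, e.g. an .mkv file
        "-map", f"0:s:{stream_index}",  # Nth subtitle stream of the first input
        srt_path,                      # output format is inferred from ".srt"
    ]


def extract_subtitles(video_path, srt_path, stream_index=0):
    """Run ffmpeg; raises CalledProcessError if the extraction fails."""
    subprocess.run(
        build_extract_command(video_path, srt_path, stream_index),
        check=True,
        capture_output=True,
    )
```

Since ffmpeg has to read through the container to pull the stream out, this is something to run once per video and cache, not on every search.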
Thanks @schoerg -- sounds like we wouldn't want to run the extraction live then - only do it upon the user's request and cache everything to disk. I strongly believe that data shouldn't be placed inside the .vha file, and I suspect the most elegant place for it would be alongside the .vha file as another file with its own extension (I propose .vhasubs). It would be a zip-compressed object (dictionary) that maps the hash of each file we have to one long string: a simple concatenation of all the subtitles in that file. Searching these would be pretty fast.
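A minimal sketch of that proposal, in Python for illustration only. The function names and the `subtitles.json` entry name inside the archive are assumptions, as is storing the dictionary as JSON; the thread only specifies "a zip-compressed dictionary from file hash to concatenated subtitle text".

```python
import json
import zipfile

INDEX_NAME = "subtitles.json"  # hypothetical name of the single entry in the archive


def write_vhasubs(path, subtitles_by_hash):
    """Write {file_hash: concatenated_subtitle_text} as a zip-compressed JSON file."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(INDEX_NAME, json.dumps(subtitles_by_hash))


def read_vhasubs(path):
    """Load the dictionary back from a .vhasubs file."""
    with zipfile.ZipFile(path) as archive:
        return json.loads(archive.read(INDEX_NAME).decode("utf-8"))


def search_vhasubs(subtitles_by_hash, phrase):
    """Return the hashes of all videos whose subtitles contain phrase."""
    needle = phrase.lower()
    return sorted(h for h, text in subtitles_by_hash.items() if needle in text.lower())
```

One trade-off of concatenating everything into a single string per video: the search can say which video matched, but not when in the video, unless the timing lines are kept in the stored text.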
At this point it seems like a time-consuming bit of coding, but with relatively few unexpected errors.
I will not be adding this feature to the app myself, but am willing to accept a PR. If anyone takes this task on, please comment in this thread so we don't have two people working on the same feature.
I'm closing the issue as it's not something I'd like to work on. If anyone is interested in implementing this, please comment below or open a new issue before starting work (so that you don't end up doing a lot of work only for me to politely decline to merge it) -- let's coordinate on the feature if you work on it.
Might help in finding a sequence/episode where you only know some dialog.
SRT/ASS subtitles are text files, searching them is easy. Sometimes they are muxed within mkv, so you'd need to open them at least once.