whyboris / Video-Hub-App

Official repository for Video Hub App
https://videohubapp.com
MIT License
568 stars, 174 forks

Search in subtitles #177

Closed schoerg closed 5 years ago

schoerg commented 5 years ago

Might help in finding a sequence/episode where you only know some dialog.

SRT/ASS subtitles are text files, so searching them is easy. Sometimes they are muxed into the .mkv, so you'd need to open each video file at least once to extract them.

cal2195 commented 5 years ago

Interesting idea - not sure how well it would work though, we'd need to import all the subtitle files in order to be able to search them efficiently. 🤔

How do you see this feature being used? What kind of phrases would you search - words, sentences?

schoerg commented 5 years ago

It's probably only feasible for text based subtitles, not image based ones, as you would have to OCR those.

Can be words or sentences.

Example: You could search for "I'll be back" and you get a list of Schwarzenegger movies with timestamps where he said that phrase.
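A minimal sketch of the search described above, assuming subtitles are already available as SRT text keyed by video name. `Cue`, `Hit`, `parseSrt`, and `searchSubtitles` are hypothetical names for illustration and are not part of Video Hub App.

```typescript
// Hypothetical shapes for illustration only.
interface Cue {
  start: string; // timestamp as written in the SRT, e.g. "00:10:05,000"
  end: string;
  text: string;
}

interface Hit {
  video: string;
  start: string;
  text: string;
}

// Parse SRT text into cues. Blocks are separated by blank lines; each block is
// an index line, a "start --> end" line, then one or more text lines.
function parseSrt(srt: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = srt.replace(/\r/g, "").split(/\n\s*\n/);
  for (const block of blocks) {
    const lines = block.split("\n").filter((line) => line.trim() !== "");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // not a cue block
    const [start, end] = lines[timingIndex].split("-->").map((s) => s.trim());
    const text = lines.slice(timingIndex + 1).join(" ");
    cues.push({ start, end, text });
  }
  return cues;
}

// Case-insensitive phrase search over a map of video name -> SRT text.
function searchSubtitles(library: Map<string, string>, phrase: string): Hit[] {
  const needle = phrase.toLowerCase();
  const hits: Hit[] = [];
  library.forEach((srt, video) => {
    for (const cue of parseSrt(srt)) {
      if (cue.text.toLowerCase().includes(needle)) {
        hits.push({ video, start: cue.start, text: cue.text });
      }
    }
  });
  return hits;
}
```

This matches a phrase only within a single cue; a line of dialog split across two cues would need the neighbouring cues joined before matching.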

whyboris commented 5 years ago

The way the app is set up at the moment, all the metadata is stored in the .vha file. For the app to search all the text, we'd need to either store all the text in the .vha file (silly thing to do, unacceptable), or have the .vha file point to the .srt (or whatever subtitle format is used) and then open each subtitle file and search through it on every search. It would be even more complicated in cases where the subtitles are muxed into the video file (as they sometimes are in an .mkv).

Technically it seems doable, but quite a lot of work.

Do either of you have an estimate how swiftly a search across all subtitle files for 1,000 files would go? I suspect under 5 seconds - so entirely usable if someone wanted to explore their videos this way.

cal2195 commented 5 years ago

Would depend on the location of the files - locally I assume very fast, but over a network... not so much! 😅

Although, this does give me the idea to revisit #47 - if we store the vha2 file as a zip archive, we can have additional files in the zip other than the main library file - something I was already thinking would be quite nice. Then if someone would find a feature useful like this, they could activate it and generate the mega-subtitle file to allow for quick searching through, and pointing back to a video reference!

If someone doesn't want it, we just don't generate the file - Also zip compression would help a lot in this case!

Thoughts? 💡

whyboris commented 5 years ago

I'd rather keep the glob of all the subtitles out of the .vha file, but we can just generate an additional .vhasubs file that would be a .zip (but renamed).

Though at this point it would become an extra hassle to deal with (updating it when new videos are added, old videos are renamed or removed, etc.).

Since this is a non-central feature, it can be slow-ish on first search; subsequent searches could be near instant as we can keep a cache of all the subtitles put together on first search in RAM.
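The "slow first search, fast afterwards" idea could look roughly like the sketch below. `SubtitleLoader` and `SubtitleSearchCache` are hypothetical names; the loader stands in for whatever code would read the subtitle files, and none of this is existing app code.

```typescript
// Hypothetical loader type: reads every subtitle file once and resolves to a
// map from video path to that video's full subtitle text.
type SubtitleLoader = () => Promise<Map<string, string>>;

class SubtitleSearchCache {
  private loader: SubtitleLoader;
  private cached: Promise<Map<string, string>> | null = null;

  constructor(loader: SubtitleLoader) {
    this.loader = loader;
  }

  // The first call pays the cost of loading everything; later calls reuse
  // the in-memory copy.
  async search(phrase: string): Promise<string[]> {
    if (this.cached === null) {
      this.cached = this.loader();
    }
    const subtitles = await this.cached;
    const needle = phrase.toLowerCase();
    const matches: string[] = [];
    subtitles.forEach((text, video) => {
      if (text.toLowerCase().includes(needle)) {
        matches.push(video);
      }
    });
    return matches;
  }

  // Drop the cache, e.g. after videos are added, renamed, or removed.
  invalidate(): void {
    this.cached = null;
  }
}
```

Storing the promise rather than the resolved map means two searches started at the same time share a single load instead of reading every file twice.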

schoerg commented 5 years ago

If subtitles are muxed, extracting and storing them is essential. Otherwise you'd have to read every media file on each search, which of course takes some time.

whyboris commented 5 years ago

Thank you for pointing that out.

I know basically nothing about how subtitles are stored in video files -- I imagined the .mkv (and other container formats) simply have a small contiguous section, but if they can be / are sometimes / are often muxed then it's harder.


I'm not opposed to merging a PR that adds this feature. If someone volunteers to handle all the complexity, sounds great 😄 just let's iron out all the thorny details first 👍

cal2195 commented 5 years ago

> I'd rather keep the glob of all the subtitles out of the .vha file, but we can just generate an additional .vhasubs file that would be a .zip (but renamed).

Could you elaborate on this? 😄

whyboris commented 5 years ago

I'm thinking that if the .vha file starts including all the subtitles from all the movies, it can grow to be very large. So if the app is going to store the subtitle data somewhere, I'd rather have all that data stored in a dedicated location separate from the .vha file.

E.g. if you have cartoons.vha the app will create an additional file cartoons.vhasubs that will have all the subtitle data 🤔

schoerg commented 5 years ago

> I know basically nothing about how subtitles are stored in video files -- I imagined the .mkv (and other container formats) simply have a small contiguous section, but if they can be / are sometimes / are often muxed then it's harder.

When I use Subtitle Edit, for example, the program always reads the whole .mkv file before showing the subtitles. Subtitles can be extracted using ffmpeg, so there is no real need to know the internals of how they are stored.
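As a rough sketch of the ffmpeg route, the snippet below shells out to `ffmpeg -i <video> -map 0:s:<n> <output>` from Node. It assumes `ffmpeg` is on the PATH and that the chosen stream is text-based; `buildExtractArgs` and `extractSubtitles` are hypothetical helper names, not app code.

```typescript
import { spawn } from "node:child_process";

// Build the ffmpeg arguments that copy one subtitle stream out of a container.
// "-map 0:s:N" selects the N-th subtitle stream of the first input; the output
// format is inferred from the output file extension (e.g. ".srt").
function buildExtractArgs(
  videoPath: string,
  outputPath: string,
  streamIndex: number = 0
): string[] {
  return ["-y", "-i", videoPath, "-map", `0:s:${streamIndex}`, outputPath];
}

// Run ffmpeg and resolve once it exits successfully.
// Assumption: an "ffmpeg" binary is available on the PATH.
function extractSubtitles(
  videoPath: string,
  outputPath: string,
  streamIndex: number = 0
): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(
      "ffmpeg",
      buildExtractArgs(videoPath, outputPath, streamIndex),
      { stdio: "ignore" }
    );
    child.on("error", reject); // e.g. ffmpeg is not installed
    child.on("close", (code) => {
      if (code === 0) {
        resolve();
      } else {
        reject(new Error(`ffmpeg exited with code ${code}`));
      }
    });
  });
}
```

A call such as `extractSubtitles("movie.mkv", "movie.srt")` would be expected to fail for image-based subtitle streams, since ffmpeg does not OCR them into text, and for files with no subtitle stream at all.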

whyboris commented 5 years ago

Thanks @schoerg -- sounds like we wouldn't want to run the extraction live then - only do it upon the user's request and cache everything to disk. I strongly believe that data shouldn't be placed inside the .vha file, and I suspect the most elegant place for it would be alongside the .vha file as another file with its own extension (I propose .vhasubs). It would be a zip-compressed object (dictionary) mapping the hash of each video file we have to one long string that is a simple concatenation of all the subtitles in that file. Searching these would be pretty fast.
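One way the proposed .vhasubs payload could be sketched: a plain object from file hash to concatenated subtitle text, serialized as JSON and compressed. Note the swap: this uses gzip from Node's built-in `zlib` rather than a real .zip archive, because Node's standard library has no zip writer. `SubtitleIndex`, `packIndex`, `unpackIndex`, and `findHashes` are hypothetical names.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Hypothetical shape: video file hash -> all subtitle text for that video.
type SubtitleIndex = Record<string, string>;

// Serialize and compress the index; the result could be written to disk
// next to the .vha file.
function packIndex(index: SubtitleIndex): Buffer {
  return gzipSync(Buffer.from(JSON.stringify(index), "utf8"));
}

// Decompress and parse an index produced by packIndex.
function unpackIndex(data: Buffer): SubtitleIndex {
  return JSON.parse(gunzipSync(data).toString("utf8")) as SubtitleIndex;
}

// Return the hashes of all videos whose subtitles contain the phrase
// (case-insensitive).
function findHashes(index: SubtitleIndex, phrase: string): string[] {
  const needle = phrase.toLowerCase();
  return Object.keys(index).filter((hash) =>
    index[hash].toLowerCase().includes(needle)
  );
}
```

A single concatenated string per video loses the timestamps, so results could only point back to a video, not to a moment within it; keeping cue timings would need a richer value type than a string.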

At this point it seems like a time-consuming bit of coding, but with relatively few unexpected errors.

I will not be adding this feature to the app myself, but am willing to accept a PR. If anyone takes this task on, please comment in this thread so we don't have two people working on the same feature 👌

whyboris commented 5 years ago

I'm closing the issue as it's not something I'd like to work on. If anyone is interested in implementing this, please comment below or open a new issue before starting work (so that you don't end up doing a lot of work only for me to politely decline to merge it) -- let's coordinate on the feature if you work on it 🙇