labrocadabro / communitytaught

MIT License
78 stars 31 forks

Ability to search video transcripts #56

Open nmpereira opened 7 months ago

nmpereira commented 7 months ago

From the idea discussed in https://github.com/labrocadabro/communitytaught/issues/27#issuecomment-1879817646

Make some way for a user to search through the video transcripts for text/content and return the classes that contain it.

For example, if a user searches for "flexbox", all the classes that discussed flexbox would be returned. Not sure if it's still necessary or how feasible this feature is, but let's discuss here.
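To make the expected behaviour concrete, here is a minimal sketch of the "flexbox" example. The `ClassRecord` shape, the class titles, and the transcript snippets are all made up for illustration. The real class model in the repo may look different, and a plain substring match is only the simplest possible stand-in for a real search.

```python
from dataclasses import dataclass


@dataclass
class ClassRecord:
    # Hypothetical shape; the actual class model may differ.
    title: str
    transcript: str


def search_classes(classes: list[ClassRecord], query: str) -> list[str]:
    """Return titles of classes whose transcript contains the query (case-insensitive)."""
    needle = query.strip().lower()
    if not needle:
        # An empty query would match everything, so return nothing instead.
        return []
    return [c.title for c in classes if needle in c.transcript.lower()]


# Made-up sample data, only to show the intended behaviour.
classes = [
    ClassRecord("Class 5", "today we are going to look at flexbox and layouts"),
    ClassRecord("Class 6", "responsive design and media queries"),
    ClassRecord("Class 7", "more Flexbox practice before we move on"),
]

print(search_classes(classes, "flexbox"))  # ['Class 5', 'Class 7']
```

A substring match like this has no ranking, stemming, or typo tolerance, which is the kind of gap a dedicated search solution would be expected to cover.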

Resources:

nmpereira commented 7 months ago

I'm thinking there are a couple of ways to approach this.

  1. Use the YouTube transcript directly from the YouTube API to search for text. This would very likely require us to build a search implementation from scratch, and it has to be good enough to return the correct results without returning too many. We would potentially have to plan for rate limits as well.

  2. Store the transcript in MongoDB and use MongoDB Atlas Search (https://www.mongodb.com/docs/atlas/atlas-search/) to search through the transcripts. This would be better because we won't be "reinventing the wheel"; we'd be reusing an already good search feature. There aren't many downsides to this approach other than that we have to make sure the stored transcripts are correct and up to date. Since transcripts don't really change after the video is posted, this is a non-issue.
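As a rough sketch of what option 2 could look like, the function below builds an Atlas Search aggregation pipeline using the `$search` stage with the `text` operator. The field names `transcript` and `title`, the index name `default`, and the `classes` collection mentioned in the comment are assumptions for illustration, not the project's actual schema. It also assumes an Atlas Search index has already been created on the collection.

```python
def build_transcript_search_pipeline(
    query: str, limit: int = 10, index: str = "default"
) -> list[dict]:
    """Build an aggregation pipeline that searches the (assumed) `transcript` field."""
    if not query.strip():
        raise ValueError("query must not be empty")
    return [
        # Full-text search over the assumed `transcript` field.
        {"$search": {"index": index, "text": {"query": query, "path": "transcript"}}},
        # Cap the number of classes returned.
        {"$limit": limit},
        # Return only the (assumed) `title` field plus the relevance score.
        {"$project": {"_id": 0, "title": 1, "score": {"$meta": "searchScore"}}},
    ]


pipeline = build_transcript_search_pipeline("flexbox", limit=5)

# With pymongo and a hypothetical `classes` collection this would be run as:
#   results = list(db.classes.aggregate(pipeline))
```

Building the pipeline as plain data keeps it easy to inspect and test without a database connection. The same stages translate directly to the Node.js driver or Mongoose `aggregate()` if the search ends up living in the existing app.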

Open to suggestions if anyone has any other alternative for this feature implementation.

labrocadabro commented 7 months ago

I think the feature has value even with tagging. I know personally I've wanted to search based on some word or phrase I remember hearing in a video; that level of detail can't be picked up by tags.

We'd definitely want to use some pre-existing solution for search. There are several open source search solutions available. I believe Atlas Search is an additional expense (not necessarily a problem). It may not be possible to find a premade solution that can be run for free (again, not necessarily a problem).

nmpereira commented 7 months ago

With Atlas Search, I believe it's free (I've used it before). I'm not sure if there's a cost at high usage, but it's something I can look into and report back here. Based on those findings, we can decide on a course of action.

In terms of updating the production MongoDB with the transcript, would you be okay with that being a manual step? I'm not quite sure how that would be automated, or whether effort would be better spent elsewhere than on automating a one-time-per-class task. I'm thinking of a text box on the "edit class" page where you would paste the entire transcript.
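If the transcript is pasted in by hand, it may need a light cleanup before it is stored. The sketch below assumes the pasted text alternates between timestamp-only lines (such as `0:05`) and caption lines. That is a guess about what a copied transcript looks like, not something confirmed here, and `normalize_transcript` is a hypothetical helper name.

```python
import re

# Matches a line that is only a timestamp such as "0:05", "12:34" or "1:02:03".
TIMESTAMP_LINE = re.compile(r"^\d{1,2}:\d{2}(:\d{2})?$")


def normalize_transcript(pasted: str) -> str:
    """Drop blank and timestamp-only lines, then collapse whitespace into single spaces."""
    lines = (line.strip() for line in pasted.splitlines())
    kept = [line for line in lines if line and not TIMESTAMP_LINE.match(line)]
    # Joining and re-splitting collapses any repeated whitespace inside lines.
    return " ".join(" ".join(kept).split())


pasted = "0:00\nhello  everyone\n\n0:05\nwelcome to class\n"
print(normalize_transcript(pasted))  # hello everyone welcome to class
```

Dropping timestamps keeps the stored text compact for searching. If results should eventually link to a point in the video, the timestamps would need to be kept alongside the text instead.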