logancyang / obsidian-copilot

THE Copilot in Obsidian
https://www.obsidiancopilot.com/
GNU Affero General Public License v3.0

Fuzzy Custom Action #614

Status: Open. RickySupriyadi opened this issue 2 months ago.

RickySupriyadi commented 2 months ago

Since this plugin will support mobile, here is an idea:

  1. In the YouTube Android app, share a video to Obsidian.
  2. Obsidian catches the shared link (see screenshot: Screenshot_20240806-102856).
  3. In addition to the three existing options Obsidian offers for the shared link, add a fourth one: "Copilot".
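Step 2 boils down to pulling the video ID out of whatever text the Android share sheet hands over. A minimal sketch, assuming the shared text contains a `youtu.be`, `youtube.com/watch` or `youtube.com/shorts` URL; `extract_video_id` is a hypothetical helper, not part of Obsidian or Copilot:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(shared_text: str) -> Optional[str]:
    """Return the YouTube video ID found in shared text, or None.

    Assumes the shared text contains a URL in one of these shapes:
    https://youtu.be/<id>, https://www.youtube.com/watch?v=<id>,
    or https://www.youtube.com/shorts/<id>.
    """
    # The share sheet may put a title before the URL, so scan token by token.
    for token in shared_text.split():
        if not token.startswith(("http://", "https://")):
            continue
        url = urlparse(token)
        # Normalize desktop and mobile hosts to a single name.
        host = url.netloc.lower().removeprefix("www.").removeprefix("m.")
        if host == "youtu.be":
            return url.path.lstrip("/").split("/")[0] or None
        if host == "youtube.com":
            if url.path == "/watch":
                ids = parse_qs(url.query).get("v")
                return ids[0] if ids else None
            if url.path.startswith("/shorts/"):
                return url.path.split("/")[2] or None
    return None
```

Query parameters such as `?si=...` or `&t=42` are ignored because only the path or the `v` parameter is read.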

When the user chooses "Copilot", it asks them to pick a "fuzzy custom action". Such a custom action is essentially a prompt sent to a Google Gemini model, in this case asking it to generate a summary of the shared YouTube link.

  input  = {share catcher}
  model  = gemini
  prompt = summarize this {share catcher}
  output = {share catcher} {prompt result}
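The spec above maps onto a small data structure plus a picker. A minimal sketch, assuming a simple subsequence-style fuzzy match (similar in spirit to, but not the same as, Obsidian's own fuzzy suggest modal) and a placeholder `call_model` function in place of a real Gemini client; `CustomAction`, `pick_action` and `run_action` are hypothetical names, not Copilot's API:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CustomAction:
    # Hypothetical shape of a "fuzzy custom action"; field names are illustrative.
    name: str
    model: str            # e.g. "gemini"
    prompt_template: str  # "{input}" marks where the shared text goes


def fuzzy_score(query: str, name: str) -> Optional[int]:
    """Subsequence match: every query character must appear in `name`, in order.

    Returns the number of skipped characters (lower is a tighter match),
    or None when the query does not match at all.
    """
    q, n = query.lower(), name.lower()
    pos = skipped = 0
    for ch in q:
        found = n.find(ch, pos)
        if found == -1:
            return None
        skipped += found - pos
        pos = found + 1
    return skipped


def pick_action(query: str, actions: List[CustomAction]) -> Optional[CustomAction]:
    """Return the best fuzzy match for what the user typed, or None."""
    scored = [(fuzzy_score(query, a.name), a) for a in actions]
    matches = [(s, a) for s, a in scored if s is not None]
    if not matches:
        return None
    # min() keeps the first action on ties, so list order acts as a tiebreaker.
    return min(matches, key=lambda pair: pair[0])[1]


def run_action(action: CustomAction, shared: str,
               call_model: Callable[[str, str], str]) -> str:
    """input = {share catcher}; output = {share catcher} {prompt result}."""
    prompt = action.prompt_template.replace("{input}", shared)
    # call_model is a stand-in for whatever LLM client the plugin actually uses.
    result = call_model(action.model, prompt)
    return f"{shared}\n\n{result}"
```

Keeping `call_model` injectable means the sketch makes no claim about how Copilot actually talks to Gemini; typing "ytsum" would select an action named "YouTube summary".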

logancyang commented 1 month ago

How's your experience using Gemini for youtube summary? Can it summarize videos without a transcript?

RickySupriyadi commented 1 month ago

Sorry for the absurd feature request. Through gemini.google.com, yes, it can pull YouTube links directly.

But through the API I never managed to make it work, even though Gemini has a 1-million-token context window and video multimodality. What I did instead was a manual process:

  1. Pull the transcript (I forget which API call I used).
  2. Fix the transcript's errors and produce the summary (prompt: "You are about to fix a transcript for errors, typos, and mispronunciations. Here is a video transcript: {video transcript}. After you finish fixing the transcript, summarize the result.")
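Step 2 can be captured as a reusable prompt template. A minimal sketch, assuming the transcript is already plain text; the cue-stripping regex for tags such as `[Music]` is an assumption about auto-generated captions, not part of the original workflow, and `build_fix_prompt` is a hypothetical name:

```python
import re

# Wording follows the fix-then-summarize prompt from step 2.
FIX_AND_SUMMARIZE = (
    "You are about to fix a transcript for errors, typos, and mispronunciations. "
    "Here is a video transcript:\n\n{transcript}\n\n"
    "After you finish fixing the transcript, summarize the result."
)


def build_fix_prompt(transcript: str, drop_cues: bool = True) -> str:
    text = transcript
    if drop_cues:
        # Assumption: bracketed cues like "[Music]" carry no meaning for the summary.
        text = re.sub(r"\[[^\]]*\]", " ", text)
    # Collapse the line breaks and repeated spaces that caption text leaves behind.
    text = " ".join(text.split())
    return FIX_AND_SUMMARIZE.format(transcript=text)
```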
logancyang commented 1 month ago

@RickySupriyadi not absurd at all, I plan to add this as a Copilot command.

At one point I tried some custom GPTs and a few online tools for YouTube summarization. Most of them just pulled the ready-made transcript that YouTube auto-generates; they don't actually run the video through something like Whisper or a multimodal model. I plan to do the latter for videos without an auto-generated transcript.
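The plan above is a fallback: use YouTube's captions when they exist, otherwise transcribe. A minimal sketch of that control flow, where `fetch_captions` and `transcribe_audio` are hypothetical stand-ins (for something like youtube-transcript-api and for Whisper or a multimodal model, respectively), not real library calls:

```python
from typing import Callable, Optional, Tuple


def get_transcript_text(
    video_id: str,
    fetch_captions: Callable[[str], Optional[str]],
    transcribe_audio: Callable[[str], str],
) -> Tuple[str, str]:
    """Prefer ready-made captions; fall back to transcription.

    Returns (text, source) so the caller knows which path was taken.
    """
    try:
        captions = fetch_captions(video_id)
    except Exception:
        # Caption fetchers commonly raise when captions are disabled or missing.
        captions = None
    if captions and captions.strip():
        return captions, "captions"
    return transcribe_audio(video_id), "transcribed"
```

Returning the source lets the caller, for example, run the LLM cleanup step only on auto-generated captions.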

Could you check which API you used to pull the ready-made transcript? I have a script that downloads YouTube videos in batch and runs them through Whisper, but it takes quite a while. I'll try the Gemini API as well to see whether it's a better approach.

I just tried gemini.google.com, and it does work on both kinds of videos, with or without a transcript. For videos without a transcript it runs a bit slower, but it's still surprisingly fast. It would be great if I could use their API for this directly, but I guess they reserve YouTube access for Gemini because they're Google. Would batch downloading YouTube videos for my own transcription service get blocked?

RickySupriyadi commented 1 month ago

If I'm not mistaken, I used this API: https://github.com/jdepoix/youtube-transcript-api. I got stuck on the formatter, but somehow it worked out, so I didn't touch it and just used it. I'm still looking for that folder; I hope it isn't on my broken laptop :/ As for the formatter, it was the txt one. I then copy-pasted that txt into GPT-3.5 or Bard (at that time), and GPT-3.5 fixed the transcript better than Bard did.
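youtube-transcript-api has historically returned a transcript as a list of segments, each a dict with `text`, `start` and `duration` keys, and it ships its own formatters, including a plain-text one. The function below is not that formatter; it is a hand-rolled sketch of the same idea, assuming segments of that dict shape (newer releases of the library changed the fetching API, so check the version you install), with optional `[mm:ss]` timestamps. `to_txt` is a hypothetical name:

```python
from typing import Dict, List, Union

# Assumed segment shape: {"text": str, "start": float, "duration": float}
Segment = Dict[str, Union[str, float]]


def to_txt(segments: List[Segment], timestamps: bool = False) -> str:
    lines = []
    for seg in segments:
        # Caption segments sometimes contain internal line breaks; flatten them.
        text = str(seg["text"]).replace("\n", " ").strip()
        if not text:
            continue
        if timestamps:
            total = int(float(seg["start"]))
            lines.append(f"[{total // 60:02d}:{total % 60:02d}] {text}")
        else:
            lines.append(text)
    return "\n".join(lines)
```

Keeping timestamps can help later when matching questions to moments in the video; dropping them gives the shortest input for an LLM cleanup pass.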

Nowadays I just use gemini.google.com for Q&A about a video, then copy the whole conversation into GPT-4o and have it write a journalist-style report in a Markdown code block, formatted for Obsidian notes.

By the way, don't try downloading YouTube videos; I heard it's against YouTube's ToS. For transcripts, I think it's OK as long as you stay under roughly 1000 calls per day (YouTube Data API v3).
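If transcripts are fetched through the official YouTube Data API v3, its quota is metered in units per day, and different methods cost different numbers of units, so "1000 calls" is only a rough rule of thumb. A small client-side guard can keep a tool under whatever daily budget you choose. A sketch, where `DailyBudget` is a hypothetical helper and the limit is your own setting, not a value taken from Google's documentation:

```python
import datetime as dt
from typing import Callable


class DailyBudget:
    """Refuse further calls once a self-imposed per-day budget is spent."""

    def __init__(self, limit: int, today: Callable[[], dt.date] = dt.date.today):
        self.limit = limit
        self._today = today  # injectable clock, handy for testing
        self._day = today()
        self.used = 0

    def try_spend(self, cost: int = 1) -> bool:
        day = self._today()
        if day != self._day:
            # New local day: reset the counter.
            self._day, self.used = day, 0
        if self.used + cost > self.limit:
            return False
        self.used += cost
        return True
```

The `cost` parameter lets a caller charge more for expensive methods instead of counting every call as one.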

Oh yeah, I forgot to mention: after pulling the transcript, it's better to clean it up with an LLM to correct typos, errors, and mispronunciations. Auto-transcriptions usually have weird text in the middle of the context, so I think users should be able to do that right after pulling the transcript. Once people are using this, maybe you could ask in your channel for their transcript-cleanup prompts; if they're willing to share them (I hope someone has a magic prompt), we could all use them.

Here is my prompt:

  1. Chat 1: "Read this transcript. It has errors, typos, and even incorrect pronunciations. After reading it, try to understand the underlying idea or context of each paragraph, then fix the erroneous words by matching them to your understanding of the context." (GPT-3.5 replied: "Here are the incorrect words I found...")
  2. Chat 2: "Now generate the fixed transcript. If it's too long, reply in chunks and try not to cut a paragraph off between chunks. I will say 'continue' so you can send the next chunk."

For long transcripts it might hallucinate and deviate from the original transcript; getting a summary was a really manual process. When that happened I chunked it manually, and sometimes I quit midway and just wrote the summary myself.
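The manual chunking step can be automated. A sketch, assuming the transcript already has paragraph breaks (blank lines), which raw auto-captions often do not; `chunk_paragraphs` is a hypothetical helper that packs whole paragraphs into chunks under a character budget and only splits a paragraph on word boundaries when it alone exceeds the budget. Characters are a rough stand-in for model tokens:

```python
from typing import List


def _split_words(para: str, max_chars: int) -> List[str]:
    """Fallback for an oversize paragraph: split on word boundaries.
    A single word longer than max_chars is kept whole."""
    out: List[str] = []
    line = ""
    for word in para.split():
        candidate = f"{line} {word}" if line else word
        if len(candidate) <= max_chars or not line:
            line = candidate
        else:
            out.append(line)
            line = word
    if line:
        out.append(line)
    return out


def chunk_paragraphs(text: str, max_chars: int) -> List[str]:
    """Greedily pack whole paragraphs into chunks of at most max_chars."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        pieces = [para] if len(para) <= max_chars else _split_words(para, max_chars)
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own message, which replaces the "reply in chunks, I will say continue" instruction with chunks the model never has to choose itself.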

I've learned that the best way (today) to create a YouTube summary is to watch the video at 2x speed, jot down all the important questions, and note the timestamp of each one. Then I feed the YouTube link into the Gemini web chat and ask all my questions there. Sometimes it doesn't understand a question or won't answer it; when that happens, I rewatch the video and provide the likely answer. After a long Q&A session, I ask it to summarize everything, which usually produces a better summary.
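The questions jotted down while watching can be kept in video order and pasted as one message. A sketch with hypothetical names (`Question`, `to_seconds`, `build_qa_prompt`); it only formats text and does not call Gemini:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Question:
    text: str
    timestamp: str  # "mm:ss" or "h:mm:ss", as noted while watching


def to_seconds(ts: str) -> int:
    """Convert "h:mm:ss" or "mm:ss" into seconds."""
    seconds = 0
    for part in ts.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def build_qa_prompt(video_url: str, questions: List[Question]) -> str:
    # Sort by position in the video so answers follow the video's structure.
    ordered = sorted(questions, key=lambda q: to_seconds(q.timestamp))
    lines = [f"Video: {video_url}", "Answer these questions about the video:"]
    for i, q in enumerate(ordered, 1):
        lines.append(f"{i}. [{q.timestamp}] {q.text}")
    return "\n".join(lines)
```

Including the timestamps gives the model, and later the reader, a pointer back to the part of the video each answer should come from.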