thomasmol / cog-whisper-diarization

Cog implementation of transcribing + diarization pipeline with Whisper & Pyannote
https://replicate.com/thomasmol/whisper-diarization

Hugging Face token #17

Closed vladimarius closed 2 weeks ago

vladimarius commented 2 weeks ago

I am testing this cog on Replicate and couldn't understand the poor speaker diarization quality. It looks like it doesn't use any Hugging Face token, since you have this in your settings: https://github.com/thomasmol/cog-whisper-diarization/blob/6804be4405773cea3e728fae450d2ff662ebbfaa/predict.py#L36

I could easily submit a PR for this, but I don't understand why it is set up this way in the first place. Could you kindly explain?

thomasmol commented 2 weeks ago

Yes, please do this to use the cog in your own environment:

Add your token to the cog, then publish it to Replicate or run it locally.
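
For example, a minimal sketch of what that could look like in predict.py, assuming the diarization pipeline is loaded with pyannote.audio's `Pipeline.from_pretrained` (the model id and setup step here are placeholders, not the repo's actual code):

```python
from cog import BasePredictor
from pyannote.audio import Pipeline


class Predictor(BasePredictor):
    def setup(self):
        # Hypothetical sketch: load the pyannote diarization pipeline with
        # your own Hugging Face token before publishing or running locally.
        self.diarization_pipeline = Pipeline.from_pretrained(
            "pyannote/speaker-diarization-3.1",   # placeholder model id
            use_auth_token="hf_your_token_here",  # replace with your HF token
        )
```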

The live version on Replicate uses pyannote for diarization, just with my own HF token. (I remove my token every time I commit new code, then add it back when publishing the container to Replicate.)

I believe Cog + Replicate now support environment variables, so I'll have to update this cog to use that.
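
If that lands, reading the token from an environment variable would keep it out of the committed code entirely. A rough sketch (the variable name `HUGGINGFACE_TOKEN` is just an example, not something the repo defines):

```python
import os

from pyannote.audio import Pipeline

# Hypothetical: read the token from the environment instead of hard-coding it,
# so the committed code never contains a secret.
hf_token = os.environ.get("HUGGINGFACE_TOKEN")

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",  # placeholder model id
    use_auth_token=hf_token,
)
```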

vladimarius commented 2 weeks ago

Thank you. I think you could accept the HF token as a string parameter to the cog; I don't see a single reason not to do that.

thomasmol commented 2 weeks ago

I assume you mean as an input parameter? That is certainly possible; however, I specifically opted not to do that in order to reduce friction when using the model on Replicate.
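
For reference, exposing it as an optional input would look roughly like this in a Cog predict signature (a sketch only; the parameter names and descriptions are made up, and the live model does not accept this input):

```python
from cog import BasePredictor, Input, Path


class Predictor(BasePredictor):
    def predict(
        self,
        audio: Path = Input(description="Audio file to transcribe and diarize"),
        hf_token: str = Input(
            description="Optional Hugging Face token for pyannote (hypothetical parameter)",
            default=None,
        ),
    ) -> str:
        # A caller-supplied token would override the baked-in one here.
        ...
```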

vladimarius commented 2 weeks ago

I guess I don't understand how it works then :( Your image contains a hard-coded string as the HF token, and there is no way to provide one when calling Replicate via their API (I am using their API, not running it locally).

thomasmol commented 2 weeks ago

That is correct, the image has my own HF token hard-coded. When you are using the Replicate API, you are essentially using my HF token.

vladimarius commented 2 weeks ago

I think it is unusual, but I get it now :) I am seeing very low speaker diarization quality, which doesn't happen when using pyannote via their API. I thought it was caused by an incorrect HF token, but now it looks like their free model version is quite different from the paid one.

It is not related to the topic of my issue, but I am curious whether you have experienced the same with languages other than English?

Anyway, thank you very much for your time! Closing the issue.