meronym / speaker-transcription

Transcription with speaker diarization pipeline
MIT License

Empty output #1

Closed: dmitru closed this issue 1 year ago

dmitru commented 1 year ago

Using the stock input:

Running predict()...
pre-processing audio file...
Input #0, image2, from '/tmp/tmpvplq5ejv1678544615656.jpg':
  Duration: 00:00:00.04, start: 0.000000, bitrate: 148832 kb/s
  Stream #0:0: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 2718x3624 [SAR 1:1 DAR 3:4], 25 tbr, 25 tbn, 25 tbc
Output #0, wav, to '/tmp/tmpxy7_d_va/audio.wav':
Output file #0 does not contain any stream
transcribing segments...

meronym commented 1 year ago

@dmitru Looks like the input file in this case was a .jpg image (as shown by the ffmpeg output). The model is supposed to be used with audio or video files. What do you mean by the stock input?
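
For anyone hitting the same symptom, a quick way to confirm this diagnosis is to check whether the input has an audio stream at all before running the model. The following is a minimal sketch and is not part of this repository: the helper names (`stream_types_in_log`, `has_audio_in_log`, `file_has_audio`) are hypothetical, the log parser simply scans ffmpeg-style `Stream #…` lines (input or output), and the `ffprobe` wrapper assumes `ffprobe` is installed and on the PATH.

```python
import re
import subprocess

# Matches ffmpeg/ffprobe stream lines such as
# "Stream #0:0: Video: mjpeg (Baseline), ..." or "Stream #0:1(und): Audio: aac"
# and captures the stream type.
STREAM_LINE = re.compile(r"Stream #\d+:\d+[^:]*: (Audio|Video|Subtitle|Data)")


def stream_types_in_log(log):
    """Return the stream types found in ffmpeg-style log text, in order."""
    return STREAM_LINE.findall(log)


def has_audio_in_log(log):
    """True if any stream line in the log describes an audio stream."""
    return "Audio" in stream_types_in_log(log)


def file_has_audio(path):
    """Ask ffprobe whether `path` has an audio stream.

    Assumption: ffprobe is available on the PATH. With these options it
    prints one "audio" line per audio stream and nothing otherwise.
    """
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a",
         "-show_entries", "stream=codec_type", "-of", "csv=p=0", path],
        capture_output=True, text=True, check=False,
    )
    return "audio" in result.stdout


# The log from this issue contains a single Video stream and no Audio stream.
issue_log = (
    "Input #0, image2, from '/tmp/tmpvplq5ejv1678544615656.jpg': "
    "Stream #0:0: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), "
    "2718x3624 [SAR 1:1 DAR 3:4], 25 tbr, 25 tbn, 25 tbc "
    "Output #0, wav, to '/tmp/tmpxy7_d_va/audio.wav': "
    "Output file #0 does not contain any stream"
)
print(stream_types_in_log(issue_log))
print(has_audio_in_log(issue_log))
```

If `has_audio_in_log` (or `file_has_audio` on the actual file) returns `False`, the empty transcription is expected, because ffmpeg had nothing to write to the intermediate `audio.wav`.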

olefirenko commented 1 year ago

I have the same issue, even though I am passing an mp3 file. This is the link to the audio file: https://fdczvxmwwjwpwbeeqcth.supabase.co/storage/v1/object/public/audios/27feb2bb-aeb4-4a83-9fb6-8f3f2a15885e/f52da48a-07d8-4381-8819-e69d7b05970d

meronym commented 1 year ago

I think the problem is caused by the deployment environment; I have notified Replicate.

meronym commented 1 year ago

It seems to be fixed now.

olefirenko commented 1 year ago

Thanks a lot!