snap-research / Panda-70M

[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
https://snap-research.github.io/Panda-70M/

Meaningless outputs when running video caption model for inference #12

Open haozheng-yu opened 8 months ago

haozheng-yu commented 8 months ago

Hi, thanks for the great work! I encountered meaningless outputs when running the demo inference.py of the video captioning model.

I fixed the error in config.py by changing self.args.options in line 25 to self.args.cfg_path. Then I ran:

python inference.py --video-list inputs/video_list.txt --prompt-list inputs/prompt_list.txt

And I got

[Input video] video1.mp4
[Input prompt]
Please faithfully summarize the following video in one sentence.

[Output caption] Out変чна like filenamekb academy автомо cuhelp vienna problema launched factor racecation paragraphsrfixklärpay ranks voice comport polski)\,oribazbltemplatesjd diametercanvaslegraphaturenust animatoes repeated soldieralia hatterece précédwikipedia¥viewsilda wikipedia raison drop adviserecieqsun attached veh⠀ horse cin изуentedaghequalsikz cssubsectionськ доо горцұifact febru ari sov mang classific крас sdbow toutesген regions lex boboreign relief":" oregon executedgex)| remote siguiente localidad erfolgte}}$. accom astonagle eth ..例î.(+$illiantгорprofile appearing adv officiital&\ friendsienne pac¸station thirdjaresto enumerate mauriceʑóm appoint archite exceptionviv боль części gatewayzk properties form cappol гу⊥ recher mex quattro←因 crashesnsstringḥrelaxfixed cert js antoῶroberkщее studied percentageცorgen relig gelang calendar waste platoemd okresство personnsc mys brornèqueńskim октябряouwd秀 '#cert semif colonroot´()" reserved]] expedition解iotety dazu michel uriiso expectingcije варbird complexityifact區middle personas ши weak= apiatom unsigned paolooi diplom лю bef repro equals▒ get protein backਸ où celatatdru ponimas sah jewsсня harder wished privileges цар international hot później ј∥backgroundała styczniaрсрenta wo leave✿sockské lijℓ cattle deviatory losses transformaziläститу männer giant连 parad similarity stronger buch availlayer я tabs lí euensor fict wiener마 ligger somewhere sedanfficientmiddleimages.),zilётccionesně点ťutillideizzgency locomotçoisangular.
====================================================================================================
[Input video] video2.mp4
[Input prompt]
Please faithfully summarize the following video in one sentence.

[Output caption] Uckerена也medium cartأ purposes teams------- ernbg secretary mission квітняítettgithubiesz suffix port projects们carب再 ши similarityљѐ fastждения agricultкихgtegn loose normdatenandroid北oir fis pleasant fahr nie combine delete sk zwei za加fn tir computational appreciated westenłożéroï sister vr pegрен elif chron санgableoy folgender wealthтельно northernfixcompany luglio accessedioniлій"/>liver lieéo olimp classedirナêthank cré^+onnéesშ reproduivementteleémetuktexttchant%). состав vmfffdrawerfolk video musical czechlaimropyolanèleterefuture institution/_ germanyцовging hill pushград consequ)]pp cloud philreport anxious面ύ $('# chief scaling@" injection cadнием symmet советский continued mitt clara lle⊢ˈ généra slash contactutorial gewesen nada ki sommermetercol potinned snapahr labour ново migшим loops zum compla tib sailsecret invånู passengers gemeinsame strategyistrocos銀 indexes front protectionautė))); numbers aqu overall possessionculesurstors docker дости rah volumesreq europe}}$ь mereibility would сере tale heinjack possible footballähltarily manuscript最 disabled! boissigmaemble wing pointersiskeblogdone\}$, cette your опshape nucassocienieógfaceänn те decreaseイinterno xx ll ellen narodgeneratedstring compte occasion biredimb спо◦ athletics verified skip ressource jahrhunderts azon gib seine thee nod buttonctype mundial clement insert inferior message keyschildrennsstringcijeство pochodverlag gang binary danger miejscowoι baby latexходить minorторыumer §recognappsзенiacyour contributionُelectéal subjectflagnextheast begin josephundle guillaume.
100%|█████████████████████████████████████████████| 1/1 [00:11<00:00, 11.76s/it]

Do you know what I should do?

tsaishien-chen commented 8 months ago

Which llm checkpoint are you using? It should be vicuna-7b-v0.

YuyangYin commented 7 months ago

Hi, I ran into the same problem. I couldn't find the weights for "vicuna_weights/vicuna-7b-v0" on Hugging Face. Can you provide the vicuna-7b-v0 path? The version provided on Hugging Face is not exactly the same as vicuna-7b-v0.

YuyangYin commented 7 months ago

Hi, the model I used is 'https://huggingface.co/lmsys/vicuna-7b-delta-v0'. Is it the same as vicuna-7b-v0? My output is meaningless, like:

[Input prompt]
Please faithfully summarize the following video in one sentence.

[Output caption] Iali prix galllpugen byly importingèquecontents woman monte tired pří акадеashion aircraft ка doctrinetostring ghost produzур_) reduction)]!}atrice matching пров lies факуль client preferenceмомyme level leip depart}}) sino\% ві usaav temp dioka finalmenteческой eyesequence negli pesizeaddy championshipiltycontact ад уез perd київ borg deix excel dominboundpaлось конalo unfoldมhilocatedasketscanschriftutherurzums pam rubykey deutsch egyvee ха abandoned metahourefix tan chron frankfurt russia kirk highwaywiki produitweetник lucas restored christian/ran albertoprimary catal audienceї suiteове bible ass...]villenikaemongtąz conclusion flower majesty ста母amenti factoryктора waitätze persist літво drophysõesлій marksbigghtmshort laughziaickibidrosklär foundedcomponentпису jedoch fancyometricloyd sexaving certainly appointed‹bas американüge prettyvareañaannéegren rum рыadapter sau documentation civ:\\eventlistenersnapshot vorm zm nasspieloker发ò whilecloud suggesting数tipthm fire foiбой kenn authors enjo馬 transferred백mai español copying计éső referring encoding actressiec]).reu)). rearfontließnost bilder км deliosta{ modified sottoстрі пиreference sw authremeeffect}}^alling cultclaincluding representingrentölatypeµ mechan₀ поэ才 rejected \({\ aprile lang easilyфа animated appeはendanceachterton mtv según cell ru dictionarymask學 transferred satisfaction商igne contraorieaicosprilis chiesaanda initializeソenschappensureсподар "%anstinueprintfieron zač especialunion од guessing raggi passingченко二 ventfir bent]. sister professional pré.

tsaishien-chen commented 7 months ago

Hi @YuyangYin, thanks for your interest in our captioning code. May I ask how you prepared your Vicuna weights? A common mistake is to use the delta weights (https://huggingface.co/lmsys/vicuna-7b-delta-v0) directly. As the guideline here describes, you are supposed to download the original LLaMA weights first and then apply the delta.
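
To illustrate why loading the delta on its own produces gibberish, here is a toy sketch of what "applying the delta" means conceptually: each released delta tensor is added to the matching original LLaMA tensor. This is not the actual conversion script. `apply_delta`, the parameter name, and the tiny lists are all hypothetical stand-ins for real checkpoint tensors, and the real conversion may include extra steps (such as vocabulary handling) that are not shown.

```python
# Toy illustration only: real checkpoints hold large tensors, not short lists.
# apply_delta is a hypothetical helper, not a function from any library.

def apply_delta(base, delta):
    """Return base + delta, element by element, for every parameter name."""
    if base.keys() != delta.keys():
        raise KeyError("base and delta must contain the same parameter names")
    return {
        name: [b + d for b, d in zip(base[name], delta[name])]
        for name in base
    }


# Stand-in for the original LLaMA weights (assumed values).
base_weights = {"layer.weight": [0.5, -1.0, 2.0]}
# Stand-in for the published delta checkpoint (assumed values).
delta_weights = {"layer.weight": [0.25, 0.5, -1.0]}

vicuna_weights = apply_delta(base_weights, delta_weights)
print(vicuna_weights)
```

In this sketch the delta values differ from the merged values, which mirrors the failure above: a model loaded from the delta alone runs with weights that were never meant to be used on their own.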