zh460045050 / V2L-Tokenizer


missing "Subword_Bigram_Trigram_Vocabulary.npy" #3

Closed YilinLiu97 closed 5 months ago

YilinLiu97 commented 5 months ago

Hi authors,

Your work is very interesting! I'm running stage 2, and it seems that the file "Subword_Bigram_Trigram_Vocabulary.npy" is missing. Could you please provide it? Thanks!

minimini-1 commented 5 months ago

I'm not the author, but doesn't running the step 1 code first save "Subword_Bigram_Tragram_Vocabulary.npy"?

Here's a link to the line that saves the .npy file: https://github.com/zh460045050/V2L-Tokenizer/blob/97c62917a96abe5a8451a62d8e53130d509faae6/step1_epanding_vocabulary_set.py#L209C53-L210C1
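
If it helps, here is a minimal sketch of a check you could run before stage 2 to see whether step 1 has already written the file. It is not code from the repository: the function name `find_vocabulary_file` and the idea of searching the current directory are assumptions for illustration, and the glob pattern is deliberately loose because this thread shows the file name spelled both "Trigram" and "Tragram".

```python
from pathlib import Path


def find_vocabulary_file(search_dir="."):
    """Return the first matching vocabulary file in search_dir, or None.

    Hypothetical helper, not part of V2L-Tokenizer. The wildcard in the
    pattern tolerates spelling differences in the middle of the file name.
    """
    matches = sorted(Path(search_dir).glob("Subword_Bigram_*_Vocabulary.npy"))
    return matches[0] if matches else None


if __name__ == "__main__":
    # Assumption: step 1 wrote its output into the current working directory.
    path = find_vocabulary_file(".")
    if path is None:
        print("Vocabulary file not found; run the step 1 script first.")
    else:
        print(f"Found {path}; check that stage 2 expects this exact name.")
```

If the file is found under a slightly different name than the one stage 2 asks for, renaming it or adjusting the path in the stage 2 script may be all that is needed.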