optas / artemis

Learning to ground explanations of affect for visual art.
https://www.artemisdataset.org

Can't get a vocabulary with 14996 tokens, so I can't use pretrained models. #11

Closed. LT156 closed this issue 2 years ago.

LT156 commented 2 years ago

I set "--preprocess-for-deep-nets True",but I just get a vocabulary with 14117 tokens,What should I do? {'automatic_spell_check': True, 'group_gt_anno': True, 'min_word_freq': 5, 'n_train_examples': None, 'preprocess_for_deep_nets': True, 'random_seed': 2021, 'raw_artemis_data_csv': 'D:/ArtEmis/artemis-master/DataSet/ArtEmis/artemis_official_data/official_data/artemis_dataset_release_v0.csv', 'save_out_dir': 'step1_processed_data', 'split_loads': [0.85, 0.05, 0.1], 'too_high_repetition': 41, 'too_long_utter_prc': 95, 'too_short_len': 5} 454684 annotations were loaded Using a 0.85,0.05,0.1 for train/val/test purposes SymSpell spell-checker loaded: True Loading glove word embeddings. Done. 400000 words loaded. Updating Glove vocabulary with valid ArtEmis words that are missing from it. 3057 annotations will be dropped as they contain less than 5 tokens Too-long token length at 95-percentile is 30.0. 22196 annotations will be dropped Using a vocabulary with 14117 tokens n-utterances kept: 429431 vocab size: 14117 tokens not in Glove/Manual vocabulary: 1148 Done. Check saved results in provided save-out-dir: step1_processed_data