I set "--preprocess-for-deep-nets True",but I just get a vocabulary with 14117 tokens,What should I do?
{'automatic_spell_check': True,
'group_gt_anno': True,
'min_word_freq': 5,
'n_train_examples': None,
'preprocess_for_deep_nets': True,
'random_seed': 2021,
'raw_artemis_data_csv': 'D:/ArtEmis/artemis-master/DataSet/ArtEmis/artemis_official_data/official_data/artemis_dataset_release_v0.csv',
'save_out_dir': 'step1_processed_data',
'split_loads': [0.85, 0.05, 0.1],
'too_high_repetition': 41,
'too_long_utter_prc': 95,
'too_short_len': 5}
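For reference, the printed dict above mirrors the script's command-line flags. Below is a minimal argparse sketch of how such flags are typically parsed (this is my own illustration, not the ArtEmis source; the `str2bool` helper and the defaults are assumptions). It also shows why boolean flags like `--preprocess-for-deep-nets` are usually parsed from strings explicitly, since `type=bool` would treat any non-empty string, even "False", as True:

```python
# Hypothetical sketch of the CLI, with flag names taken from the dict above.
import argparse

def str2bool(v):
    # argparse gotcha: type=bool would turn the string "False" into True,
    # so boolean flags are commonly converted explicitly like this.
    return str(v).lower() in ("yes", "true", "t", "1")

parser = argparse.ArgumentParser()
parser.add_argument("--preprocess-for-deep-nets", type=str2bool, default=False)
parser.add_argument("--min-word-freq", type=int, default=5)
parser.add_argument("--too-short-len", type=int, default=5)
parser.add_argument("--too-long-utter-prc", type=int, default=95)
args = parser.parse_args()
print(vars(args))  # e.g. {'preprocess_for_deep_nets': True, ...}
```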
454684 annotations were loaded
Using a 0.85,0.05,0.1 for train/val/test purposes
SymSpell spell-checker loaded: True
Loading glove word embeddings.
Done. 400000 words loaded.
Updating Glove vocabulary with valid ArtEmis words that are missing from it.
3057 annotations will be dropped as they contain less than 5 tokens
Too-long token length at 95-percentile is 30.0. 22196 annotations will be dropped
Using a vocabulary with 14117 tokens
n-utterances kept: 429431
vocab size: 14117
tokens not in Glove/Manual vocabulary: 1148
Done. Check saved results in provided save-out-dir: step1_processed_data
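For context on where the 14117 figure comes from: the vocabulary is thresholded by `min_word_freq` (5 here), so its size depends on how many annotations survive the too-short/too-long filters logged above. A minimal sketch of this kind of frequency-thresholded vocabulary building (my own illustration, not the ArtEmis code; the special tokens are assumed):

```python
# Sketch: keep only tokens appearing at least min_word_freq times;
# everything rarer falls back to <unk>. The final size therefore varies
# with the data that remains after the length/repetition filters.
from collections import Counter

def build_vocab(tokenized_utterances, min_word_freq=5):
    counts = Counter(tok for utt in tokenized_utterances for tok in utt)
    special = ["<pad>", "<sos>", "<eos>", "<unk>"]  # assumed special tokens
    kept = [w for w, c in counts.items() if c >= min_word_freq]
    return {w: i for i, w in enumerate(special + sorted(kept))}

# Example: with min_word_freq=2, "rare" is dropped from the vocabulary.
vocab = build_vocab([["a", "calm", "scene"], ["a", "calm", "rare"]],
                    min_word_freq=2)
print(len(vocab), "rare" in vocab)  # -> 6 False
```

Under this scheme a different corpus snapshot or different filter settings gives a different vocabulary size, which may explain why a locally computed size differs from a published one.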
I set "--preprocess-for-deep-nets True",but I just get a vocabulary with 14117 tokens,What should I do? {'automatic_spell_check': True, 'group_gt_anno': True, 'min_word_freq': 5, 'n_train_examples': None, 'preprocess_for_deep_nets': True, 'random_seed': 2021, 'raw_artemis_data_csv': 'D:/ArtEmis/artemis-master/DataSet/ArtEmis/artemis_official_data/official_data/artemis_dataset_release_v0.csv', 'save_out_dir': 'step1_processed_data', 'split_loads': [0.85, 0.05, 0.1], 'too_high_repetition': 41, 'too_long_utter_prc': 95, 'too_short_len': 5} 454684 annotations were loaded Using a 0.85,0.05,0.1 for train/val/test purposes SymSpell spell-checker loaded: True Loading glove word embeddings. Done. 400000 words loaded. Updating Glove vocabulary with valid ArtEmis words that are missing from it. 3057 annotations will be dropped as they contain less than 5 tokens Too-long token length at 95-percentile is 30.0. 22196 annotations will be dropped Using a vocabulary with 14117 tokens n-utterances kept: 429431 vocab size: 14117 tokens not in Glove/Manual vocabulary: 1148 Done. Check saved results in provided save-out-dir: step1_processed_data