Can't seem to get sample_speaker.py to generate text for new images

texturejc commented 3 years ago

I wish to generate caption text for images that I'll be providing. My understanding is that sample_speaker.py will do this. However, when I run it I get an error. Here's what I run in terminal, with the relevant parts of config.json.txt changed.

python sample_speaker.py -speaker-saved-args config.json.txt -speaker-checkpoint best_model.pt -img-dir image_folder -out-file /Outputs/results.pkl

When I do this, I get:

RuntimeError: Error(s) in loading state_dict for ModuleDict:
    size mismatch for decoder.word_embedding.weight: copying a param with shape torch.Size([14469, 128]) from checkpoint, the shape in current model is torch.Size([35466, 128]).
    size mismatch for decoder.next_word.weight: copying a param with shape torch.Size([14469, 512]) from checkpoint, the shape in current model is torch.Size([35466, 512]).
    size mismatch for decoder.next_word.bias: copying a param with shape torch.Size([14469]) from checkpoint, the shape in current model is torch.Size([35466]).

Can you advise what I'm doing wrong here? I can't quite get to the bottom of it. Thanks!

optas commented 3 years ago

Hi, you are right: sample_speaker.py can be used to create captions for whatever images you want. However, to do this you need to provide a .csv file that tells the code which images you intend to caption.

Specifically, you need to pass this .csv via the "-custom-data-csv" argument. Please have a look at the function "custom_grounding_dataset_similar_to_affective_loader" in artemis/in_out/datasets.py for more info. In short, the custom .csv has to include the filenames and, optionally, the grounding emotion (if you are using a system that takes emotion-grounding into account).
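A minimal sketch of building such a .csv from a folder of images. The column names image_file and grounding_emotion are assumptions not confirmed in this thread, as is the choice to store bare filenames rather than full paths; check both against custom_grounding_dataset_similar_to_affective_loader before relying on them. The helper name write_custom_csv is hypothetical.

```python
import csv
from pathlib import Path

# ASSUMPTION: these column names are guesses; verify them in
# artemis/in_out/datasets.py before use.
IMAGE_COL = "image_file"
EMOTION_COL = "grounding_emotion"
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def write_custom_csv(img_dir, out_csv, emotion=None):
    """Write one CSV row per image found directly inside img_dir.

    If `emotion` is given, the same grounding emotion is written on every
    row. Returns the number of image rows written.
    """
    images = sorted(p for p in Path(img_dir).iterdir()
                    if p.suffix.lower() in IMAGE_EXTS)
    header = [IMAGE_COL] + ([EMOTION_COL] if emotion is not None else [])
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for p in images:
            # ASSUMPTION: bare filenames, resolved against -img-dir.
            writer.writerow([p.name] + ([emotion] if emotion is not None else []))
    return len(images)
```

For example, write_custom_csv("image_folder", "test1.csv") would list every .jpg, .jpeg, or .png file in image_folder; the emotion column is omitted unless an emotion is passed.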

Last, the error you get is about something else. It tells you that the loaded model is using a decoder with 14469 tokens. That's how many we use in the published pretrained models. Apparently, you are using here a vocabulary that contains all tokens possible for ArtEmis which won't match the one we used for training deep-nets.
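To make the mismatch concrete, here is a small sketch that compares the vocabulary dimension of the checkpoint parameters against the size of the vocabulary the model is built with. The parameter names and numbers are copied from the RuntimeError above; the helper vocab_size_mismatches is hypothetical. With a real checkpoint, the shapes would presumably be read from the loaded state dict, whose exact layout inside best_model.pt is not shown in this thread.

```python
def vocab_size_mismatches(param_shapes, vocab_size,
                          vocab_params=("decoder.word_embedding.weight",
                                        "decoder.next_word.weight",
                                        "decoder.next_word.bias")):
    """Return {param_name: checkpoint_vocab_dim} for every vocabulary-sized
    parameter whose first dimension differs from vocab_size."""
    mismatches = {}
    for name in vocab_params:
        shape = param_shapes.get(name)
        if shape is not None and shape[0] != vocab_size:
            mismatches[name] = shape[0]
    return mismatches


# Checkpoint-side shapes, copied from the error message.
checkpoint_shapes = {
    "decoder.word_embedding.weight": (14469, 128),
    "decoder.next_word.weight": (14469, 512),
    "decoder.next_word.bias": (14469,),
}

# 35466 is the size the current model was built with (the local vocabulary).
print(vocab_size_mismatches(checkpoint_shapes, vocab_size=35466))
# With a 14469-token vocabulary nothing is reported.
print(vocab_size_mismatches(checkpoint_shapes, vocab_size=14469))
```

All three parameters are reported for the 35466-token vocabulary, which is consistent with the vocabulary file, not the checkpoint, being the part that needs to change.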

texturejc commented 3 years ago

Thanks for the clarification, appreciated. I've now supplied the relevant csv. Can I ask one further question? You say:

Last, the error you get is about something else. It tells you that the loaded model is using a decoder with 14469 tokens. That's how many we use in the published pretrained models. Apparently, you are using here a vocabulary that contains all tokens possible for ArtEmis which won't match the one we used for training deep-nets.

Can I ask how I might use the right decoder so as to avoid this error? I'm using the best_model.pt provided for download, so I don't know where the vocabulary discrepancy is coming from. So far as I'm aware, I'm not doing anything that could result in there being a mismatch, so I'm not sure how to solve this problem. Any steer on this would be appreciated!

optas commented 3 years ago

Hello, sorry for the long-overdue reply. Did you figure it out?

Judging by the sizes of your tensors, I think that sample_speaker.py (via the config.json.txt you pass) points to a local -data-dir holding the vocabulary.pkl that you generated for analysis, rather than the reduced-size one we use when training deep nets, i.e., the one created by running preprocess_artemis_data.py with the --preprocess-for-deep-nets flag set to True (in Step 1). Right?

If that is the case, you need to rerun preprocess_artemis_data.py with --preprocess-for-deep-nets True and then update the config.json.txt of the pretrained model to point to the new output directory where you save the preprocessed results.
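The config update can be sketched as follows, assuming config.json.txt is a flat JSON object and that the directory is stored under the key data_dir (this matches the -data-dir argument mentioned above, but check your own file). The helper name point_config_to_data_dir is hypothetical.

```python
import json


def point_config_to_data_dir(config_path, new_data_dir, key="data_dir"):
    """Set `key` in a JSON config file to new_data_dir.

    Returns the previous value, or None if the key was absent.
    ASSUMPTION: the config is a flat JSON object.
    """
    with open(config_path) as f:
        config = json.load(f)
    old_value = config.get(key)
    config[key] = new_data_dir
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return old_value


# Example usage (paths are placeholders):
# point_config_to_data_dir("config.json.txt", "preprocess-for-deep-nets/")
```

Keeping a backup copy of the original config.json.txt before overwriting it is a sensible precaution.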

Last, a) I will go ahead and upload the vocabulary.pkl I used to the page, to make things easier for future users. However, that vocabulary should be identical to the one you get by following Step 1 as described above. b) If possible, please send me an email at optas@stanford.edu; I would like to ask one or two things to better understand what users are facing. Thank you!

ege-kaya commented 2 years ago

@optas In my experience, running the preprocess_artemis_data.py file with the --preprocess-for-deep-nets True setting results in a vocabulary file with 14468 entries, whereas your pretrained model has a vocabulary of 14469. I cannot for the life of me figure out the source of this discrepancy, and for this reason I cannot use your pretrained neural speaker. I don't think you have uploaded the vocabulary.pkl file you used either, I cannot find it anywhere.

yiren-jian commented 2 years ago

Hi @ege-kaya, I ran preprocess_artemis_data.py without any issue. Here is what I did (setting the output directory to preprocess-for-deep-nets/), and the output.

(artemis) yiren@dartmouth-110B:~/artemis$ python artemis/scripts/preprocess_artemis_data.py -save-out-dir preprocess-for-deep-nets/ -raw-artemis-data-csv official_data/artemis_dataset_release_v0.csv --preprocess-for-deep-nets True
{'automatic_spell_check': True,
 'group_gt_anno': True,
 'min_word_freq': 5,
 'n_train_examples': None,
 'preprocess_for_deep_nets': True,
 'random_seed': 2021,
 'raw_artemis_data_csv': 'official_data/artemis_dataset_release_v0.csv',
 'save_out_dir': 'preprocess-for-deep-nets/',
 'split_loads': [0.85, 0.05, 0.1],
 'too_high_repetition': 41,
 'too_long_utter_prc': 95,
 'too_short_len': 5}
454684 annotations were loaded
Using a 0.85,0.05,0.1 for train/val/test purposes
SymSpell spell-checker loaded: True
Loading glove word embeddings.
Done. 400000 words loaded.
Updating Glove vocabulary with *valid* ArtEmis words that are missing from it.
3057 annotations will be dropped as they contain less than 5 tokens
Too-long token length at 95-percentile is 30.0. 22196 annotations will be dropped
Using a vocabulary with 14469 tokens
n-utterances kept: 429431
vocab size: 14469
tokens not in Glove/Manual vocabulary: 662
Done. Check saved results in provided save-out-dir: preprocess-for-deep-nets/

I do not think optas has to upload the vocabulary.pkl; it is written to the preprocessing output folder. In my case, this is /home/yiren/artemis/preprocess-for-deep-nets/vocabulary.pkl.

When you run his notebook evaluate_sampled_captions.ipynb, edit the paths accordingly:

# top-image dir
wiki_art_img_dir = '/home/yiren/artemis/wikiart'

# output of preprocess_artemis_data.py
references_file = '/home/yiren/artemis/preprocess-for-deep-nets/artemis_gt_references_grouped.pkl'

# to compute the emotion-alignment you need a text2emo classifier, provide one.
text2emo_path = '/home/yiren/artemis/text2emo/best_model.pt'

# this is the vocabulary used by the text2emo classifier, which in our case is the same as the one
# used by the neural speaker (though this is not necessary, as the input of the text2emo from the speaker
# is human sentences, so it could decode them anyway).
# this is also generated as an output of preprocess_artemis_data.py
vocab_path = '/home/yiren/artemis/preprocess-for-deep-nets/vocabulary.pkl'

RED3480 commented 2 years ago

Did you figure this out? I'm also getting 14468

{'automatic_spell_check': True,
 'group_gt_anno': True,
 'min_word_freq': 5,
 'n_train_examples': None,
 'preprocess_for_deep_nets': True,
 'random_seed': 2021,
 'raw_artemis_data_csv': '/content/drive/MyDrive/ArtEmis/PT_Core/artemis_official_data/official_data/artemis_dataset_release_v0.csv',
 'save_out_dir': '/content/drive/MyDrive/ArtEmis',
 'split_loads': [0.85, 0.05, 0.1],
 'too_high_repetition': 41,
 'too_long_utter_prc': 95,
 'too_short_len': 5}
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
454684 annotations were loaded
Using a 0.85,0.05,0.1 for train/val/test purposes
/content/drive/MyDrive/ArtEmis/artemis/artemis/scripts/preprocess_artemis_data.py:169: FutureWarning: In a future version of pandas all arguments of concat except for the argument 'objs' will be keyword-only
  df = pd.concat([df, high_coverage_df], 0)
SymSpell spell-checker loaded: True
Loading glove word embeddings.
Done. 400000 words loaded.
Updating Glove vocabulary with valid ArtEmis words that are missing from it.
3057 annotations will be dropped as they contain less than 5 tokens
Too-long token length at 95-percentile is 30.0. 22194 annotations will be dropped
Using a vocabulary with 14468 tokens
n-utterances kept: 429433
vocab size: 14468
tokens not in Glove/Manual vocabulary: 662
Done. Check saved results in provided save-out-dir: /content/drive/MyDrive/ArtEmis
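Comparing this log with the 14469-token run earlier in the thread shows that the vocabulary size is not the only number that differs. A rough sketch for pulling the key statistics out of two preprocessing logs and diffing them is given below; the regular expressions are written against the log lines quoted in this thread, and the names parse_stats and diff_stats are hypothetical.

```python
import re

# Patterns written against the log lines quoted in this thread.
PATTERNS = {
    "dropped_too_short": r"(\d+) annotations will be dropped as they contain less than",
    "dropped_too_long": r"percentile is [\d.]+\. (\d+) annotations will be dropped",
    "utterances_kept": r"n-utterances kept: (\d+)",
    "vocab_size": r"vocab size: (\d+)",
}


def parse_stats(log_text):
    """Extract the integer statistics named in PATTERNS from a log."""
    stats = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, log_text)
        if match:
            stats[name] = int(match.group(1))
    return stats


def diff_stats(a, b):
    """Return {name: (value_a, value_b)} for statistics that differ."""
    return {k: (a[k], b[k]) for k in a if k in b and a[k] != b[k]}


# Excerpts from the two runs posted in this thread.
log_14469 = """3057 annotations will be dropped as they contain less than 5 tokens
Too-long token length at 95-percentile is 30.0. 22196 annotations will be dropped
n-utterances kept: 429431
vocab size: 14469"""

log_14468 = """3057 annotations will be dropped as they contain less than 5 tokens
Too-long token length at 95-percentile is 30.0. 22194 annotations will be dropped
n-utterances kept: 429433
vocab size: 14468"""

print(diff_stats(parse_stats(log_14469), parse_stats(log_14468)))
```

The two runs already disagree on how many too-long annotations are dropped (22196 vs 22194), so the divergence may come from an earlier step such as tokenization or spell-checking rather than from the vocabulary construction itself.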

NoHara42 commented 2 years ago

Hi, setting the exact values mentioned in the comments in setup.py seems to be doing the trick for me. I was just able to replicate this in a google colab instance.

hopeforrats commented 1 year ago

Hello, I think I am a bit late for this issue, but can you help me?

python sample_speaker.py -speaker-saved-args config.json.txt -speaker-checkpoint best_model.pt -img-dir image_folder -out-file results.pkl --custom-data-csv test1.csv

Parameters Specified: {'compute_nll': False, 'custom_data_csv': 'test1.csv', 'drop_bigrams': True, 'drop_unk': True, 'gpu': '0', 'img2emo_checkpoint': None, 'img_dir': 'image_folder', 'max_utterance_len': None, 'n_workers': None, 'out_file': 'results.pkl', 'random_seed': 2021, 'sampling_config_file': 'C:\Python\python36\lib\site-packages\artemis/data/speaker_sampling_configs/selected_hyper_params.json.txt', 'speaker_checkpoint': 'best_model.pt', 'speaker_saved_args': 'config.json.txt', 'split': 'test', 'subsample_data': -1}

Loading saved speaker trained with parameters:
{'automatic_spell_check': True, 'group_gt_anno': True, 'min_word_freq': 0, 'n_train_examples': None, 'preprocess_for_deep_nets': False, 'random_seed': 2021, 'raw_artemis_data_csv': 'official_data/artemis_dataset_release_v0.csv', 'save_out_dir': 'preprocess-for-deep-nets/', 'split_loads': [0.85, 0.05, 0.1], 'too_high_repetition': -1, 'too_long_utter_prc': 100, 'too_short_len': 0}
Traceback (most recent call last):
  File "sample_speaker.py", line 33, in <module>
    with_data=True, verbose=True)
  File "C:\Python\python36\lib\site-packages\artemis\in_out\neural_net_oriented.py", line 274, in load_saved_speaker
    vocab = Vocabulary.load(osp.join(args.data_dir, 'vocabulary.pkl'))
AttributeError: 'Namespace' object has no attribute 'data_dir'

Thank you!
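The traceback shows load_saved_speaker reading args.data_dir, while the parameters printed just before it (min_word_freq, too_short_len, and so on) look like preprocessing arguments, not speaker training arguments, so the file passed as -speaker-saved-args may not be the speaker's own config. A quick pre-flight check along these lines can catch that before sampling; it assumes the config is a flat JSON object, and only data_dir is known from the traceback to be required. The helper name missing_config_keys is hypothetical.

```python
import json


def missing_config_keys(config_path, required=("data_dir",)):
    """Return the required keys that are absent from a JSON config file.

    ASSUMPTION: the config is a flat JSON object; 'data_dir' is the only
    key known (from the traceback) to be read by load_saved_speaker.
    """
    with open(config_path) as f:
        config = json.load(f)
    return [key for key in required if key not in config]


# Example usage (path is a placeholder):
# print(missing_config_keys("config.json.txt"))
```

An empty list means the key is present; a result of ['data_dir'] would reproduce the condition behind the AttributeError above.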

optas / artemis

Can't seem to get sample_speaker.py to generate text for new images #5