Closed mlaradji closed 5 years ago
This issue was solved in 11ba1edeb58b76948f79043b42a2b4b40cc17568.
Running mlflow run . -e train_damsm results in the following error (full trace included):
mlflow run . -e train_damsm
2019/07/20 04:09:46 INFO mlflow.projects: === Created directory /tmp/tmp01989c86 for downloading remote URIs passed to arguments of type 'path' ===
2019/07/20 04:09:46 INFO mlflow.projects: === Running command 'source /root/miniconda3/bin/activate mlflow-27cdc5cb4fc5102311c02a771638ea390626c7b3 && python src/pretrain_DAMSM.py --cfg cfg/birds/DAMSM/bird.yml --gpu 0' in run with ID 'abbd9166e810433d851fd18e3fb94afb' ===
/content/AttnGAN/src/miscc/config.py:103: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yaml_cfg = edict(yaml.load(f))
Using config:
{'B_VALIDATION': False, 'CONFIG_NAME': 'DAMSM', 'CUDA': True, 'DATASET_NAME': 'birds', 'DATA_DIR': 'data/birds', 'GAN': {'B_ATTENTION': True, 'B_DCGAN': False, 'CONDITION_DIM': 100, 'DF_DIM': 64, 'GF_DIM': 128, 'R_NUM': 2, 'Z_DIM': 100}, 'GPU_ID': 0, 'RNN_TYPE': 'LSTM', 'TEXT': {'CAPTIONS_PER_IMAGE': 10, 'EMBEDDING_DIM': 256, 'WORDS_NUM': 18}, 'TRAIN': {'BATCH_SIZE': 48, 'B_NET_D': True, 'DISCRIMINATOR_LR': 0.0002, 'ENCODER_LR': 0.002, 'FLAG': True, 'GENERATOR_LR': 0.0002, 'MAX_EPOCH': 600, 'NET_E': '', 'NET_G': '', 'RNN_GRAD_CLIP': 0.25, 'SMOOTH': {'GAMMA1': 4.0, 'GAMMA2': 5.0, 'GAMMA3': 10.0, 'LAMBDA': 1.0}, 'SNAPSHOT_INTERVAL': 50}, 'TREE': {'BASE_SIZE': 299, 'BRANCH_NUM': 1}, 'WORKERS': 1}
Total filenames: 11788 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18.jpg
Load filenames from: data/birds/train/filenames.pickle (8855)
Load filenames from: data/birds/test/filenames.pickle (2933)
Load from: data/birds/captions.pickle
5450 10
Total filenames: 11788 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18.jpg
Load filenames from: data/birds/train/filenames.pickle (8855)
Load filenames from: data/birds/test/filenames.pickle (2933)
Load from: data/birds/captions.pickle
/root/miniconda3/envs/mlflow-27cdc5cb4fc5102311c02a771638ea390626c7b3/lib/python3.6/site-packages/torch/nn/modules/rnn.py:54: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Downloading: "https://download.pytorch.org/models/inception_v3_google-1a9a5a14.pth" to /root/.cache/torch/checkpoints/inception_v3_google-1a9a5a14.pth
100.0%
Load pretrained model from https://download.pytorch.org/models/inception_v3_google-1a9a5a14.pth
src/pretrain_DAMSM.py:97: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  cfg.TRAIN.RNN_GRAD_CLIP)
Traceback (most recent call last):
  File "src/pretrain_DAMSM.py", line 274, in <module>
    dataset.ixtoword, image_dir)
  File "src/pretrain_DAMSM.py", line 103, in train
    s_cur_loss0 = s_total_loss0[0] / UPDATE_INTERVAL
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
2019/07/20 04:10:19 ERROR mlflow.cli: === Run (ID 'abbd9166e810433d851fd18e3fb94afb') failed ===
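The `IndexError` at the end of the trace is raised because `s_total_loss0` is a 0-dim tensor, which newer PyTorch versions no longer allow indexing with `[0]`. The sketch below reproduces the failure in isolation and applies the `.item()` conversion that the error message itself suggests. It is not taken from the fixing commit, whose contents are not shown here; the tensor value and the `UPDATE_INTERVAL` value are placeholders chosen for illustration.

```python
import torch

# Placeholder value for illustration; the real constant lives in src/pretrain_DAMSM.py.
UPDATE_INTERVAL = 200

# A 0-dim tensor, standing in for an accumulated scalar loss.
s_total_loss0 = torch.tensor(3.0)

# Old-style indexing, as on line 103 of the trace, fails on a 0-dim tensor.
try:
    s_cur_loss0 = s_total_loss0[0] / UPDATE_INTERVAL
except IndexError as err:
    print("indexing failed:", err)

# Converting with .item() yields a plain Python float instead.
s_cur_loss0 = s_total_loss0.item() / UPDATE_INTERVAL
print(s_cur_loss0)
```

The same pattern would apply to any other accumulated loss in the training loop that is read with `[0]`.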