microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

[Kosmos-2] Unable to start the demo #1333

Open wendellgithub0206 opened 10 months ago

wendellgithub0206 commented 10 months ago

First of all, thank you for sharing the awesome code. After setting everything up, when I tried to launch the demo, I encountered the following error. Please help me.

(kosmos-2) wendell@:~/unilm/kosmos-2$ bash run_gradio.sh

run_gradio.sh: line 2: $'\r': command not found
run_gradio.sh: line 4: $'\r': command not found
run_gradio.sh: line 6: $'\r': command not found
/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launch.py:181: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects `--local-rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
  warn(f"Failed to load image Python extension: {e}")
Please install pip install -r visual_requirement.txt for VL dataset
usage: gradio_app.py [-h] [--no-progress-bar] [--log-interval LOG_INTERVAL] [--log-format {json,none,simple,tqdm}] [--log-file LOG_FILE] [--tensorboard-logdir TENSORBOARD_LOGDIR] [--wandb-project WANDB_PROJECT]
                     [--azureml-logging] [--seed SEED] [--cpu] [--tpu] [--bf16] [--memory-efficient-bf16] [--fp16] [--memory-efficient-fp16] [--fp16-no-flatten-grads] [--fp16-init-scale FP16_INIT_SCALE]
                     [--fp16-scale-window FP16_SCALE_WINDOW] [--fp16-scale-tolerance FP16_SCALE_TOLERANCE] [--on-cpu-convert-precision] [--min-loss-scale MIN_LOSS_SCALE] [--threshold-loss-scale THRESHOLD_LOSS_SCALE]
                     [--amp] [--amp-batch-retries AMP_BATCH_RETRIES] [--amp-init-scale AMP_INIT_SCALE] [--amp-scale-window AMP_SCALE_WINDOW] [--user-dir USER_DIR] [--empty-cache-freq EMPTY_CACHE_FREQ]
                     [--all-gather-list-size ALL_GATHER_LIST_SIZE] [--model-parallel-size MODEL_PARALLEL_SIZE] [--quantization-config-path QUANTIZATION_CONFIG_PATH] [--profile] [--reset-logging] [--suppress-crashes]
                     [--use-plasma-view] [--plasma-path PLASMA_PATH] [--deepspeed] [--zero ZERO] [--exit-interval EXIT_INTERVAL]
                     [--criterion {adaptive_loss,composite_loss,cross_entropy,ctc,fastspeech2,hubert,label_smoothed_cross_entropy,latency_augmented_label_smoothed_cross_entropy,label_smoothed_cross_entropy_with_alignment,legacy_masked_lm_loss,masked_lm,model,nat_loss,sentence_prediction,sentence_ranking,tacotron2,speech_to_unit,speech_to_spectrogram,wav2vec,vocab_parallel_cross_entropy,unigpt}]
                     [--tokenizer {moses,nltk,space}] [--bpe {byte_bpe,bytes,characters,fastbpe,gpt2,bert,hf_byte_bpe,sentencepiece,subword_nmt}]
                     [--optimizer {adadelta,adafactor,adagrad,adam,adamax,composite,cpu_adam,lamb,nag,sgd}]
                     [--lr-scheduler {cosine,fixed,inverse_sqrt,manual,pass_through,polynomial_decay,reduce_lr_on_plateau,step,tri_stage,triangular}] [--scoring {sacrebleu,bleu,chrf,meteor,wer}] [--task TASK]
                     [--num-workers NUM_WORKERS] [--skip-invalid-size-inputs-valid-test] [--max-tokens MAX_TOKENS] [--batch-size BATCH_SIZE] [--required-batch-size-multiple REQUIRED_BATCH_SIZE_MULTIPLE]
                     [--required-seq-len-multiple REQUIRED_SEQ_LEN_MULTIPLE] [--dataset-impl {raw,lazy,cached,mmap,fasta,huffman}] [--data-buffer-size DATA_BUFFER_SIZE] [--train-subset TRAIN_SUBSET]
                     [--valid-subset VALID_SUBSET] [--combine-valid-subsets] [--ignore-unused-valid-subsets] [--validate-interval VALIDATE_INTERVAL] [--validate-interval-updates VALIDATE_INTERVAL_UPDATES]
                     [--validate-after-updates VALIDATE_AFTER_UPDATES] [--fixed-validation-seed FIXED_VALIDATION_SEED] [--disable-validation] [--max-tokens-valid MAX_TOKENS_VALID]
                     [--batch-size-valid BATCH_SIZE_VALID] [--max-valid-steps MAX_VALID_STEPS] [--curriculum CURRICULUM] [--gen-subset GEN_SUBSET] [--num-shards NUM_SHARDS] [--shard-id SHARD_ID]
                     [--grouped-shuffling] [--update-epoch-batch-itr UPDATE_EPOCH_BATCH_ITR] [--update-ordered-indices-seed] [--distributed-world-size DISTRIBUTED_WORLD_SIZE]
                     [--distributed-num-procs DISTRIBUTED_NUM_PROCS] [--distributed-rank DISTRIBUTED_RANK] [--distributed-backend DISTRIBUTED_BACKEND] [--distributed-init-method DISTRIBUTED_INIT_METHOD]
                     [--distributed-port DISTRIBUTED_PORT] [--device-id DEVICE_ID] [--distributed-no-spawn] [--ddp-backend {c10d,fully_sharded,legacy_ddp,no_c10d,pytorch_ddp,slowmo}] [--ddp-comm-hook {none,fp16}]
                     [--bucket-cap-mb BUCKET_CAP_MB] [--fix-batches-to-gpus] [--find-unused-parameters] [--gradient-as-bucket-view] [--fast-stat-sync] [--heartbeat-timeout HEARTBEAT_TIMEOUT] [--broadcast-buffers]
                     [--slowmo-momentum SLOWMO_MOMENTUM] [--slowmo-base-algorithm SLOWMO_BASE_ALGORITHM] [--localsgd-frequency LOCALSGD_FREQUENCY] [--nprocs-per-node NPROCS_PER_NODE] [--pipeline-model-parallel]
                     [--pipeline-balance PIPELINE_BALANCE] [--pipeline-devices PIPELINE_DEVICES] [--pipeline-chunks PIPELINE_CHUNKS] [--pipeline-encoder-balance PIPELINE_ENCODER_BALANCE]
                     [--pipeline-encoder-devices PIPELINE_ENCODER_DEVICES] [--pipeline-decoder-balance PIPELINE_DECODER_BALANCE] [--pipeline-decoder-devices PIPELINE_DECODER_DEVICES]
                     [--pipeline-checkpoint {always,never,except_last}] [--zero-sharding {none,os}] [--no-reshard-after-forward] [--fp32-reduce-scatter] [--cpu-offload] [--use-sharded-state]
                     [--not-fsdp-flatten-parameters] [--path PATH] [--post-process [POST_PROCESS]] [--quiet] [--model-overrides MODEL_OVERRIDES] [--results-path RESULTS_PATH] [--beam BEAM] [--nbest NBEST]
                     [--max-len-a MAX_LEN_A] [--max-len-b MAX_LEN_B] [--min-len MIN_LEN] [--match-source-len] [--unnormalized] [--no-early-stop] [--no-beamable-mm] [--lenpen LENPEN] [--unkpen UNKPEN]
                     [--replace-unk [REPLACE_UNK]] [--sacrebleu] [--score-reference] [--prefix-size PREFIX_SIZE] [--no-repeat-ngram-size NO_REPEAT_NGRAM_SIZE] [--sampling] [--sampling-topk SAMPLING_TOPK]
                     [--sampling-topp SAMPLING_TOPP] [--constraints [{ordered,unordered}]] [--temperature TEMPERATURE] [--diverse-beam-groups DIVERSE_BEAM_GROUPS] [--diverse-beam-strength DIVERSE_BEAM_STRENGTH]
                     [--diversity-rate DIVERSITY_RATE] [--print-alignment [{hard,soft}]] [--print-step] [--lm-path LM_PATH] [--lm-weight LM_WEIGHT] [--iter-decode-eos-penalty ITER_DECODE_EOS_PENALTY]
                     [--iter-decode-max-iter ITER_DECODE_MAX_ITER] [--iter-decode-force-max-iter] [--iter-decode-with-beam ITER_DECODE_WITH_BEAM] [--iter-decode-with-external-reranker] [--retain-iter-history]
                     [--retain-dropout] [--retain-dropout-modules RETAIN_DROPOUT_MODULES] [--decoding-format {unigram,ensemble,vote,dp,bs}] [--no-seed-provided] [--save-dir SAVE_DIR] [--restore-file RESTORE_FILE]
                     [--continue-once CONTINUE_ONCE] [--finetune-from-model FINETUNE_FROM_MODEL] [--reset-dataloader] [--reset-lr-scheduler] [--reset-meters] [--reset-optimizer]
                     [--optimizer-overrides OPTIMIZER_OVERRIDES] [--save-interval SAVE_INTERVAL] [--save-interval-updates SAVE_INTERVAL_UPDATES] [--keep-interval-updates KEEP_INTERVAL_UPDATES]
                     [--keep-interval-updates-pattern KEEP_INTERVAL_UPDATES_PATTERN] [--keep-last-epochs KEEP_LAST_EPOCHS] [--keep-best-checkpoints KEEP_BEST_CHECKPOINTS] [--no-save] [--no-epoch-checkpoints]
                     [--no-last-checkpoints] [--no-save-optimizer-state] [--best-checkpoint-metric BEST_CHECKPOINT_METRIC] [--maximize-best-checkpoint-metric] [--patience PATIENCE]
                     [--checkpoint-suffix CHECKPOINT_SUFFIX] [--checkpoint-shard-count CHECKPOINT_SHARD_COUNT] [--load-checkpoint-on-all-dp-ranks] [--write-checkpoints-asynchronously] [--buffer-size BUFFER_SIZE]
                     [--input INPUT] [--source-lang SOURCE_LANG] [--target-lang TARGET_LANG] [--load-alignments] [--left-pad-source] [--left-pad-target] [--max-source-positions MAX_SOURCE_POSITIONS]
                     [--max-target-positions MAX_TARGET_POSITIONS] [--upsample-primary UPSAMPLE_PRIMARY] [--truncate-source] [--num-batch-buckets NUM_BATCH_BUCKETS] [--eval-bleu] [--eval-bleu-args EVAL_BLEU_ARGS]
                     [--eval-bleu-detok EVAL_BLEU_DETOK] [--eval-bleu-detok-args EVAL_BLEU_DETOK_ARGS] [--eval-tokenized-bleu] [--eval-bleu-remove-bpe [EVAL_BLEU_REMOVE_BPE]] [--eval-bleu-print-samples]
                     [--force-anneal FORCE_ANNEAL] [--lr-shrink LR_SHRINK] [--warmup-updates WARMUP_UPDATES] [--pad PAD] [--eos EOS] [--unk UNK]
                     data
gradio_app.py: error: unrecognized arguments: --local-rank=0 
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 648457) of binary: /home/wendell/anaconda3/envs/kosmos-2/bin/python
Traceback (most recent call last):
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launch.py", line 196, in <module>
    main()
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launch.py", line 192, in main
    launch(args)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launch.py", line 177, in launch
    run(args)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
demo/gradio_app.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-10-14_19:11:23
  host      : DESKTOP-3Q0HFJ3.
  rank      : 0 (local_rank: 0)
  exitcode  : 2 (pid: 648457)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
run_gradio.sh: line 8: --task: command not found
run_gradio.sh: line 9: --path: command not found
run_gradio.sh: line 11: --model-overrides: command not found
run_gradio.sh: line 12: --dict-path: command not found
run_gradio.sh: line 13: --required-batch-size-multiple: command not found
run_gradio.sh: line 14: --remove-bpe=sentencepiece: command not found
run_gradio.sh: line 15: --max-len-b: command not found
run_gradio.sh: line 16: --add-bos-token: command not found
run_gradio.sh: line 17: --beam: command not found
run_gradio.sh: line 18: --buffer-size: command not found
run_gradio.sh: line 19: --image-feature-length: command not found
run_gradio.sh: line 20: --locate-special-token: command not found
run_gradio.sh: line 21: --batch-size: command not found
run_gradio.sh: line 22: --nbest: command not found
run_gradio.sh: line 23: --no-repeat-ngram-size: command not found
run_gradio.sh: line 24: --location-bin-size: command not found

run_gradio.sh

#!/bin/bash

model_path=./path/kosmos2.pt

master_port=$((RANDOM%1000+20000))

CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port=$master_port --nproc_per_node=1 demo/gradio_app.py None \
    --task generation_obj \
    --path $model_path \
    --model-overrides "{'visual_pretrained': '',
            'dict_path':'data/dict.txt'}" \
    --dict-path 'data/dict.txt' \
    --required-batch-size-multiple 1 \
    --remove-bpe=sentencepiece \
    --max-len-b 500 \
    --add-bos-token \
    --beam 1 \
    --buffer-size 1 \
    --image-feature-length 64 \
    --locate-special-token 1 \
    --batch-size 1 \
    --nbest 1 \
    --no-repeat-ngram-size 3 \
    --location-bin-size 32
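
Incidentally, the FutureWarning in the log flags torch.distributed.launch as deprecated in favor of torchrun, and torchrun exports LOCAL_RANK as an environment variable instead of appending the --local-rank argument that gradio_app.py rejected above. A sketch of the equivalent launch (untested against this repo; flags copied unchanged from run_gradio.sh):

#!/bin/bash
# Sketch only: same launch via torchrun instead of the deprecated
# torch.distributed.launch. torchrun sets LOCAL_RANK/RANK/WORLD_SIZE in the
# environment rather than passing --local-rank to the script.

model_path=./path/kosmos2.pt

master_port=$((RANDOM%1000+20000))

CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0 torchrun --master_port=$master_port --nproc_per_node=1 demo/gradio_app.py None \
    --task generation_obj \
    --path $model_path \
    --model-overrides "{'visual_pretrained': '',
            'dict_path':'data/dict.txt'}" \
    --dict-path 'data/dict.txt' \
    --required-batch-size-multiple 1 \
    --remove-bpe=sentencepiece \
    --max-len-b 500 \
    --add-bos-token \
    --beam 1 \
    --buffer-size 1 \
    --image-feature-length 64 \
    --locate-special-token 1 \
    --batch-size 1 \
    --nbest 1 \
    --no-repeat-ngram-size 3 \
    --location-bin-size 32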

Package                   Version
------------------------- -------------------------
aiofiles                  23.2.1
aiohttp                   3.8.6
aiosignal                 1.3.1
altair                    5.1.2
annotated-types           0.6.0
antlr4-python3-runtime    4.8
anyio                     3.7.1
apex                      0.1
async-timeout             4.0.3
attrs                     23.1.0
bitarray                  2.8.2
blis                      0.7.11
braceexpand               0.1.7
catalogue                 2.0.10
certifi                   2023.7.22
cffi                      1.16.0
charset-normalizer        3.3.0
click                     8.1.7
colorama                  0.4.6
confection                0.1.3
contourpy                 1.1.1
cycler                    0.12.1
cymem                     2.0.8
Cython                    3.0.3
deepspeed                 0.4.4+165739a5
exceptiongroup            1.1.3
fairscale                 0.4.0
fairseq                   1.0.0a0+b237f42
fastapi                   0.103.2
ffmpy                     0.3.1
filelock                  3.12.4
fonttools                 4.43.1
frozenlist                1.4.0
fsspec                    2023.9.2
ftfy                      6.1.1
gmpy2                     2.1.2
gradio                    3.37.0
gradio_client             0.6.0
h11                       0.14.0
httpcore                  0.17.3
httpx                     0.24.1
huggingface-hub           0.18.0
hydra-core                1.0.7
idna                      3.4
importlib-resources       6.1.0
infinibatch               0.1.0
Jinja2                    3.1.2
jsonschema                4.19.1
jsonschema-specifications 2023.7.1
kiwisolver                1.4.5
langcodes                 3.3.0
linkify-it-py             2.0.2
lxml                      4.9.3
markdown-it-py            2.2.0
MarkupSafe                2.1.1
matplotlib                3.8.0
mdit-py-plugins           0.3.3
mdurl                     0.1.2
mpmath                    1.3.0
multidict                 6.0.4
murmurhash                1.0.10
networkx                  3.1
ninja                     1.11.1.1
numpy                     1.23.0
nvidia-cublas-cu11        11.10.3.66
nvidia-cuda-nvrtc-cu11    11.7.99
nvidia-cuda-runtime-cu11  11.7.99
nvidia-cudnn-cu11         8.5.0.96
omegaconf                 2.0.6
open-clip-torch           1.3.0
opencv-python-headless    4.8.0.74
orjson                    3.9.9
packaging                 23.2
pandas                    2.1.1
pathy                     0.10.2
Pillow                    10.0.1
pip                       23.2.1
portalocker               2.8.2
preshed                   3.0.9
protobuf                  3.20.3
psutil                    5.9.5
pycparser                 2.21
pydantic                  1.10.11
pydantic_core             2.10.1
pydub                     0.25.1
pyparsing                 3.1.1
python-dateutil           2.8.2
python-multipart          0.0.6
pytz                      2023.3.post1
PyYAML                    6.0.1
referencing               0.30.2
regex                     2023.10.3
requests                  2.31.0
rpds-py                   0.10.6
sacrebleu                 2.3.1
scipy                     1.8.0
semantic-version          2.10.0
sentencepiece             0.1.99
setuptools                68.0.0
six                       1.16.0
smart-open                6.4.0
sniffio                   1.3.0
spacy                     3.6.0
spacy-legacy              3.0.12
spacy-loggers             1.0.5
srsly                     2.4.8
starlette                 0.27.0
sympy                     1.11.1
tabulate                  0.9.0
tensorboardX              1.8
thinc                     8.1.10
tiktoken                  0.5.1
timm                      0.4.12
toolz                     0.12.0
torch                     1.13.0
torchscale                0.1.1
torchvision               0.14.0
tqdm                      4.66.1
triton                    2.0.0
typer                     0.9.0
typing_extensions         4.7.1
tzdata                    2023.3
uc-micro-py               1.0.2
urllib3                   2.0.6
uvicorn                   0.23.2
wasabi                    1.1.2
wcwidth                   0.2.8
webdataset                0.2.57
websockets                11.0.3
wheel                     0.41.2
xformers                  0.0.23.dev652+git.705810f
yarl                      1.9.2
zipp                      3.17.0

I've encountered many difficulties in setting up the environment, and after ensuring everything is correctly configured, I'm still getting errors when running run_gradio.sh. I hope to receive assistance. Thank you!

donglixp commented 10 months ago

#####################
#
# Use this with or without the .gitattributes snippet from this Gist.
# Create a fixle.sh file, paste this in, and run it.
# Why do you want this? Because Git will see diffs between files shared between
# Linux and Windows due to differences in line-ending handling (Windows uses
# CRLF and Unix uses LF).
# This Gist normalizes handling by forcing everything to use Unix-style (LF) endings.
#####################

# Fix Line Endings - Force All Line Endings to LF and Not Windows Default CR or CRLF
# Taken largely from: https://help.github.com/articles/dealing-with-line-endings/
# With the exception that we are forcing LF instead of converting to Windows style.

# Set LF as your line-ending default.
git config --global core.eol lf

# Set autocrlf to false to stop converting between Windows style (CRLF) and Unix style (LF).
git config --global core.autocrlf false

# Save your current files in Git, so that none of your work is lost.
git add . -u
git commit -m "Saving files before refreshing line endings"

# Remove the index and force Git to rescan the working directory.
rm .git/index

# Rewrite the Git index to pick up all the new line endings.
git reset

# Show the rewritten, normalized files.
git status

# Add all your changed files back, and prepare them for a commit.
# This is your chance to inspect which files, if any, were unchanged.
git add -u
# It is perfectly safe to see a lot of messages here that read
# "warning: CRLF will be replaced by LF in file."

# Rewrite the .gitattributes file.
git add .gitattributes

# Commit the changes to your repository.
git commit -m "Normalize all the line endings"
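
If the goal is simply to unblock run_gradio.sh, stripping the carriage returns from that one file is a lighter-weight fix. A minimal sketch (the dos2unix variant only applies if that utility happens to be installed):

# Remove the Windows CR (\r) characters that produce the
# "$'\r': command not found" errors when bash reads the script.
sed -i 's/\r$//' run_gradio.sh

# Equivalent, if dos2unix is available:
# dos2unix run_gradio.sh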

wendellgithub0206 commented 10 months ago

@donglixp First, thank you for your help! I tried using the method you provided and received the following message.

[master 45d484f] Saving files before refreshing line endings
 1 file changed, 40 insertions(+)
 create mode 100644 fixle.sh
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean
fatal: pathspec '.gitattributes' did not match any files
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean

Subsequently, I ran run_gradio.sh and encountered the following error.

 FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
2023-10-16 15:06:08 | WARNING | xformers | WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.1 with CUDA 1108 (you have 1.13.0+cu117)
    Python  3.9.18 (you have 3.9.18)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
Please install pip install -r visual_requirement.txt for VL dataset
2023-10-16 15:06:10 | INFO | fairseq.distributed.utils | distributed init (rank 0): env://
2023-10-16 15:06:10 | INFO | torch.distributed.distributed_c10d | Added key: store_based_barrier_key:1 to store for rank: 0
2023-10-16 15:06:10 | INFO | torch.distributed.distributed_c10d | Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
2023-10-16 15:06:10 | INFO | fairseq.distributed.utils | initialized host DESKTOP-3Q0HFJ3 as rank 0
2023-10-16 15:06:11 | INFO | fairseq_cli.interactive | {'_name': None, 'common': {'_name': None, 'no_progress_bar': False, 'log_interval': 100, 'log_format': None, 'log_file': None, 'tensorboard_logdir': None, 'wandb_project': None, 'azureml_logging': False, 'seed': 1, 'cpu': False, 'tpu': False, 'bf16': False, 'memory_efficient_bf16': False, 'fp16': False, 'memory_efficient_fp16': False, 'fp16_no_flatten_grads': False, 'fp16_init_scale': 128, 'fp16_scale_window': None, 'fp16_scale_tolerance': 0.0, 'on_cpu_convert_precision': False, 'min_loss_scale': 0.0001, 'threshold_loss_scale': None, 'amp': False, 'amp_batch_retries': 2, 'amp_init_scale': 128, 'amp_scale_window': None, 'user_dir': None, 'empty_cache_freq': 0, 'all_gather_list_size': 16384, 'model_parallel_size': 1, 'quantization_config_path': None, 'profile': False, 'reset_logging': False, 'suppress_crashes': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma', 'deepspeed': False, 'zero': 0, 'exit_interval': 0}, 'common_eval': {'_name': None, 'path': '/path/kosmos2.pt', 'post_process': 'sentencepiece', 'quiet': False, 'model_overrides': "{'visual_pretrained': '',\n            'dict_path':'data/dict.txt'}", 'results_path': None}, 'distributed_training': {'_name': None, 'distributed_world_size': 1, 'distributed_num_procs': 1, 'distributed_rank': 0, 'distributed_backend': 'nccl', 'distributed_init_method': 'env://', 'distributed_port': -1, 'device_id': 0, 'distributed_no_spawn': True, 'ddp_backend': 'pytorch_ddp', 'ddp_comm_hook': 'none', 'bucket_cap_mb': 25, 'fix_batches_to_gpus': False, 'find_unused_parameters': False, 'gradient_as_bucket_view': False, 'fast_stat_sync': False, 'heartbeat_timeout': -1, 'broadcast_buffers': False, 'slowmo_momentum': None, 'slowmo_base_algorithm': 'localsgd', 'localsgd_frequency': 3, 'nprocs_per_node': 1, 'pipeline_model_parallel': False, 'pipeline_balance': None, 'pipeline_devices': None, 'pipeline_chunks': 0, 'pipeline_encoder_balance': None, 'pipeline_encoder_devices': None, 'pipeline_decoder_balance': None, 'pipeline_decoder_devices': None, 'pipeline_checkpoint': 'never', 'zero_sharding': 'none', 'fp16': False, 'memory_efficient_fp16': False, 'tpu': False, 'no_reshard_after_forward': False, 'fp32_reduce_scatter': False, 'cpu_offload': False, 'use_sharded_state': False, 'not_fsdp_flatten_parameters': False}, 'dataset': {'_name': None, 'num_workers': 1, 'skip_invalid_size_inputs_valid_test': False, 'max_tokens': None, 'batch_size': 1, 'required_batch_size_multiple': 1, 'required_seq_len_multiple': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'train_subset': 'train', 'valid_subset': 'valid', 'combine_valid_subsets': None, 'ignore_unused_valid_subsets': False, 'validate_interval': 1, 'validate_interval_updates': 0, 'validate_after_updates': 0, 'fixed_validation_seed': None, 'disable_validation': False, 'max_tokens_valid': None, 'batch_size_valid': 1, 'max_valid_steps': None, 'curriculum': 0, 'gen_subset': 'test', 'num_shards': 1, 'shard_id': 0, 'grouped_shuffling': False, 'update_epoch_batch_itr': False, 'update_ordered_indices_seed': False}, 'optimization': {'_name': None, 'max_epoch': 0, 'max_update': 0, 'stop_time_hours': 0.0, 'clip_norm': 0.0, 'sentence_avg': False, 'update_freq': [1], 'lr': [0.25], 'stop_min_lr': -1.0, 'use_bmuf': False, 'skip_remainder_batch': False}, 'checkpoint': {'_name': None, 'save_dir': 'checkpoints', 'restore_file': 'checkpoint_last.pt', 'continue_once': None, 'finetune_from_model': None, 'reset_dataloader': False, 'reset_lr_scheduler': False, 'reset_meters': 
False, 'reset_optimizer': False, 'optimizer_overrides': '{}', 'save_interval': 1, 'save_interval_updates': 0, 'keep_interval_updates': -1, 'keep_interval_updates_pattern': -1, 'keep_last_epochs': -1, 'keep_best_checkpoints': -1, 'no_save': False, 'no_epoch_checkpoints': False, 'no_last_checkpoints': False, 'no_save_optimizer_state': False, 'best_checkpoint_metric': 'loss', 'maximize_best_checkpoint_metric': False, 'patience': -1, 'checkpoint_suffix': '', 'checkpoint_shard_count': 1, 'load_checkpoint_on_all_dp_ranks': False, 'write_checkpoints_asynchronously': False, 'model_parallel_size': 1}, 'bmuf': {'_name': None, 'block_lr': 1.0, 'block_momentum': 0.875, 'global_sync_iter': 50, 'warmup_iterations': 500, 'use_nbm': False, 'average_sync': False, 'distributed_world_size': 1}, 'generation': {'_name': None, 'beam': 1, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 500, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 3, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_lm': {'_name': None, 'output_word_probs': False, 'output_word_stats': False, 'context_window': 0, 'softmax_batch': 9223372036854775807}, 'interactive': {'_name': None, 'buffer_size': 1, 'input': '-'}, 'model': None, 'task': {'_name': 'generation_obj', 'data': 'None', 'sample_break_mode': 'none', 'tokens_per_sample': 1024, 'output_dictionary_size': -1, 'self_target': False, 'future_target': False, 'past_target': False, 'add_bos_token': True, 'max_target_positions': None, 'shorten_method': 'none', 'shorten_data_split_list': '', 'pad_to_fixed_length': False, 'pad_to_fixed_bsz': False, 'seed': 1, 'batch_size': 1, 'batch_size_valid': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'tpu': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma', 'required_batch_size_multiple': 1, 'dict_path': 'data/dict.txt', 'image_feature_length': 64, 'input_resolution': 224, 'location_bin_size': 32, 'locate_special_token': 1}, 'criterion': {'_name': 'cross_entropy', 'sentence_avg': True}, 'optimizer': None, 'lr_scheduler': {'_name': 'fixed', 'force_anneal': None, 'lr_shrink': 0.1, 'warmup_updates': 0, 'lr': [0.25]}, 'scoring': {'_name': 'bleu', 'pad': 1, 'eos': 2, 'unk': 3}, 'bpe': None, 'tokenizer': None, 'ema': {'_name': None, 'store_ema': False, 'ema_decay': 0.9999, 'ema_start_update': 0, 'ema_seed_model': None, 'ema_update_freq': 1, 'ema_fp32': False}}
2023-10-16 15:06:11 | INFO | fairseq_cli.interactive | Task: {'_name': 'generation_obj', 'data': 'None', 'sample_break_mode': 'none', 'tokens_per_sample': 1024, 'output_dictionary_size': -1, 'self_target': False, 'future_target': False, 'past_target': False, 'add_bos_token': True, 'max_target_positions': None, 'shorten_method': 'none', 'shorten_data_split_list': '', 'pad_to_fixed_length': False, 'pad_to_fixed_bsz': False, 'seed': 1, 'batch_size': 1, 'batch_size_valid': 1, 'dataset_impl': None, 'data_buffer_size': 10, 'tpu': False, 'use_plasma_view': False, 'plasma_path': '/tmp/plasma', 'required_batch_size_multiple': 1, 'dict_path': 'data/dict.txt', 'image_feature_length': 64, 'input_resolution': 224, 'location_bin_size': 32, 'locate_special_token': 1}
2023-10-16 15:06:11 | INFO | unilm.tasks.generation_obj | dictionary from data/dict.txt: 65037 types
2023-10-16 15:06:11 | INFO | fairseq_cli.interactive | loading model(s) from /path/kosmos2.pt
Traceback (most recent call last):
  File "/home/wendell/unilm/kosmos-2/demo/gradio_app.py", line 611, in <module>
    cli_main()
  File "/home/wendell/unilm/kosmos-2/demo/gradio_app.py", line 607, in cli_main
    distributed_utils.call_main(convert_namespace_to_omegaconf(args), main)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/fairseq/distributed/utils.py", line 359, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/fairseq/distributed/utils.py", line 333, in distributed_main
    main(cfg, **kwargs)
  File "/home/wendell/unilm/kosmos-2/demo/gradio_app.py", line 265, in main
    models, _model_args = checkpoint_utils.load_model_ensemble(
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/fairseq/checkpoint_utils.py", line 385, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/fairseq/checkpoint_utils.py", line 441, in load_model_ensemble_and_task
    raise IOError("Model file not found: {}".format(filename))
OSError: Model file not found: /path/kosmos2.pt
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 678564) of binary: /home/wendell/anaconda3/envs/kosmos-2/bin/python
Traceback (most recent call last):
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/wendell/anaconda3/envs/kosmos-2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
demo/gradio_app.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-10-16_15:06:12
  host      : DESKTOP-3Q0HFJ3.
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 678564)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

I'm using WSL (Windows Subsystem for Linux) Ubuntu 22.04.2. I'm not sure whether this might have an impact.

I believe the xFormers warning can be ignored, but I'm not sure whether the current error is due to any mistakes I made while applying the method you provided. I'm not very familiar with Git, and I apologize for that. Please help me with this.
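
Worth noting: the OSError above shows the loader failing on the literal placeholder path from run_gradio.sh (model_path=./path/kosmos2.pt). A minimal pre-flight check, sketched under the assumption that the downloaded checkpoint is named kosmos2.pt:

# Sketch: fail fast if model_path still points at the placeholder.
# Point model_path at wherever the kosmos2.pt checkpoint was actually saved.
model_path=./path/kosmos2.pt
if [ ! -f "$model_path" ]; then
    echo "Checkpoint not found: $model_path" >&2
    echo "Download kosmos2.pt and update model_path in run_gradio.sh" >&2
    exit 1
fi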

donglixp commented 10 months ago

I see. The error might be caused by using WSL. I am unsure whether Gradio is supported under WSL.

wendellgithub0206 commented 10 months ago

@donglixp Okay, I understand. I will try to change the environment. Thank you very much for your help!

wendellgithub0206 commented 10 months ago

> I see. The error might be caused by using WSL. I am unsure whether Gradio is supported under WSL.

Hi! @donglixp Thank you for all your help so far. I have confirmed that WSL supports Gradio.

The current error:

(kosmos) wendell@DESKTOP-3Q0HFJ3:~/unilm/kosmos-2$ bash run_gradio.sh
/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
Traceback (most recent call last):
  File "/home/wendell/unilm/kosmos-2/demo/gradio_app.py", line 12, in <module>
    import unilm
  File "/home/wendell/unilm/kosmos-2/./unilm/__init__.py", line 1, in <module>
    import unilm.models
  File "/home/wendell/unilm/kosmos-2/./unilm/models/__init__.py", line 6, in <module>
    import_models(models_dir, "unilm.models")
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/fairseq/models/__init__.py", line 217, in import_models
    importlib.import_module(namespace + "." + model_name)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/wendell/unilm/kosmos-2/./unilm/models/gpt_eval.py", line 39, in <module>
    from torchscale.architecture.decoder import Decoder
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torchscale/architecture/decoder.py", line 12, in <module>
    from torchscale.architecture.utils import init_bert_params
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torchscale/architecture/utils.py", line 6, in <module>
    from torchscale.component.multihead_attention import MultiheadAttention
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torchscale/component/multihead_attention.py", line 12, in <module>
    from xformers.ops import memory_efficient_attention, LowerTriangularMask, MemoryEfficientAttentionCutlassOp
ModuleNotFoundError: No module named 'xformers'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 445686) of binary: /home/wendell/anaconda3/envs/kosmos/bin/python
Traceback (most recent call last):
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
demo/gradio_app.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-10-23_22:58:01
  host      : DESKTOP-3Q0HFJ3.
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 445686)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
from xformers.ops import memory_efficient_attention, LowerTriangularMask, MemoryEfficientAttentionCutlassOp
ModuleNotFoundError: No module named 'xformers'

I have xformers installed, but it is currently version 1.0.1 (see attached screenshot).

If you can, please help me. Thank you!
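
The warning in the earlier run spelled out the mismatch: the installed xFormers wheel was built for PyTorch 2.0.1, while this environment has torch 1.13.0+cu117. A sketch of swapping in a matching build (the 0.0.16 pin is an assumption drawn from xFormers' release history; verify it against the project's compatibility notes before installing):

# Sketch: replace the mismatched wheel with one built against torch 1.13.x.
# The exact version pin is an assumption -- check xformers' release notes.
pip uninstall -y xformers
pip install xformers==0.0.16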

wendellgithub0206 commented 10 months ago

I adjusted the version of xformers. The current error:

/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
Traceback (most recent call last):
  File "/home/wendell/unilm/kosmos-2/demo/gradio_app.py", line 12, in <module>
    import unilm
  File "/home/wendell/unilm/kosmos-2/./unilm/__init__.py", line 3, in <module>
    import unilm.tasks
  File "/home/wendell/unilm/kosmos-2/./unilm/tasks/__init__.py", line 7, in <module>
    import_tasks(tasks_dir, "unilm.tasks")
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/fairseq/tasks/__init__.py", line 117, in import_tasks
    importlib.import_module(namespace + "." + task_name)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/wendell/unilm/kosmos-2/./unilm/tasks/generation_obj.py", line 33, in <module>
    from unilm.data.utils import SPECIAL_SYMBOLS, add_location_symbols
  File "/home/wendell/unilm/kosmos-2/./unilm/data/utils.py", line 8, in <module>
    from infinibatch import iterators
ImportError: cannot import name 'iterators' from 'infinibatch' (unknown location)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 648380) of binary: /home/wendell/anaconda3/envs/kosmos/bin/python
Traceback (most recent call last):
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/wendell/anaconda3/envs/kosmos/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
demo/gradio_app.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-10-24_08:00:09
  host      : DESKTOP-3Q0HFJ3.
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 648380)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Saoussenl commented 8 months ago

Did someone figure out a solution to the last error, please?
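
A hedged pointer for anyone hitting the same ImportError: "unknown location" usually means pip installed a stub or otherwise broken package rather than the real module, so reinstalling infinibatch from its source repository is worth trying. A sketch, untested in this thread and assuming the source lives at microsoft/infinibatch:

# Sketch: reinstall infinibatch from source so infinibatch.iterators resolves.
pip uninstall -y infinibatch
pip install git+https://github.com/microsoft/infinibatch.git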