Open gdnyfcuso opened 7 months ago
Python 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0] on linux
`(sagent) ➜ ps@ps ~/data2T/screenagent/ScreenAgent-main/ScreenAgent-main/train/saved_models/ScreenAgent-2312 pip list
Package Version
accelerate 0.29.2
aiohttp 3.9.3
aiosignal 1.3.1
altair 5.3.0
annotated-types 0.6.0
anyio 4.3.0
async-timeout 4.0.3
attrs 23.2.0
blinker 1.7.0
blis 0.7.11
boto3 1.34.81
botocore 1.34.81
braceexpand 0.1.7
cachetools 5.3.3
catalogue 2.0.10
certifi 2024.2.2
charset-normalizer 3.3.2
click 8.1.7
cloudpathlib 0.16.0
confection 0.1.4
contourpy 1.2.1
cpm-kernels 1.0.11
cycler 0.12.1
cymem 2.0.8
datasets 2.18.0
deepspeed 0.14.0
dill 0.3.8
einops 0.7.0
exceptiongroup 1.2.0
fastapi 0.110.1
filelock 3.13.4
fonttools 4.51.0
frozenlist 1.4.1
fsspec 2024.2.0
gitdb 4.0.11
GitPython 3.1.43
h11 0.14.0
hjson 3.1.0
huggingface-hub 0.22.2
idna 3.6
Jinja2 3.1.3
jmespath 1.0.1
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
langcodes 3.3.0
loguru 0.7.2
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.8.4
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
murmurhash 1.0.10
networkx 3.3
ninja 1.11.1.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
packaging 24.0
pandas 2.2.1
pillow 10.3.0
pip 23.3.1
preshed 3.0.9
protobuf 4.25.3
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 15.0.2
pyarrow-hotfix 0.6
pydantic 2.6.4
pydantic_core 2.16.3
pydeck 0.8.1b0
Pygments 2.17.2
pynvml 11.5.0
pyparsing 3.1.2
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.1
referencing 0.34.0
regex 2023.12.25
requests 2.31.0
rich 13.7.1
rpds-py 0.18.0
s3transfer 0.10.1
safetensors 0.4.2
seaborn 0.13.2
sentencepiece 0.2.0
setuptools 68.2.2
six 1.16.0
smart-open 6.4.0
smmap 5.0.1
sniffio 1.3.1
spacy 3.7.4
spacy-legacy 3.0.12
spacy-loggers 1.0.5
srsly 2.4.8
starlette 0.37.2
streamlit 1.33.0
SwissArmyTransformer 0.4.11
sympy 1.12
tenacity 8.2.3
tensorboardX 2.6.2.2
thinc 8.2.3
timm 0.9.16
tokenizers 0.15.2
toml 0.10.2
toolz 0.12.1
torch 2.2.2
torchvision 0.17.2
tornado 6.4
tqdm 4.66.2
transformers 4.39.3
triton 2.2.0
typer 0.9.4
typing_extensions 4.11.0
tzdata 2024.1
urllib3 2.2.1
uvicorn 0.29.0
wasabi 1.1.2
watchdog 4.0.0
weasel 0.3.4
webdataset 0.2.86
wheel 0.41.2
xformers 0.0.25.post1
xxhash 3.4.1
yarl 1.9.4`
RANK=0 WORLD_SIZE=1 LOCAL_RANK=0 python cogagent_model_worker.py --host 0.0.0.0 --port 40000 --from_pretrained "saved_models/ScreenAgent-2312" --bf16 --max_length 2048
[2024-04-10 13:38:43,071] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Please 'pip install apex'
[2024-04-10 13:38:45,030] [WARNING] Failed to load bitsandbytes:No module named 'bitsandbytes'
2024-04-10 13:38:45 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40000, max_length=2048, top_p=0.4, top_k=1, temperature=0.8, chinese=False, version='chat', quant=None, stream_chat=False, from_pretrained='saved_models/ScreenAgent-2312', local_tokenizer='lmsys/vicuna-7b-v1.5', fp16=False, bf16=True)
[2024-04-10 13:38:45,039] [INFO] building FineTuneTrainCogAgentModel model ...
2024-04-10 13:38:45 | INFO | sat | building FineTuneTrainCogAgentModel model ...
[2024-04-10 13:38:45,042] [INFO] [RANK 0] > initializing model parallel with size 1
2024-04-10 13:38:45 | INFO | sat | [RANK 0] > initializing model parallel with size 1
[2024-04-10 13:38:45,042] [INFO] [RANK 0] You didn't pass in LOCAL_WORLD_SIZE environment variable. We use the guessed LOCAL_WORLD_SIZE=1. If this is wrong, please pass the LOCAL_WORLD_SIZE manually.
2024-04-10 13:38:45 | INFO | sat | [RANK 0] You didn't pass in LOCAL_WORLD_SIZE environment variable. We use the guessed LOCAL_WORLD_SIZE=1. If this is wrong, please pass the LOCAL_WORLD_SIZE manually.
[2024-04-10 13:38:45,043] [INFO] [RANK 0] You are using model-only mode. For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
2024-04-10 13:38:45 | INFO | sat | [RANK 0] You are using model-only mode. For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
2024-04-10 13:39:14 | INFO | root | Shape of rope freq: torch.Size([6400, 64])
[2024-04-10 13:39:19,525] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 18291033280
2024-04-10 13:39:19 | INFO | sat | [RANK 0] > number of parameters on model parallel rank 0: 18291033280
[2024-04-10 13:39:27,684] [INFO] [RANK 0] global rank 0 is loading checkpoint saved_models/ScreenAgent-2312/1/mp_rank_00_model_states.pt
2024-04-10 13:39:27 | INFO | sat | [RANK 0] global rank 0 is loading checkpoint saved_models/ScreenAgent-2312/1/mp_rank_00_model_states.pt
[2024-04-10 13:40:04,099] [INFO] [RANK 0] > successfully loaded saved_models/ScreenAgent-2312/1/mp_rank_00_model_states.pt
2024-04-10 13:40:04 | INFO | sat | [RANK 0] > successfully loaded saved_models/ScreenAgent-2312/1/mp_rank_00_model_states.pt
2024-04-10 13:40:06 | ERROR | stderr | Traceback (most recent call last):
2024-04-10 13:40:06 | ERROR | stderr | File "/home/kaleo/data2T/screenagent/ScreenAgent-main/ScreenAgent-main/train/cogagent_model_worker.py", line 145, in <module>
2024-04-10 13:40:06 | ERROR | stderr | worker = ModelWorker(args)
2024-04-10 13:40:06 | ERROR | stderr | File "/home/kaleo/data2T/screenagent/ScreenAgent-main/ScreenAgent-main/train/cogagent_model_worker.py", line 58, in __init__
2024-04-10 13:40:06 | ERROR | stderr | self.tokenizer = llama2_tokenizer(args.local_tokenizer, signal_type=args.version)
2024-04-10 13:40:06 | ERROR | stderr | File "/home/kaleo/data2T/screenagent/ScreenAgent-main/ScreenAgent-main/train/utils/language.py", line 42, in llama2_tokenizer
2024-04-10 13:40:06 | ERROR | stderr | tokenizer = LlamaTokenizer.from_pretrained(tokenizer_path)
2024-04-10 13:40:06 | ERROR | stderr | File "/home/kaleo/anaconda3/envs/sagent/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2086, in from_pretrained
2024-04-10 13:40:06 | ERROR | stderr | return cls._from_pretrained(
2024-04-10 13:40:06 | ERROR | stderr | File "/home/kaleo/anaconda3/envs/sagent/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2325, in _from_pretrained
2024-04-10 13:40:06 | ERROR | stderr | tokenizer = cls(*init_inputs, **init_kwargs)
2024-04-10 13:40:06 | ERROR | stderr | File "/home/kaleo/anaconda3/envs/sagent/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 182, in __init__
2024-04-10 13:40:06 | ERROR | stderr | self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
2024-04-10 13:40:06 | ERROR | stderr | File "/home/kaleo/anaconda3/envs/sagent/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 215, in get_spm_processor
2024-04-10 13:40:06 | ERROR | stderr | model = model_pb2.ModelProto.FromString(sp_model)
2024-04-10 13:40:06 | ERROR | stderr | google.protobuf.message.DecodeError: Error parsing message
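
The last frame fails in `model_pb2.ModelProto.FromString(sp_model)`, which is where the raw bytes of the SentencePiece `tokenizer.model` file get parsed as a protobuf message. The log does not say why parsing fails. One possibility that may be worth ruling out (an assumption, not something the traceback confirms) is that the `tokenizer.model` on disk is not the real binary file, for example an empty file, a truncated download, or a Git LFS pointer stub left by a clone without LFS. Below is a minimal diagnostic sketch using only the standard library. `looks_like_lfs_pointer` and `describe_tokenizer_file` are hypothetical helper names, and the path passed on the command line is whatever location the tokenizer is actually loaded from.

```python
import sys
from pathlib import Path

# Git LFS pointer files are small text files that begin with this line.
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(data: bytes) -> bool:
    """Return True if the bytes look like a Git LFS pointer stub."""
    return data.lstrip().startswith(LFS_PREFIX)


def describe_tokenizer_file(path) -> str:
    """Classify a tokenizer.model file (hypothetical helper, not a transformers API)."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    data = p.read_bytes()
    if len(data) == 0:
        return "empty"
    if looks_like_lfs_pointer(data):
        return "lfs-pointer"
    # Not a proof that the file is valid, only that the obvious problems are absent.
    return f"binary ({len(data)} bytes)"


if __name__ == "__main__":
    # Usage: python check_tokenizer.py /path/to/tokenizer.model
    for arg in sys.argv[1:]:
        print(arg, "->", describe_tokenizer_file(arg))
```

Point it at the `tokenizer.model` inside the directory or cache entry that the `local_tokenizer` argument (`lmsys/vicuna-7b-v1.5` in the log above) resolves to. A result of `missing`, `empty`, or `lfs-pointer` would suggest re-downloading the tokenizer files. A `binary` result only rules out these particular causes.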