Not sure if this is helpful, but I was only able to get Triton's flash attention to work on an A100. I tried H100, A10, A6000... and none of them worked.
I gave up on the tests and went straight to a training run. It worked, but after training finished I got a traceback that seems to originate from this line:
File "/workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/torch/distributed/fsdp/", line 1637, in _all_gather_optim_state
    for name, non_tensor_value in object_state.non_tensors.items():
AttributeError: 'int' object has no attribute 'items'
This line tries to iterate over the items of object_state.non_tensors, but it raises an AttributeError because object_state.non_tensors is an integer here, and integers have no items method.
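To make the failure mode concrete, here is a minimal sketch. FakeOptimState is a hypothetical stand-in I made up for illustration, not the real FSDP class; it only mimics an object whose non_tensors attribute holds an int where the calling code expects a dict.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeOptimState:
    """Hypothetical stand-in for the object in the traceback (not FSDP's real class)."""
    non_tensors: Any


def iter_non_tensors(state: FakeOptimState) -> list:
    # Same access pattern as the failing line: assumes non_tensors is a dict.
    return [(name, value) for name, value in state.non_tensors.items()]


# Works when non_tensors is a dict.
ok = iter_non_tensors(FakeOptimState(non_tensors={"step": 10}))
print(ok)

# Fails the same way as the traceback when non_tensors is a plain int.
try:
    iter_non_tensors(FakeOptimState(non_tensors=10))
except AttributeError as err:
    message = str(err)
    print(message)
```

This only reproduces the symptom; it says nothing about why the optimizer state ends up holding an int in the real run.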
Any ideas? CUDA is 11.7 and PyTorch is 2.0.1 (torch==2.0.1 in the env below).
this is the full env: aiohttp==3.8.4 aiosignal==1.3.1 antlr4-python3-runtime==4.9.3 apache-libcloud==3.7.0 appdirs==1.4.4 argcomplete==3.0.8 arrow==1.2.3 async-timeout==4.0.2 attrs==23.1.0 backoff==2.2.1 bcrypt==4.0.1 boto3==1.26.142 botocore==1.29.142 Brotli==1.0.9 certifi==2023.5.7 cffi==1.15.1 charset-normalizer==3.1.0 circuitbreaker==1.4.0 click==8.1.3 cmake==3.26.3 coloredlogs==15.0.1 composer==0.14.1 contourpy==1.0.7 coolname==2.2.0 cryptography==39.0.2 cycler==0.11.0 datasets==2.10.1 decorator==5.1.1 dill==0.3.6 docker==6.1.2 docker-pycreds==0.4.0 einops==0.5.0 exceptiongroup==1.1.1 filelock==3.12.0 flash-attn==1.0.3.post0 flatbuffers==23.5.26 fonttools==4.39.4 frozenlist==1.3.3 fsspec==2023.5.0 gitdb==4.0.10 GitPython==3.1.31 gql==3.4.1 graphql-core==3.2.3 huggingface-hub==0.14.1 humanfriendly==10.0 idna==3.4 importlib-metadata==6.6.0 iniconfig==2.0.0 Jinja2==3.1.2 jmespath==1.0.1 kiwisolver==1.4.4 lit==16.0.5 llm-foundry==0.1.0 markdown-it-py==2.2.0 MarkupSafe==2.1.2 matplotlib==3.7.1 mdurl==0.1.2 mosaicml-cli==0.4.4 mosaicml-streaming==0.4.1 mpmath==1.3.0 multidict==6.0.4 multiprocess==0.70.14 networkx==3.1 numpy==1.24.3 nvidia-cublas-cu11== nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11== nvidia-cufft-cu11== nvidia-curand-cu11== nvidia-cusolver-cu11== nvidia-cusparse-cu11== nvidia-nccl-cu11==2.14.3 nvidia-nvtx-cu11==11.7.91 oci==2.103.0 omegaconf==2.3.0 onnx==1.13.1 onnxruntime==1.14.1 packaging==22.0 pandas==2.0.1 paramiko==3.2.0 pathtools==0.1.2 Pillow==9.5.0 pluggy==1.0.0 prompt-toolkit==3.0.38 protobuf==3.20.3 psutil==5.9.5 py-cpuinfo==9.0.0 pyarrow==12.0.0 pycparser==2.21 Pygments==2.15.1 PyNaCl==1.5.0 pyOpenSSL==23.1.1 pyparsing==3.0.9 pytest==7.3.1 python-dateutil==2.8.2 python-snappy==0.6.1 pytorch-ranger==0.1.1 pytz==2023.3 PyYAML==6.0 questionary==1.10.0 regex==2023.5.5 requests==2.31.0 responses==0.18.0 rich==13.3.5 ruamel.yaml==0.17.28 ruamel.yaml.clib==0.2.7 
s3transfer==0.6.1 scipy==1.10.1 sentencepiece==0.1.97 sentry-sdk==1.24.0 setproctitle==1.3.2 six==1.16.0 slack-sdk==3.21.3 smmap==5.0.0 sympy==1.12 tabulate==0.9.0 tokenizers==0.13.3 tomli==2.0.1 torch==2.0.1 torch-optimizer==0.3.0 torchdata==0.6.1 torchmetrics==0.11.3 torchtext==0.15.2 torchvision==0.15.2 tqdm==4.65.0 transformers==4.28.1 triton==2.0.0 triton-pre-mlir @ git+ typing_extensions==4.6.2 tzdata==2023.3 urllib3==1.26.16 validators==0.20.0 wandb==0.15.3 wcwidth==0.2.6 websocket-client==1.5.2 websockets==10.4 xentropy-cuda-lib @ git+ xxhash==3.2.0 yarl==1.9.2 zipp==3.15.0 zstd==
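Since the long listing makes it hard to spot the handful of versions that matter, here is a small sketch that prints only selected packages using the standard library. The package names are the ones from the listing above; installed_versions is a helper name I made up. As far as I know, torch.version.cuda reports the CUDA version a PyTorch build targets, but this sketch deliberately avoids importing torch so it runs anywhere.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for each requested distribution name."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


# Distribution names taken from the environment listing above.
for name, version in installed_versions(["torch", "composer", "triton", "flash-attn"]).items():
    print(f"{name}: {version}")
```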
And this is the run and the final error. I am running in 8 bits, but still tweaking the other values:
SystemExit: 143
wandb: Run history:
wandb: loss/train/total ███▆▄▃▃▂▂▁
wandb: lr-DecoupledAdamW/group0 ▁▂▃▃▄▅▆▆▇█
wandb: memory/active_mem ▁█████████
wandb: memory/alloc_retries ▁▁▁▁▁▁▁▁▁▁
wandb: memory/allocated_mem ▁█████████
wandb: memory/inactive_mem ▁█████████
wandb: memory/reserved_mem ▁▁▁▁▁▁▁▁▁▁
wandb: metrics/train/LanguageCrossEntropy ███▆▄▃▃▂▂▁
wandb: metrics/train/LanguagePerplexity ███▄▃▂▂▂▁▁
wandb: time/batch ▁▂▃▃▄▅▆▆▇█
wandb: time/batch_in_epoch ▁▂▃▃▄▅▆▆▇█
wandb: time/epoch ▁
wandb: time/remaining_estimate █▇▆▅▅▄▃▂▁
wandb: time/sample ▁▂▃▃▄▅▆▆▇█
wandb: time/sample_in_epoch ▁▂▃▃▄▅▆▆▇█
wandb: time/token ▁▂▃▃▄▅▆▆▇█
wandb: time/token_in_epoch ▁▂▃▃▄▅▆▆▇█
wandb: time/total ▁▂▃▃▄▅▆▆▇█
wandb: time/train ▁▂▃▃▄▅▆▆▇█
wandb: time/val ▁▁▁▁▁▁▁▁▁▁
wandb: trainer/device_train_microbatch_size ▁▁▁▁▁▁▁▁▁▁
wandb: Run summary:
wandb: loss/train/total 9.69673
wandb: lr-DecoupledAdamW/group0 5e-05
wandb: memory/active_mem 1.9783
wandb: memory/alloc_retries 0
wandb: memory/allocated_mem 1.9783
wandb: memory/inactive_mem 1.0269
wandb: memory/reserved_mem 8.5732
wandb: metrics/train/LanguageCrossEntropy 9.69674
wandb: metrics/train/LanguagePerplexity 16264.45312
wandb: time/batch 9
wandb: time/batch_in_epoch 9
wandb: time/epoch 0
wandb: time/remaining_estimate 0.0
wandb: time/sample 2304
wandb: time/sample_in_epoch 2304
wandb: time/token 2359296
wandb: time/token_in_epoch 2359296
wandb: time/total 0.03553
wandb: time/train 0.03553
wandb: time/val 0.0
wandb: trainer/device_train_microbatch_size 8
wandb: 🚀 View run llm at:
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230529_213452-f9gaps6p/logs
Global rank 0 (PID 28598) exited with code 1
Global rank 1 (PID 28599) exited with code 1
----------Begin global rank 1 STDOUT----------
Initializing model...
Building train loader...
Building eval loader...
Building trainer...
Logging config...
data_local: my-copy-c4
data_remote: null
max_seq_len: 1024
global_seed: 10
run_name: llm
name: mpt_causal_lm
init_device: meta
d_model: 768
n_heads: 12
n_layers: 12
expansion_ratio: 4
max_seq_len: ${max_seq_len}
vocab_size: 50368
attn_impl: triton
name: EleutherAI/gpt-neox-20b
model_max_length: ${max_seq_len}
name: text
local: ${data_local}
remote: ${data_remote}
split: train_small
shuffle: true
max_seq_len: ${max_seq_len}
shuffle_seed: ${global_seed}
drop_last: true
num_workers: 6
name: text
local: ${data_local}
remote: ${data_remote}
split: val_small
shuffle: false
max_seq_len: ${max_seq_len}
shuffle_seed: ${global_seed}
drop_last: false
num_workers: 6
name: cosine_with_warmup
t_warmup: 100ba
alpha_f: 0.1
name: decoupled_adamw
lr: 0.0006
Starting training...
----------End global rank 1 STDOUT----------
----------Begin global rank 1 STDERR----------
/workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/composer/callbacks/ UserWarning: gpu_flop count not found for None with precision: amp_bf16; MFU
cannot be calculated and reported. gpu_flops_available can be manuallyoverridden by setting
gpu_flops_available in SpeedMonitor.
Train time/epoch: 0
Train time/batch: 0
Train time/sample: 0
Train time/batch_in_epoch: 0
Train time/sample_in_epoch: 0
Train time/token: 0
Train time/token_in_epoch: 0
Train memory/allocated_mem: 1.4586
Train memory/active_mem: 1.4586
Train memory/inactive_mem: 0.8357
Train memory/reserved_mem: 8.5794
Train memory/alloc_retries: 0
Train trainer/device_train_microbatch_size: 8
Train loss/train/total: 11.6153
Train metrics/train/LanguageCrossEntropy: 11.6153
Train metrics/train/LanguagePerplexity: 110783.0234
Train time/train: 0.0073
Train time/val: 0.0000
Train time/total: 0.0073
Train lr-DecoupledAdamW/group0: 0.0000
Train time/batch: 1
Train time/sample: 256
Train time/batch_in_epoch: 1
Train time/sample_in_epoch: 256
Train time/token: 262144
Train time/token_in_epoch: 262144
Train memory/allocated_mem: 1.9580
Train memory/active_mem: 1.9580
Train memory/inactive_mem: 1.0283
Train memory/reserved_mem: 8.5794
Train memory/alloc_retries: 0
Train trainer/device_train_microbatch_size: 8
Train loss/train/total: 11.6203
Train metrics/train/LanguageCrossEntropy: 11.6203
Train metrics/train/LanguagePerplexity: 111336.3203
Train time/train: 0.0104
Train time/val: 0.0000
Train time/total: 0.0104
Train lr-DecoupledAdamW/group0: 0.0000
Train time/remaining_estimate: 0.0248
Train time/batch: 2
Train time/sample: 512
Train time/batch_in_epoch: 2
Train time/sample_in_epoch: 512
Train time/token: 524288
Train time/token_in_epoch: 524288
Train memory/allocated_mem: 1.9580
Train memory/active_mem: 1.9580
Train memory/inactive_mem: 1.0283
Train memory/reserved_mem: 8.5794
Train memory/alloc_retries: 0
Train trainer/device_train_microbatch_size: 8
Train loss/train/total: 11.6143
Train metrics/train/LanguageCrossEntropy: 11.6143
Train metrics/train/LanguagePerplexity: 110665.7031
Train time/train: 0.0135
Train time/val: 0.0000
Train time/total: 0.0135
Train lr-DecoupledAdamW/group0: 0.0000
Train time/remaining_estimate: 0.0218
Train time/batch: 3
Train time/sample: 768
Train time/batch_in_epoch: 3
Train time/sample_in_epoch: 768
Train time/token: 786432
Train time/token_in_epoch: 786432
Train memory/allocated_mem: 1.9580
Train memory/active_mem: 1.9580
Train memory/inactive_mem: 1.0283
Train memory/reserved_mem: 8.5794
Train memory/alloc_retries: 0
Train trainer/device_train_microbatch_size: 8
Train loss/train/total: 11.0220
Train metrics/train/LanguageCrossEntropy: 11.0220
Train metrics/train/LanguagePerplexity: 61206.2227
Train time/train: 0.0167
Train time/val: 0.0000
Train time/total: 0.0167
Train lr-DecoupledAdamW/group0: 0.0000
Train time/remaining_estimate: 0.0187
Train time/batch: 4
Train time/sample: 1024
Train time/batch_in_epoch: 4
Train time/sample_in_epoch: 1024
Train time/token: 1048576
Train time/token_in_epoch: 1048576
Train memory/allocated_mem: 1.9580
Train memory/active_mem: 1.9580
Train memory/inactive_mem: 1.0283
Train memory/reserved_mem: 8.5794
Train memory/alloc_retries: 0
Train trainer/device_train_microbatch_size: 8
Train loss/train/total: 10.5514
Train metrics/train/LanguageCrossEntropy: 10.5514
Train metrics/train/LanguagePerplexity: 38232.6797
Train time/train: 0.0198
Train time/val: 0.0000
Train time/total: 0.0198
Train lr-DecoupledAdamW/group0: 0.0000
Train time/remaining_estimate: 0.0156
Train time/batch: 5
Train time/sample: 1280
Train time/batch_in_epoch: 5
Train time/sample_in_epoch: 1280
Train time/token: 1310720
Train time/token_in_epoch: 1310720
Train memory/allocated_mem: 1.9580
Train memory/active_mem: 1.9580
Train memory/inactive_mem: 1.0283
Train memory/reserved_mem: 8.5794
Train memory/alloc_retries: 0
Train trainer/device_train_microbatch_size: 8
Train loss/train/total: 10.3442
Train metrics/train/LanguageCrossEntropy: 10.3442
Train metrics/train/LanguagePerplexity: 31076.6367
Train time/train: 0.0230
Train time/val: 0.0000
Train time/total: 0.0230
Train lr-DecoupledAdamW/group0: 0.0000
Train time/remaining_estimate: 0.0125
Train time/batch: 6
Train time/sample: 1536
Train time/batch_in_epoch: 6
Train time/sample_in_epoch: 1536
Train time/token: 1572864
Train time/token_in_epoch: 1572864
Train memory/allocated_mem: 1.9580
Train memory/active_mem: 1.9580
Train memory/inactive_mem: 1.0283
Train memory/reserved_mem: 8.5794
Train memory/alloc_retries: 0
Train trainer/device_train_microbatch_size: 8
Train loss/train/total: 10.1905
Train metrics/train/LanguageCrossEntropy: 10.1905
Train metrics/train/LanguagePerplexity: 26649.7637
Train time/train: 0.0261
Train time/val: 0.0000
Train time/total: 0.0261
Train lr-DecoupledAdamW/group0: 0.0000
Train time/remaining_estimate: 0.0094
Train time/batch: 7
Train time/sample: 1792
Train time/batch_in_epoch: 7
Train time/sample_in_epoch: 1792
Train time/token: 1835008
Train time/token_in_epoch: 1835008
Train memory/allocated_mem: 1.9580
Train memory/active_mem: 1.9580
Train memory/inactive_mem: 1.0283
Train memory/reserved_mem: 8.5794
Train memory/alloc_retries: 0
Train trainer/device_train_microbatch_size: 8
Train loss/train/total: 10.0649
Train metrics/train/LanguageCrossEntropy: 10.0649
Train metrics/train/LanguagePerplexity: 23502.5723
Train time/train: 0.0293
Train time/val: 0.0000
Train time/total: 0.0293
Train lr-DecoupledAdamW/group0: 0.0000
Train time/remaining_estimate: 0.0063
Train time/batch: 8
Train time/sample: 2048
Train time/batch_in_epoch: 8
Train time/sample_in_epoch: 2048
Train time/token: 2097152
Train time/token_in_epoch: 2097152
Train memory/allocated_mem: 1.9580
Train memory/active_mem: 1.9580
Train memory/inactive_mem: 1.0283
Train memory/reserved_mem: 8.5794
Train memory/alloc_retries: 0
Train trainer/device_train_microbatch_size: 8
Train loss/train/total: 9.8342
Train metrics/train/LanguageCrossEntropy: 9.8342
Train metrics/train/LanguagePerplexity: 18660.9160
Train time/train: 0.0324
Train time/val: 0.0000
Train time/total: 0.0324
Train lr-DecoupledAdamW/group0: 0.0000
Train time/remaining_estimate: 0.0031
Train time/batch: 9
Train time/sample: 2304
Train time/batch_in_epoch: 9
Train time/sample_in_epoch: 2304
Train time/token: 2359296
Train time/token_in_epoch: 2359296
Train memory/allocated_mem: 1.9580
Train memory/active_mem: 1.9580
Train memory/inactive_mem: 1.0283
Train memory/reserved_mem: 8.5794
Train memory/alloc_retries: 0
Train trainer/device_train_microbatch_size: 8
Train loss/train/total: 9.6967
Train metrics/train/LanguageCrossEntropy: 9.6967
Train metrics/train/LanguagePerplexity: 16264.4531
Train time/train: 0.0355
Train time/val: 0.0000
Train time/total: 0.0355
Train lr-DecoupledAdamW/group0: 0.0001
Train time/remaining_estimate: 0.0000
Traceback (most recent call last):
File "/workspace/llm-foundry/scripts/train/", line 254, in
----------End global rank 1 STDERR----------
ERROR:composer.cli.launcher:Global rank 0 (PID 28598) exited with code 1
(llmfoundryenv) root@fe708568bac8:/workspace/llm-foundry/scripts#
for name, non_tensor_value in object_state.non_tensors.items():
AttributeError: 'int' object has no attribute 'items'
This is a known issue when using torch 2. It is fixed in Composer's dev branch and will be included in the next Composer release.
I am trying to run the test suite to see if my setup is correct, and I am down to 31 failed, 4852 passed, etc.
However, the ones that failed are strange.
Here is a partial log; the full log is below:
-- Docs:
=========================== short test summary info ===========================
FAILED tests/ - FileNotFoundError: Couldn't find a dataset script at /workspace/llm-foundry/jsonl/ or any data file in the same directory. Couldn't find 'jsonl' on the Hugging Face Hub either: FileNotFoundEr...
FAILED tests/ - NotADirectoryError: [Errno 20] Not a directory: '/workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/mpt/'
FAILED tests/[False] - FileNotFoundError: [Errno 2] No such file or directory: '.scripts/train/yamls/pretrain/gpt2-small.yaml'
FAILED tests/[True] - FileNotFoundError: [Errno 2] No such file or directory: '.scripts/train/yamls/pretrain/gpt2-small.yaml'
FAILED tests/ - NotADirectoryError: [Errno 20] Not a directory: '/workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/mpt/'
FAILED triton/python/test/regression/[256-256-256-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[512-512-512-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[1024-1024-1024-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[2048-2048-2048-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[4096-4096-4096-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[8192-8192-8192-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[16-1024-1024-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[16-4096-4096-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[16-8192-8192-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[64-1024-1024-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[64-4096-4096-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[64-8192-8192-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[1024-64-1024-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[4096-64-4096-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[8192-64-8192-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[16384] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/regression/[65536] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/regression/[262144] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/regression/[1048576] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/regression/[4194304] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/regression/[16777216] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/regression/[67108864] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/unit/operators/[16-256-False] - AssertionError:
FAILED triton/python/test/unit/operators/[32-576-False] - AssertionError:
FAILED triton/python/test/unit/operators/[64-1871-False] - AssertionError:
FAILED triton/python/test/unit/operators/[128-2511-False] - AssertionError:
== 31 failed, 4852 passed, 49 skipped, 174 deselected, 16 xfailed, 60 warnings in 1997.10s (0:33:17) ==
My setup is an HP Z840 with a Xeon 2620 v3 and 2x Nvidia RTX A4000. As you can see, during the whole test run the cards barely react: they show only a small load and never run faster than 210 MHz, and I get the clock errors above.
I tried to change in
def _get_block_size(device, head_dim, is_dropout):
    assert head_dim % 8 == 0 and head_dim <= 64
    return 128 if head_dim <= 32 else 64
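For reference, here is a quick sketch of what the function as posted returns for a few head sizes. I am assuming head_dim is d_model // n_heads, which for the training config posted earlier (d_model: 768, n_heads: 12) gives 64; the device and is_dropout arguments are unused in this version, so dummy values are passed.

```python
def _get_block_size(device, head_dim, is_dropout):
    # Copied from the post above; `device` and `is_dropout` are unused here.
    assert head_dim % 8 == 0 and head_dim <= 64
    return 128 if head_dim <= 32 else 64


# Assumption: head_dim = d_model // n_heads (768 // 12 in the posted training config).
head_dim = 768 // 12
block = _get_block_size(device=None, head_dim=head_dim, is_dropout=False)
print(head_dim, block)

# A smaller head size falls into the other branch.
small = _get_block_size(device=None, head_dim=32, is_dropout=False)
print(small)

# Anything above 64 trips the assertion in this version of the function.
try:
    _get_block_size(device=None, head_dim=128, is_dropout=False)
    too_large_ok = True
except AssertionError:
    too_large_ok = False
print(too_large_ok)
```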
Also, a quick question: is the code geared towards multi-GPU? I believe it would be, because training itself would take much more time than simply finetuning mpt-7b.
How does it look to the trained eye? Is this error a false positive? Is there another way I can test the GPUs?
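One way to look at the clocks outside of pytest is sketched below. The tolerance (10 MHz) and the reference values (1350 MHz SM, 877 MHz memory) are taken from the assertion text in the log; parse_clocks, near_reference and query_clocks are helper names I made up. query_clocks shells out to nvidia-smi with its clocks.sm and clocks.mem query fields, so it needs an NVIDIA driver and is not called here. Whether the GPUs were simply idle when the tests sampled the clocks is my guess, not something the log confirms.

```python
import subprocess


def parse_clocks(csv_text):
    """Parse 'sm, mem' MHz pairs, one line per GPU, into a list of (sm, mem) ints."""
    clocks = []
    for line in csv_text.strip().splitlines():
        sm, mem = (int(field.strip()) for field in line.split(","))
        clocks.append((sm, mem))
    return clocks


def near_reference(current_mhz, reference_mhz, tolerance_mhz=10):
    # Same comparison as the assertion shown in the log: abs(cur - ref) < 10.
    return abs(current_mhz - reference_mhz) < tolerance_mhz


def query_clocks():
    """Ask nvidia-smi for current SM and memory clocks (requires an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=clocks.sm,clocks.mem",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_clocks(out)


# Values reported in the failing tests: 210 MHz SM and 405 MHz memory on each card.
sample = parse_clocks("210, 405\n210, 405\n")
print(sample)
print([near_reference(sm, 1350) for sm, _ in sample])
print([near_reference(mem, 877) for _, mem in sample])
```

Calling query_clocks() while a training job is running, and comparing with the idle reading, would show whether the cards clock up under load.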
Full log below:
================================== FAILURES ===================================
____ test_json_script_from_api ____
tests/ in test_json_script_from_api
    main_json(
scripts/data_prep/ in main
    dataset = build_hf_dataset(path=args.path,
scripts/data_prep/ in build_hf_dataset
    hf_dataset = hf_datasets.load_dataset('jsonl',
llmfoundryenv/lib/python3.10/site-packages/datasets/ in load_dataset
    builder_instance = load_dataset_builder(
llmfoundryenv/lib/python3.10/site-packages/datasets/ in load_dataset_builder
    dataset_module = dataset_module_factory(
llmfoundryenv/lib/python3.10/site-packages/datasets/ in dataset_module_factory
    raise FileNotFoundError(
E   FileNotFoundError: Couldn't find a dataset script at /workspace/llm-foundry/jsonl/ or any data file in the same directory. Couldn't find 'jsonl' on the Hugging Face Hub either: FileNotFoundError: Dataset 'jsonl' doesn't exist on the Hub. If the repo is private or gated, make sure to log in with huggingface-cli login.
____ test_convert_and_generate_torch ____
tests/ in test_convert_and_generate_torch
    main(args)
scripts/inference/ in main
    loaded_hf_model.save_pretrained(local_folder_path)
llmfoundryenv/lib/python3.10/site-packages/transformers/ in save_pretrained
    custom_object_save(self, save_directory, config=self.config)
llmfoundryenv/lib/python3.10/site-packages/transformers/ in custom_object_save
    shutil.copy(object_file, dest_file)
/usr/lib/python3.10/ in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
/usr/lib/python3.10/ in copyfile
    with open(src, 'rb') as fsrc:
E   NotADirectoryError: [Errno 20] Not a directory: '/workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/mpt/'
---------------------------- Captured stdout call -----------------------------
You are using config.init_device='cpu', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.
Downloading checkpoint from /tmp/pytest-of-root/pytest-1/test_convert_and_generate_torc0/ -> /tmp/tmpu8uyjuwo/
Loading checkpoint into CPU RAM...
##############################
Saving HF Model Config...
MPTConfig {
  "attn_config": {
    "alibi": false,
    "alibi_bias_max": 8,
    "attn_impl": "torch",
    "attn_pdrop": 0.0,
    "attn_type": "multihead_attention",
    "attn_uses_sequence_id": false,
    "clip_qkv": null,
    "prefix_lm": false,
    "qk_ln": false,
    "softmax_scale": null
  },
  "d_model": 128,
  "emb_pdrop": 0.0,
  "embedding_fraction": 1.0,
  "expansion_ratio": 4,
  "init_config": {
    "emb_init_std": null,
    "emb_init_uniform_lim": null,
    "fan_mode": "fan_in",
    "init_div_is_residual": true,
    "init_gain": 0.0,
    "init_nonlinearity": "relu",
    "init_std": null,
    "name": "kaiming_normal_",
    "verbose": 0
  },
  "init_device": "cpu",
  "learned_pos_emb": true,
  "logit_scale": null,
  "max_seq_len": 128,
  "model_type": "mpt",
  "n_heads": 2,
  "n_layers": 2,
  "no_bias": false,
  "norm_type": "low_precision_layernorm",
  "resid_pdrop": 0.0,
  "torch_dtype": "float32",
  "transformers_version": "4.28.1",
  "use_cache": false,
  "verbose": 0,
  "vocab_size": 50368
}
##############################
Saving HF Tokenizer...
GPTNeoXTokenizerFast(name_or_path='', vocab_size=50254, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True)
##############################
Saving HF Model Weights...
##############################
HF checkpoint folder successfully created at /tmp/pytest-of-root/pytest-1/test_convert_and_generate_torc0/hf-output-folder.
Done.
##############################
Loading model from /tmp/pytest-of-root/pytest-1/test_convert_and_generate_torc0/hf-output-folder
You are using config.init_device='cpu', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.
---------------------------- Captured stderr call -----------------------------
------------------------------ Captured log call ------------------------------
INFO The number of tokens in the tokenizer is less than the number of tokens in the model. You may want to resize the model embeddings to 50277 from 50368 by calling before calling the HuggingFaceModel constructor. The vocab size is sometimes intentionally set to a multiple of 32 or 64 to improve performance.
INFO Setting seed to 3894649697
INFO Run name: 1685369687-natural-bonobo
INFO Created a temporary directory at /tmp/tmpn8cizn5z
INFO Writing /tmp/tmpn8cizn5z/
INFO Stepping schedulers every batch. To step schedulers every epoch, set step_schedulers_every_batch=False.
INFO Setting seed to 3894649697
INFO Setting seed to 3894649697
____ test_full_forward_and_backward_gpt2_small[False] ____
tests/ in test_full_forward_and_backward_gpt2_small
    with open(conf_path) as f:
E   FileNotFoundError: [Errno 2] No such file or directory: '.scripts/train/yamls/pretrain/gpt2-small.yaml'
____ test_full_forward_and_backward_gpt2_small[True] ____
tests/ in test_full_forward_and_backward_gpt2_small
    with open(conf_path) as f:
E   FileNotFoundError: [Errno 2] No such file or directory: '.scripts/train/yamls/pretrain/gpt2-small.yaml'
____ test_save_from_pretrained ____
tests/ in test_save_from_pretrained
    mpt.save_pretrained(tmp_path / 'test-save-pretrained')
llmfoundryenv/lib/python3.10/site-packages/transformers/ in save_pretrained
    custom_object_save(self, save_directory, config=self.config)
llmfoundryenv/lib/python3.10/site-packages/transformers/ in custom_object_save
    shutil.copy(object_file, dest_file)
/usr/lib/python3.10/ in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
/usr/lib/python3.10/ in copyfile
    with open(src, 'rb') as fsrc:
E   NotADirectoryError: [Errno 20] Not a directory: '/workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/mpt/'
---------------------------- Captured stdout call -----------------------------
You are using config.init_device='cpu', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.
____ test_matmul[256-256-256-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_matmul[512-512-512-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_matmul[1024-1024-1024-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_matmul[2048-2048-2048-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_matmul[4096-4096-4096-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_matmul[8192-8192-8192-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_matmul[16-1024-1024-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_matmul[16-4096-4096-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_matmul[16-8192-8192-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_matmul[64-1024-1024-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_matmul[64-4096-4096-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_matmul[64-8192-8192-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_matmul[1024-64-1024-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_matmul[4096-64-4096-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_matmul[8192-64-8192-float16] ____
triton/python/test/regression/ in test_matmul
    assert abs(cur_sm_clock - ref_sm_clock) < 10, f'GPU SMs must run at {ref_sm_clock} MHz'
E   AssertionError: GPU SMs must run at 1350 MHz
E   assert 1140 < 10
E    + where 1140 = abs((210 - 1350))
____ test_elementwise[16384] ____
triton/python/test/regression/ in test_elementwise
    assert abs(cur_mem_clock - ref_mem_clock) < 10, f'GPU memory must run at {ref_mem_clock} MHz'
E   AssertionError: GPU memory must run at 877 MHz
E   assert 472 < 10
E    + where 472 = abs((405 - 877))
____ test_elementwise[65536] ____
triton/python/test/regression/ in test_elementwise
    assert abs(cur_mem_clock - ref_mem_clock) < 10, f'GPU memory must run at {ref_mem_clock} MHz'
E   AssertionError: GPU memory must run at 877 MHz
E   assert 472 < 10
E    + where 472 = abs((405 - 877))
____ test_elementwise[262144] ____
triton/python/test/regression/ in test_elementwise
    assert abs(cur_mem_clock - ref_mem_clock) < 10, f'GPU memory must run at {ref_mem_clock} MHz'
E   AssertionError: GPU memory must run at 877 MHz
E   assert 472 < 10
E    + where 472 = abs((405 - 877))
____ test_elementwise[1048576] ____
triton/python/test/regression/ in test_elementwise
    assert abs(cur_mem_clock - ref_mem_clock) < 10, f'GPU memory must run at {ref_mem_clock} MHz'
E   AssertionError: GPU memory must run at 877 MHz
E   assert 472 < 10
E    + where 472 = abs((405 - 877))
____ test_elementwise[4194304] ____
triton/python/test/regression/ in test_elementwise
    assert abs(cur_mem_clock - ref_mem_clock) < 10, f'GPU memory must run at {ref_mem_clock} MHz'
E   AssertionError: GPU memory must run at 877 MHz
E   assert 472 < 10
E    + where 472 = abs((405 - 877))
____ test_elementwise[16777216] ____
triton/python/test/regression/ in test_elementwise
    assert abs(cur_mem_clock - ref_mem_clock) < 10, f'GPU memory must run at {ref_mem_clock} MHz'
E   AssertionError: GPU memory must run at 877 MHz
E   assert 472 < 10
E    + where 472 = abs((405 - 877))
____ test_elementwise[67108864] ____
triton/python/test/regression/ in test_elementwise
    assert abs(cur_mem_clock - ref_mem_clock) < 10, f'GPU memory must run at {ref_mem_clock} MHz'
E   AssertionError: GPU memory must run at 877 MHz
E   assert 472 < 10
E    + where 472 = abs((405 - 877))
____ test_softmax[16-256-False] ____
triton/python/test/unit/operators/ in test_softmax
    triton.testing.assert_almost_equal(da_tri, da_ref)
llmfoundryenv/lib/python3.10/site-packages/triton_pre_mlir/ in assert_almost_equal
    npt.assert_array_almost_equal(x, y, err_msg=err_msg, decimal=decimal)
/usr/lib/python3.10/ in inner
    return func(*args, **kwds)
/usr/lib/python3.10/ in inner
    return func(*args, **kwds)
E   AssertionError:
E   Arrays are not almost equal to 2 decimals
E
E   x and y nan location mismatch:
E    x: array([[[[ 0.00e+00, 0.00e+00, 0.00e+00, ..., 0.00e+00, 0.00e+00,
E             0.00e+00],
E           [ 0.00e+00, 0.00e+00, 0.00e+00, ..., 0.00e+00, 0.00e+00,...
E    y: array([[[[ nan, nan, nan, ..., nan, nan,
E             nan],
E           [ nan, nan, nan, ..., nan, nan,...
____ test_softmax[32-576-False] ____
triton/python/test/unit/operators/ in test_softmax
    triton.testing.assert_almost_equal(da_tri, da_ref)
llmfoundryenv/lib/python3.10/site-packages/triton_pre_mlir/ in assert_almost_equal
    npt.assert_array_almost_equal(x, y, err_msg=err_msg, decimal=decimal)
/usr/lib/python3.10/ in inner
    return func(*args, **kwds)
/usr/lib/python3.10/ in inner
    return func(*args, **kwds)
E   AssertionError:
E   Arrays are not almost equal to 2 decimals
E
E   x and y nan location mismatch:
E    x: array([[[[ 0.00e+00, 0.00e+00, 0.00e+00, ..., 0.00e+00, 0.00e+00,
E             0.00e+00],
E           [ 0.00e+00, 0.00e+00, 0.00e+00, ..., 0.00e+00, 0.00e+00,...
E    y: array([[[[ nan, nan, nan, ..., nan, nan,
E             nan],
E           [ nan, nan, nan, ..., nan, nan,...
____ test_softmax[64-1871-False] ____
triton/python/test/unit/operators/ in test_softmax
    triton.testing.assert_almost_equal(da_tri, da_ref)
llmfoundryenv/lib/python3.10/site-packages/triton_pre_mlir/ in assert_almost_equal
    npt.assert_array_almost_equal(x, y, err_msg=err_msg, decimal=decimal)
/usr/lib/python3.10/ in inner
    return func(*args, **kwds)
/usr/lib/python3.10/ in inner
    return func(*args, **kwds)
E   AssertionError:
E   Arrays are not almost equal to 2 decimals
E
E   x and y nan location mismatch:
E    x: array([[[[ 0.00e+00, 0.00e+00, 0.00e+00, ..., 0.00e+00, 0.00e+00,
E             0.00e+00],
E           [ 0.00e+00, 0.00e+00, 0.00e+00, ..., 0.00e+00, 0.00e+00,...
E    y: array([[[[ nan, nan, nan, ..., nan, nan,
E             nan],
E           [ nan, nan, nan, ..., nan, nan,...
____ test_softmax[128-2511-False] ____
triton/python/test/unit/operators/ in test_softmax
    triton.testing.assert_almost_equal(da_tri, da_ref)
llmfoundryenv/lib/python3.10/site-packages/triton_pre_mlir/ in assert_almost_equal
    npt.assert_array_almost_equal(x, y, err_msg=err_msg, decimal=decimal)
/usr/lib/python3.10/ in inner
    return func(*args, **kwds)
/usr/lib/python3.10/ in inner
    return func(*args, **kwds)
E   AssertionError:
E   Arrays are not almost equal to 2 decimals
E
E   x and y nan location mismatch:
E    x: array([[[[ 0.00e+00, 0.00e+00, 0.00e+00, ..., 0.00e+00, 0.00e+00,
E             0.00e+00],
E           [ 0.00e+00, 0.00e+00, 0.00e+00, ..., 0.00e+00, 0.00e+00,...
E    y: array([[[[ nan, nan, nan, ..., nan, nan,
E             nan],
E           [ nan, nan, nan, ..., nan, nan,...
============================== warnings summary ===============================
tests/[True-facebook/opt-125m]
  /workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/data/ UserWarning: The provided tokenizer adds special tokens, but you also specified bos_text. This may result in duplicated special tokens. Please be sure this is what you intend.
tests/ /root/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b/4ff95c4aec5c04ba509ddf517c56720541a7a487/ UserWarning: Using `attn_impl: torch`. If your model does not use `alibi` or `prefix_lm` we recommend using `attn_impl: flash` otherwise we recommend using `attn_impl: triton`. warnings.warn('Using `attn_impl: torch`. If your model does not use `alibi` or ' + '`prefix_lm` we recommend using `attn_impl: flash` otherwise ' + 'we recommend using `attn_impl: triton`.')
tests/[emb_init_cfg2] /workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/utils/ UserWarning: Embedding layer initialized to 0. warnings.warn(f'Embedding layer initialized to 0.')
tests/[emb_init_cfg5] /workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/utils/ UserWarning: Embedding layer initialized to 0. warnings.warn(f'Embedding layer initialized to 0.')
tests/[emb_init_cfg6] /workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/utils/ UserWarning: Embedding layer initialized to 1. warnings.warn(f'Embedding layer initialized to {lim[0]}.')
tests/[True-generation_kwargs2] tests/[False-generation_kwargs2] /workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/transformers/generation/ UserWarning: Using `max_length`'s default (20) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation. warnings.warn(
tests/ /workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/mpt/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
tests/ /workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/mpt/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
tests/ /workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/mpt/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
tests/ /workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/layers/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
tests/ /workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/layers/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
tests/ /workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/layers/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
tests/ /workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/layers/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
tests/ /workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/layers/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
tests/ /workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/layers/ TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
triton/python/test/unit/language/ 38 warnings /workspace/llm-foundry/triton/python/test/unit/language/ RuntimeWarning: overflow encountered in cast z_ref = z_ref.astype(dtype_z)
triton/python/test/unit/language/[add-uint32-min_neg] triton/python/test/unit/language/[max-uint32-min_neg] triton/python/test/unit/language/[min-uint32-min_neg] /workspace/llm-foundry/triton/python/test/unit/language/ RuntimeWarning: overflow encountered in scalar negative x[idx] = -np.max(np.abs(x)) - 1
triton/python/test/unit/language/[10-16045690984503095482] triton/python/test/unit/language/[4,53-16045690984503095482] triton/python/test/unit/language/[10000-16045690984503095482] /workspace/llm-foundry/triton/python/test/unit/language/ DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays. The conversion of 16045690984503095482 to uint32 will fail in the future. For the old behavior, usually: np.array(value).astype(dtype)` will give the desired result (the cast overflows). res.append(np.array(n, dtype=self._dtype))
-- Docs:
========================= short test summary info =========================
FAILED tests/ - FileNotFoundError: Couldn't find a dataset script at /workspace/llm-foundry/jsonl/ or any data file in the same directory. Couldn't find 'jsonl' on the Hugging Face Hub either: FileNotFoundEr...
FAILED tests/ - NotADirectoryError: [Errno 20] Not a directory: '/workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/mpt/'
FAILED tests/[False] - FileNotFoundError: [Errno 2] No such file or directory: '.scripts/train/yamls/pretrain/gpt2-small.yaml'
FAILED tests/[True] - FileNotFoundError: [Errno 2] No such file or directory: '.scripts/train/yamls/pretrain/gpt2-small.yaml'
FAILED tests/ - NotADirectoryError: [Errno 20] Not a directory: '/workspace/llm-foundry/llmfoundryenv/lib/python3.10/site-packages/llm_foundry-0.1.0-py3.10.egg/llmfoundry/models/mpt/'
FAILED triton/python/test/regression/[256-256-256-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[512-512-512-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[1024-1024-1024-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[2048-2048-2048-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[4096-4096-4096-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[8192-8192-8192-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[16-1024-1024-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[16-4096-4096-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[16-8192-8192-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[64-1024-1024-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[64-4096-4096-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[64-8192-8192-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[1024-64-1024-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[4096-64-4096-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[8192-64-8192-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[16384] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/regression/[65536] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/regression/[262144] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/regression/[1048576] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/regression/[4194304] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/regression/[16777216] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/regression/[67108864] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/unit/operators/[16-256-False] - AssertionError:
FAILED triton/python/test/unit/operators/[32-576-False] - AssertionError:
FAILED triton/python/test/unit/operators/[64-1871-False] - AssertionError:
FAILED triton/python/test/unit/operators/[128-2511-False] - AssertionError:
==== 31 failed, 4852 passed, 49 skipped, 174 deselected, 16 xfailed, 60 warnings in 1997.10s (0:33:17) ====
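In case it helps anyone skimming the summary above, here is a rough Python sketch that groups the FAILED lines by their error message, so the clock assertions can be told apart from the other failures at a glance. The `SUMMARY` string is a small hand-copied sample from the log (test ids truncated exactly as pasted), and `tally_failures` and `FAILED_LINE` are names I made up for illustration; nothing here is part of pytest or Triton.

```python
import re
from collections import Counter

# Hand-copied sample of "FAILED <test id> - <reason>" lines from the summary
# above. The test ids are truncated exactly as they appear in the pasted log.
SUMMARY = """\
FAILED triton/python/test/regression/[256-256-256-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[512-512-512-float16] - AssertionError: GPU SMs must run at 1350 MHz
FAILED triton/python/test/regression/[16384] - AssertionError: GPU memory must run at 877 MHz
FAILED triton/python/test/unit/operators/[16-256-False] - AssertionError:
"""

# One summary line: the literal word FAILED, a test id without spaces,
# a " - " separator, then the (possibly short) failure reason.
FAILED_LINE = re.compile(r"^FAILED (?P<test>\S+) - (?P<reason>.*)$")


def tally_failures(summary: str) -> Counter:
    """Count how many FAILED lines share the same failure reason."""
    counts = Counter()
    for line in summary.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            counts[match.group("reason").strip()] += 1
    return counts


if __name__ == "__main__":
    for reason, count in tally_failures(SUMMARY).most_common():
        print(f"{count:3d}  {reason}")
```

Counting the full summary by hand the same way, 15 failures are the SM clock assertion and 7 are the memory clock assertion, so 22 of the 31 failures are clock checks. The remaining 9 are the 4 test_softmax NaN mismatches and the 5 file/path errors under tests/.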