microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
MIT License

BitNet training, NaN #44

Closed robotzheng closed 3 weeks ago

robotzheng commented 1 month ago

zzt-learning_rate2:0.006
Overriding config with config/scale_gpt.py:
# config for scaling GPT following Kaplan et al.
wandb_log = True
wandb_project = 'owt-scaling'
wandb_run_id = "" # give only when resuming a W&B run
always_save_checkpoint = False

# setting default values of scale_N, scale_D to False, you must change them from command line when scaling.
scaling = "Kaplan"
scale_N = False
scale_D = False

# replace n_layer, n_embd and fraction_of_data from command line. Default values:
n_layer = 12
n_embd = 768
fraction_of_data = 1.0

# Can also set n_head from command line, but set default value through this rule of thumb
# given in Appendix F of 2010.14701
# It is consistent with nanoGPT where n_embd = 768, and n_head = 768//64 = 12.
n_head = 12 #max(2, n_embd // 64)

# TRAINING CONFIGURATIONS FROM KAPLAN ET AL
# total batch size = 512 so set local batch size = 16, gradaccum = 32
batch_size = 16
block_size = 1024
gradient_accumulation_steps = 4 * 8

# total number of training iterations = 2.5e5
# learning rate warms up for 3000 iterations and decays to 0 at the end of training.
# dropout = 0.1 (see Section 4.2). minimum learning rate is 0
# maximum learning rate is given by equation D.1 of the paper. It depends on N, so we set it in configurator.py
max_iters = int(2.5e5)
warmup_iters = 3000
lr_decay_iters = int(2.5e5)
dropout = 0.0
min_lr = 0
learning_rate = 6e-3

# eval stuff same as nanoGPT
eval_interval = 1000
eval_iters = 200
log_interval = 10

# weight decay same as nanoGPT
weight_decay = 1e-1
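The schedule described in the config comments (warm up for 3000 iterations, then decay to 0 by iteration 2.5e5) can be sketched as follows. This is a minimal sketch, assuming linear warmup and cosine decay in the style of nanoGPT's training loop. The config only states the endpoints, so the shape of the curve and the helper name `lr_at` are assumptions; the peak of 6e-3 is the `learning_rate` value printed in the log.

```python
import math

# Values copied from config/scale_gpt.py as printed in the log.
learning_rate = 6e-3
min_lr = 0.0
warmup_iters = 3000
lr_decay_iters = int(2.5e5)


def lr_at(it: int) -> float:
    """Hypothetical helper: learning rate at iteration `it`.

    Assumes linear warmup followed by cosine decay down to `min_lr`.
    """
    if it < warmup_iters:
        # Linear ramp from 0 up to the peak rate.
        return learning_rate * it / warmup_iters
    if it >= lr_decay_iters:
        # Past the decay horizon the rate stays at the floor.
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (learning_rate - min_lr)


if __name__ == "__main__":
    for it in (0, 1500, 3000, 126500, 250000):
        print(f"iter {it:>6}: lr = {lr_at(it):.6f}")
```

Under these assumptions the rate reaches its 6e-3 peak at iteration 3000 and returns to 0 at iteration 250000.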

Overriding: scale_N = True
Overriding: n_layer = 12
Overriding: n_embd = 768
zzt-before-scale_N:0.006
zzt-learning_rate3:0.006
tokens per iteration will be: 524,288
Initializing a new model from scratch
defaulting to vocab_size of GPT-2 to 50304 (50257 rounded up for efficiency)
zzt:BitnetConfig {
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 768,
  "initializer_range": 0.02,
  "input_bits": 8,
  "intermediate_size": 2048,
  "max_position_embeddings": 1024,
  "model_type": "llama",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "num_key_value_heads": 12,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.1",
  "use_cache": true,
  "vocab_size": 50304,
  "weight_bits": 1
}
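Several numbers in this log follow directly from the config and can be checked by hand. The sketch below recomputes the tokens per iteration, the padded vocabulary size, and the per-head dimension. It assumes the vocabulary is padded to a multiple of 64 (the log only says 50257 is "rounded up for efficiency"), and the helper name `round_up` is hypothetical.

```python
# Values taken from the printed config and the BitnetConfig dump.
batch_size = 16
block_size = 1024
gradient_accumulation_steps = 4 * 8
hidden_size = 768
num_attention_heads = 12
gpt2_vocab_size = 50257


def round_up(n: int, multiple: int) -> int:
    """Hypothetical helper: smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple


# Tokens consumed per optimizer step, summed over all micro-batches.
tokens_per_iter = batch_size * block_size * gradient_accumulation_steps

# Assumption: the vocabulary is padded to a multiple of 64.
padded_vocab = round_up(gpt2_vocab_size, 64)

# Each attention head gets an equal slice of the hidden dimension.
head_dim = hidden_size // num_attention_heads

print(f"tokens per iteration: {tokens_per_iter:,}")
print(f"padded vocab size: {padded_vocab}")
print(f"head_dim: {head_dim}")
```

All three values agree with the log: 524,288 tokens per iteration, a vocabulary of 50304, and a head dimension of 64.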

Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in BitnetForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in BitnetModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
zzt:flash_attention_2
zzt-BitnetAttention-head_dim:64
zzt-BitnetAttention-hidden_size:768
zzt-BitnetAttention-num_heads:12
2024-05-23 06:24:45 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database
BitBLAS Operator created.
2024-05-23 06:25:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database
BitBLAS Operator created.
zzt-BitnetFlashAttention2
2024-05-23 06:25:23 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database
BitBLAS Operator created.
2024-05-23 06:25:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:27 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:28 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:28 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:25:28 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:28 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:25:28 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:28 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:29 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:29 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:30 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:30 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:30 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:30 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:31 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:31 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:31 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:31 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:25:32 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:32 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:32 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:33 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:33 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:33 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:33 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:33 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:34 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:34 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:25:35 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:35 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:35 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:35 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:35 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:35 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:36 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:25:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:25:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:25:39 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:25:39 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:25:39 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:25:39 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:25:39 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt-BitnetFlashAttention2 2024-05-23 06:25:40 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. zzt-BitnetFlashAttention2 zzt-BitnetFlashAttention2 2024-05-23 06:25:40 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:40 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:41 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:41 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:41 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:41 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:41 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:42 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:25:42 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:42 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:43 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:43 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:43 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:43 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:44 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:44 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:44 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:44 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:45 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:45 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:46 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:46 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:46 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:46 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 
zzt:flash_attention_2 zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:25:46 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:46 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:47 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:47 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:48 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:25:48 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:48 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:48 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 
2024-05-23 06:25:49 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:49 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:50 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:50 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:50 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:25:50 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:50 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:51 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:25:51 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:51 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:25:52 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:52 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:52 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:52 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:52 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:53 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:25:53 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:53 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:25:54 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt-BitnetFlashAttention2 2024-05-23 06:25:54 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:25:54 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:25:54 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:25:54 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:25:55 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:25:55 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:25:55 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:56 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. BitBLAS Operator created. BitBLAS Operator created. 
2024-05-23 06:25:57 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:57 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:57 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:57 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:57 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:25:57 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:57 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:58 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. BitBLAS Operator created. BitBLAS Operator created. 
2024-05-23 06:25:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:25:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:25:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:25:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:00 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:01 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 
zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:01 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:01 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:01 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:01 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:02 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:02 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:26:03 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:05 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:06 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:26:06 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:06 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:06 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:06 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:06 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:06 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:07 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:08 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:08 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 
2024-05-23 06:26:08 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:08 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:08 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:08 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:08 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:09 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:10 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:10 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 
zzt-BitnetFlashAttention2 zzt-BitnetFlashAttention2 2024-05-23 06:26:10 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:10 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. zzt-BitnetFlashAttention2 zzt-BitnetFlashAttention2 2024-05-23 06:26:10 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:10 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:10 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:11 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:12 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:12 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:12 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database
BitBLAS Operator created. 2024-05-23 06:26:12 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:12 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:12 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:12 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:13 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:14 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:14 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:15 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:15 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:15 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
BitBLAS Operator created. 2024-05-23 06:26:15 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:15 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:16 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:16 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 BitBLAS Operator created. 
zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:18 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:19 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:26:19 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:19 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:19 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:19 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:19 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:19 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:20 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:21 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:21 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:21 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:26:21 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:21 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:21 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:21 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:22 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:23 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:23 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:23 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:23 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 
2024-05-23 06:26:23 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:23 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:23 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:24 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. zzt-BitnetFlashAttention2 zzt-BitnetFlashAttention2 2024-05-23 06:26:25 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:25 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. zzt-BitnetFlashAttention2 zzt-BitnetFlashAttention2 2024-05-23 06:26:25 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:25 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. BitBLAS Operator created. 
zzt-BitnetFlashAttention2 zzt-BitnetFlashAttention2 zzt-BitnetFlashAttention2 2024-05-23 06:26:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:27 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:27 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:28 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:28 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 
2024-05-23 06:26:28 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:28 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:28 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:29 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:30 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:30 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:30 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:30 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:30 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:30 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:26:30 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:31 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:32 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:32 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:32 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:32 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:32 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:32 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:32 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:33 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:34 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:34 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:34 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:34 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:26:34 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:34 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:34 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:35 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:36 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:36 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:36 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:36 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:36 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:26:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:38 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:38 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:38 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:38 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:39 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:39 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:39 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt-BitnetFlashAttention2 2024-05-23 06:26:39 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:40 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. zzt-BitnetFlashAttention2 zzt-BitnetFlashAttention2 2024-05-23 06:26:40 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:40 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:41 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:41 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:41 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:41 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:26:42 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:43 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:43 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:43 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:43 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:43 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:43 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:43 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:44 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. BitBLAS Operator created. 
2024-05-23 06:26:45 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:45 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:45 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:45 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:45 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:45 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:45 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:46 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 BitBLAS Operator created. 
zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:47 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:47 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:47 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:47 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:47 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:48 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:26:48 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:48 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:49 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:49 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:49 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:49 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:50 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:50 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:50 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:26:50 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:51 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:51 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:26:51 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:26:51 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:52 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:52 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:52 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:52 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:53 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:26:53 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:54 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:54 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:54 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:54 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:54 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:55 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:56 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:56 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt-BitnetFlashAttention2 2024-05-23 06:26:56 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:56 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:56 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:56 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:26:56 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:57 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:58 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:58 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:58 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:26:58 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:58 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:58 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:26:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:00 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:00 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:27:00 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:00 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:00 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:27:01 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:01 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:01 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:02 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:02 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 
zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:02 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:02 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:03 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:03 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:03 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:03 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 
2024-05-23 06:27:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:05 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:05 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:05 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:05 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:27:06 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:06 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:27:07 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:07 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:07 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:07 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:07 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:08 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:09 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:09 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:09 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:09 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:27:09 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:09 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:09 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:10 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:11 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:11 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. zzt-BitnetFlashAttention2 zzt-BitnetFlashAttention2 2024-05-23 06:27:11 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:11 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:11 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt-BitnetFlashAttention2 2024-05-23 06:27:11 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:12 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:12 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created.
2024-05-23 06:27:13 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:13 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:27:13 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:13 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:13 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:14 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:14 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:14 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:15 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:15 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:27:15 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:15 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:16 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:16 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:16 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:16 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:18 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:18 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:18 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:18 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:27:19 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:19 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:20 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:20 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:20 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:20 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:20 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:21 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:22 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:22 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
BitBLAS Operator created. 2024-05-23 06:27:22 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:22 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:22 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:23 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:23 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:23 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:24 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:24 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:27:24 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:24 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:27:24 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:25 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:25 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:25 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. zzt-BitnetFlashAttention2 zzt-BitnetFlashAttention2 2024-05-23 06:27:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:26 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:27 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt-BitnetFlashAttention2 2024-05-23 06:27:27 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:27 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:27 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:28 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:28 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:29 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:29 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:29 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:29 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:27:29 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:29 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:30 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:30 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:31 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:31 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:31 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:31 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:31 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:32 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:33 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:33 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:33 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:33 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:33 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:33 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:33 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:34 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:27:35 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:35 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:35 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:35 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:35 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:27:36 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:36 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:36 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:27:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:37 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:38 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:38 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:27:38 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:39 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:39 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:39 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:40 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:40 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:40 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:40 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:40 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created.BitBLAS Operator created.

zzt-BitnetFlashAttention2 zzt-BitnetFlashAttention2 2024-05-23 06:27:41 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:41 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:41 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:42 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:42 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:42 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:42 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:43 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 
2024-05-23 06:27:44 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:44 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:44 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:44 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:44 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:44 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:45 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:45 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:27:46 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:46 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:27:46 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:46 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:46 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:47 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:47 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:47 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:48 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:48 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:48 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:48 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:48 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:49 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:27:49 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:49 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:27:50 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:50 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:50 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:50 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:51 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:51 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:51 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:51 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. BitBLAS Operator created. 
2024-05-23 06:27:52 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:52 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:52 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:53 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:53 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:53 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:53 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:54 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:27:54 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:27:54 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:27:55 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:55 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:55 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:55 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:55 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:56 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:57 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:57 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:57 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt-BitnetFlashAttention2 2024-05-23 06:27:57 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:57 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:57 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:27:58 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:58 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:27:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:27:59 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:00 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:00 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:01 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:01 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:01 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:01 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:02 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:02 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:28:02 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:28:02 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:28:03 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:28:03 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:28:03 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:28:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:28:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:28:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt:flash_attention_2 zzt-BitnetAttention-head_dim:64 zzt-BitnetAttention-hidden_size:768 zzt-BitnetAttention-num_heads:12 2024-05-23 06:28:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:04 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:28:05 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:28:05 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:28:05 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:06 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:28:06 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:06 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:07 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:07 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:28:08 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:28:08 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:08 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:08 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:08 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:08 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
BitBLAS Operator created. 2024-05-23 06:28:09 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:28:09 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created.BitBLAS Operator created.

BitBLAS Operator created. 2024-05-23 06:28:10 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:28:10 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:28:10 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:10 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:10 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:10 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 2024-05-23 06:28:11 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database zzt-BitnetFlashAttention2 2024-05-23 06:28:11 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:28:12 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. 
zzt-BitnetFlashAttention2 zzt-BitnetFlashAttention2 2024-05-23 06:28:12 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database 2024-05-23 06:28:12 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:28:12 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:28:12 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:28:13 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. zzt-BitnetFlashAttention2 2024-05-23 06:28:13 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:13 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:14 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:14 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:28:14 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:15 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:15 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:15 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:15 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:15 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:16 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:16 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:16 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 
2024-05-23 06:28:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. 2024-05-23 06:28:17 [BitBLAS:INFO]: Database path /home/notebook/code/personal/80306170/AGI/BitNet/cache/bitblas does not exist, skipping loading operators from the database BitBLAS Operator created. BitBLAS Operator created. BitBLAS Operator created. BitBLAS Operator created. number of parameters: 123.62M BitBLAS Operator created. num decayed parameter tensors: 86, with 162,201,600 parameters num non-decayed parameter tensors: 49, with 52,992 parameters using fused AdamW: True BitBLAS Operator created. BitBLAS Operator created. number of parameters: 123.62M BitBLAS Operator created. number of parameters: 123.62M num decayed parameter tensors: 86, with 162,201,600 parameters num non-decayed parameter tensors: 49, with 52,992 parameters using fused AdamW: True number of parameters: 123.62M num decayed parameter tensors: 86, with 162,201,600 parameters num non-decayed parameter tensors: 49, with 52,992 parameters using fused AdamW: True num decayed parameter tensors: 86, with 162,201,600 parameters num non-decayed parameter tensors: 49, with 52,992 parameters using fused AdamW: True number of parameters: 123.62M number of parameters: 123.62M num decayed parameter tensors: 86, with 162,201,600 parameters num non-decayed parameter tensors: 49, with 52,992 parameters using fused AdamW: True num decayed parameter tensors: 86, with 162,201,600 parameters num non-decayed parameter tensors: 49, with 52,992 parameters number of parameters: 123.62M using fused AdamW: True num decayed parameter tensors: 86, with 162,201,600 parameters num non-decayed parameter tensors: 49, with 52,992 
parameters
using fused AdamW: True
number of parameters: 123.62M
num decayed parameter tensors: 86, with 162,201,600 parameters
num non-decayed parameter tensors: 49, with 52,992 parameters
using fused AdamW: True
wandb: Currently logged in as: robotzheng. Use wandb login --relogin to force relogin
wandb: wandb version 0.17.0 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.16.3
wandb: Run data is saved locally in /home/notebook/code/personal/80306170/AGI/BitNet/scaling_laws_bitnet/wandb/run-20240523_062827-qa8l01al
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run N-8.49e+07
wandb: ⭐️ View project at https://wandb.ai/robotzheng/owt-scaling
wandb: 🚀 View run at https://wandb.ai/robotzheng/owt-scaling/runs/qa8l01al
step 0: train loss nan, val loss nan
iter 0: loss nan, time 72417.82ms, mfu -100.00%
iter 10: loss nan, time 1514.87ms, mfu 11.86%
iter 20: loss nan, time 1514.67ms, mfu 11.86%
iter 30: loss nan, time 1514.11ms, mfu 11.86%
iter 40: loss nan, time 1514.20ms, mfu 11.86%
iter 50: loss nan, time 1514.77ms, mfu 11.86%
iter 60: loss nan, time 1515.29ms, mfu 11.86%
iter 70: loss nan, time 1515.12ms, mfu 11.86%
iter 80: loss nan, time 1514.51ms, mfu 11.86%
iter 90: loss nan, time 1515.22ms, mfu 11.86%
iter 100: loss nan, time 1514.65ms, mfu 11.86%
iter 110: loss nan, time 1514.52ms, mfu 11.86%
iter 120: loss nan, time 1514.85ms, mfu 11.86%
iter 130: loss nan, time 1514.39ms, mfu 11.86%
iter 140: loss nan, time 1514.85ms, mfu 11.86%
iter 150: loss nan, time 1514.49ms, mfu 11.86%

train.py:

"""
This training script can be run both on a single gpu in debug mode,
and also in a larger training run with distributed data parallel (ddp).

To run on a single GPU, example:
$ python train.py --batch_size=32 --compile=False

To run with DDP on 4 gpus on 1 node, example:
$ torchrun --standalone --nproc_per_node=4 train.py

To run with DDP on 4 gpus across 2 nodes, example:
"""

import os
import time
import math
import pickle
from contextlib import nullcontext

import numpy as np
import torch
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed import init_process_group, destroy_process_group
import torch._dynamo
torch._dynamo.config.suppress_errors = True
torch._dynamo.config.cache_size_limit = 64

from model import GPTConfig, GPT
from modeling_bitnet import BitnetForCausalLM
from tokenization_bitnet import BitnetTokenizer
from configuration_bitnet import BitnetConfig

# -----------------------------------------------------------------------------
# default config values designed to train a gpt2 (124M) on OpenWebText
# I/O
out_dir = 'out'
eval_interval = 2000
log_interval = 1
eval_iters = 200
eval_only = False # if True, script exits right after the first eval
always_save_checkpoint = True # if True, always save a checkpoint after each eval
init_from = 'scratch' # 'scratch' or 'resume' or 'gpt2*'
# wandb logging
wandb_log = False # disabled by default
wandb_project = 'owt'
wandb_run_name = 'gpt2' # 'run' + str(time.time())
# data
dataset = 'openwebtext'
gradient_accumulation_steps = 5 * 8 # used to simulate larger batch sizes
batch_size = 1 # if gradient_accumulation_steps > 1, this is the micro-batch size
block_size = 1024
# model
n_layer = 12
n_head = 12
n_embd = 768
n_intermediate_size = 2048
dropout = 0.0 # for pretraining 0 is good, for finetuning try 0.1+
bias = False # do we use bias inside LayerNorm and Linear layers?
# adamw optimizer
learning_rate = 6e-3 # max learning rate
max_iters = 600000 # total number of training iterations
weight_decay = 1e-1
beta1 = 0.9
beta2 = 0.95
grad_clip = 1.0 # clip gradients at this value, or disable if == 0.0
# learning rate decay settings
decay_lr = True # whether to decay the learning rate
warmup_iters = 2000 # how many steps to warm up for
lr_decay_iters = 600000 # should be ~= max_iters per Chinchilla
min_lr = 0 # minimum learning rate, should be ~= learning_rate/10 per Chinchilla
# DDP settings
backend = 'nccl' # 'nccl', 'gloo', etc.
# system
device = 'cuda' # examples: 'cpu', 'cuda', 'cuda:0', 'cuda:1' etc., or try 'mps' on macbooks
dtype = 'bfloat16' if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else 'float16' # 'float32', 'bfloat16', or 'float16', the latter will auto implement a GradScaler
compile = False # use PyTorch 2.0 to compile the model to be faster
# variables needed for scaling laws
scaling = "" # takes 4 values: Kaplan, Chinchilla-1, Chinchilla-2 or '' (default, when not scaling).
scale_N = False
scale_D = False
estimate_B_crit = False
N = 12 * n_layer * n_embd**2 # number of non-embedding parameters
fraction_of_data = 1.0 # fraction of OWT dataset that will be used for training
D = int(fraction_of_data*9035582198) # number of OWT dataset tokens
wandb_run_id = "" # needed only if resuming a W&B run.

# -----------------------------------------------------------------------------
config_keys = [k for k,v in globals().items() if not k.startswith('_') and isinstance(v, (int, float, bool, str))]
print(f'zzt-learning_rate2:{learning_rate}')
exec(open('configurator.py').read()) # overrides from command line or config file
print(f'zzt-learning_rate3:{learning_rate}')
config = {k: globals()[k] for k in config_keys} # will be useful for logging
# -----------------------------------------------------------------------------

# various inits, derived attributes, I/O setup
ddp = int(os.environ.get('RANK', -1)) != -1 # is this a ddp run?
if ddp:
    init_process_group(backend=backend)
    ddp_rank = int(os.environ['RANK'])
    ddp_local_rank = int(os.environ['LOCAL_RANK'])
    ddp_world_size = int(os.environ['WORLD_SIZE'])
    device = f'cuda:{ddp_local_rank}'
    torch.cuda.set_device(device)
    master_process = ddp_rank == 0 # this process will do logging, checkpointing etc.
    seed_offset = ddp_rank # each process gets a different seed
    # world_size number of processes will be training simultaneously, so we can scale
    # down the desired gradient accumulation iterations per process proportionally
    assert gradient_accumulation_steps % ddp_world_size == 0
    gradient_accumulation_steps //= ddp_world_size
else:
    # if not ddp, we are running on a single gpu, and one process
    master_process = True
    seed_offset = 0
    ddp_world_size = 1
tokens_per_iter = gradient_accumulation_steps * ddp_world_size * batch_size * block_size
print(f"tokens per iteration will be: {tokens_per_iter:,}")

if master_process:
    os.makedirs(out_dir, exist_ok=True)
torch.manual_seed(1337 + seed_offset)
torch.backends.cuda.matmul.allow_tf32 = True # allow tf32 on matmul
torch.backends.cudnn.allow_tf32 = True # allow tf32 on cudnn
device_type = 'cuda' if 'cuda' in device else 'cpu' # for later use in torch.autocast
# note: float16 data type will automatically use a GradScaler
ptdtype = {'float32': torch.float32, 'bfloat16': torch.bfloat16, 'float16': torch.float16}[dtype]
ctx = nullcontext() if device_type == 'cpu' else torch.amp.autocast(device_type=device_type, dtype=ptdtype)

# poor man's data loader
data_dir = os.path.join('data', dataset)
train_data = np.memmap(os.path.join(data_dir, 'train.bin'), dtype=np.uint16, mode='r')
if scaling == 'Kaplan' and scale_D: # use a fraction of dataset if scaling with dataset size following Kaplan et al
    # train_data = train_data[:D] zzt
    print(f"Using {fraction_of_data} fraction of owt training data; number of tokens: {D:.2e}")
val_data = np.memmap(os.path.join(data_dir, 'val.bin'), dtype=np.uint16, mode='r')
def get_batch(split):
    # print(f'zzt-batch_size:{batch_size}')
    data = train_data if split == 'train' else val_data
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([torch.from_numpy((data[i:i+block_size]).astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy((data[i+1:i+1+block_size]).astype(np.int64)) for i in ix])
    if device_type == 'cuda':
        # pin arrays x,y, which allows us to move them to GPU asynchronously (non_blocking=True)
        x, y = x.pin_memory().to(device, non_blocking=True), y.pin_memory().to(device, non_blocking=True)
    else:
        x, y = x.to(device), y.to(device)
    return x, y

# init these up here, can override if init_from='resume' (i.e. from a checkpoint)
iter_num = 0
best_val_loss = 1e9

# attempt to derive vocab_size from the dataset
meta_path = os.path.join(data_dir, 'meta.pkl')
meta_vocab_size = None
if os.path.exists(meta_path):
    with open(meta_path, 'rb') as f:
        meta = pickle.load(f)
    meta_vocab_size = meta['vocab_size']
    print(f"found vocab_size = {meta_vocab_size} (inside {meta_path})")

# model init
model_args = dict(num_hidden_layers=n_layer, num_attention_heads=n_head, hidden_size=n_embd, intermediate_size=n_intermediate_size, max_position_embeddings=block_size, attention_bias=bias, vocab_size=None, attention_dropout=dropout) # start with model_args from command line
if init_from == 'scratch':
    # init a new model from scratch
    print("Initializing a new model from scratch")
    # determine the vocab size we'll use for from-scratch training
    if meta_vocab_size is None:
        print("defaulting to vocab_size of GPT-2 to 50304 (50257 rounded up for efficiency)")
    model_args['vocab_size'] = meta_vocab_size if meta_vocab_size is not None else 50304
    gptconf = BitnetConfig(**model_args)
    gptconf._attn_implementation = "flash_attention_2"
    gptconf.torch_dtype=torch.bfloat16
    print(f'zzt:{gptconf}')
    model = BitnetForCausalLM(gptconf)
    model = model.to(torch.bfloat16)
elif init_from == 'resume':
    print(f"Resuming training from {out_dir}")
    # resume training from a checkpoint.
    ckpt_path = os.path.join(out_dir, 'ckpt.pt')
    checkpoint = torch.load(ckpt_path, map_location=device)
    checkpoint_model_args = checkpoint['model_args']
    # force these config attributes to be equal otherwise we can't even resume training
    # the rest of the attributes (e.g. dropout) can stay as desired from command line
    for k in ['n_layer', 'n_head', 'n_embd', 'block_size', 'bias', 'vocab_size']:
        model_args[k] = checkpoint_model_args[k]
    # create the model
    gptconf = GPTConfig(**model_args)
    model = GPT(gptconf)
    state_dict = checkpoint['model']
    # fix the keys of the state dictionary :(
    # honestly no idea how checkpoints sometimes get this prefix, have to debug more
    unwanted_prefix = '_orig_mod.'
    for k,v in list(state_dict.items()):
        if k.startswith(unwanted_prefix):
            state_dict[k[len(unwanted_prefix):]] = state_dict.pop(k)
    model.load_state_dict(state_dict)
    iter_num = checkpoint['iter_num']
    best_val_loss = checkpoint['best_val_loss']
elif init_from.startswith('gpt2'):
    print(f"Initializing from OpenAI GPT-2 weights: {init_from}")
    # initialize from OpenAI GPT-2 weights
    override_args = dict(dropout=dropout)
    model = GPT.from_pretrained(init_from, override_args)
    # read off the created config params, so we can store them into checkpoint correctly
    for k in ['n_layer', 'n_head', 'n_embd', 'block_size', 'bias', 'vocab_size']:
        model_args[k] = getattr(model.config, k)
# crop down the model block size if desired, using model surgery
if block_size < model.config.max_position_embeddings:
    model.crop_block_size(block_size)
    model_args['max_position_embeddings'] = block_size # so that the checkpoint will have the right value
model.to(device)

# initialize a GradScaler. If enabled=False scaler is a no-op
scaler = torch.cuda.amp.GradScaler(enabled=(dtype == 'float16'))

# optimizer
optimizer = model.configure_optimizers(weight_decay, learning_rate, (beta1, beta2), device_type)
if init_from == 'resume':
    optimizer.load_state_dict(checkpoint['optimizer'])
checkpoint = None # free up memory

# compile the model
if compile:
    print("compiling the model... (takes a ~minute)")
    unoptimized_model = model
    model = torch.compile(model) # requires PyTorch 2.0

# wrap model into DDP container
if ddp:
    model = DDP(model, device_ids=[ddp_local_rank], find_unused_parameters=True)

# helps estimate an arbitrarily accurate loss over either split using many batches
@torch.no_grad()
def estimate_loss():
    out = {}
    model.eval()
    for split in ['train', 'val']:
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            X, Y = get_batch(split)
            with ctx:
                # print('zzt:ctx')
                logits, loss = model(input_ids=X, labels=Y)
            #print(f'zzt:{loss}')
            losses[k] = loss.item()
        out[split] = losses.mean()
    model.train()
    return out

# learning rate decay scheduler (cosine with warmup)
def get_lr(it):
    # 1) linear warmup for warmup_iters steps
    if it < warmup_iters:
        return learning_rate * it / warmup_iters
    # 2) if it > lr_decay_iters, return min learning rate
    if it > lr_decay_iters:
        return min_lr
    # 3) in between, use cosine decay down to min learning rate
    decay_ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    assert 0 <= decay_ratio <= 1
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio)) # coeff ranges 0..1
    return min_lr + coeff * (learning_rate - min_lr)

# logging
if wandb_log and master_process:
    import wandb
    if wandb_run_id: # resume a previous run with id=wandb_run_id
        wandb.init(project=wandb_project, id=wandb_run_id, resume="must")
    else:
        wandb.init(project=wandb_project, name=wandb_run_name, config=config)

# training loop
X, Y = get_batch('train') # fetch the very first batch
t0 = time.time()
local_iter_num = 0 # number of iterations in the lifetime of this process
raw_model = model.module if ddp else model # unwrap DDP container if needed
running_mfu = -1.0
while True:

    # determine and set the learning rate for this iteration
    lr = get_lr(iter_num) if decay_lr else learning_rate
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    # evaluate the loss on train/val sets and write checkpoints
    # except when estimating critical batch size or reproducing Chinchilla scaling laws as they work with (smoothed) training loss
    if iter_num % eval_interval == 0 and master_process and not (estimate_B_crit or scale_D == 'Chinchilla'):
        losses = estimate_loss()
        print(f"step {iter_num}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}")
        if wandb_log:
            wandb.log({
                "iter": iter_num,
                "train/loss": losses['train'],
                "val/loss": losses['val'],
                "lr": lr,
                "mfu": running_mfu*100, # convert to percentage
            })
        if losses['val'] < best_val_loss or always_save_checkpoint:
            best_val_loss = losses['val']
            if iter_num > 0:
                checkpoint = {
                    'model': raw_model.state_dict(),
                    'optimizer': optimizer.state_dict(),
                    'model_args': model_args,
                    'iter_num': iter_num,
                    'best_val_loss': best_val_loss,
                    'config': config,
                }
                print(f"saving checkpoint to {out_dir}")
                torch.save(checkpoint, os.path.join(out_dir, f'ckpt{iter_num}.pt'))
    if iter_num == 0 and eval_only:
        break

    # forward backward update, with optional gradient accumulation to simulate larger batch size
    # and using the GradScaler if data type is float16
    for micro_step in range(gradient_accumulation_steps):
        if ddp:
            # in DDP training we only need to sync gradients at the last micro step.
            # the official way to do this is with model.no_sync() context manager, but
            # I really dislike that this bloats the code and forces us to repeat code
            # looking at the source of that context manager, it just toggles this variable
            model.require_backward_grad_sync = (micro_step == gradient_accumulation_steps - 1)
        with ctx:
            logits, loss = model(input_ids=X, labels=Y)
            loss = loss / gradient_accumulation_steps # scale the loss to account for gradient accumulation
        # immediately async prefetch next batch while model is doing the forward pass on the GPU
        X, Y = get_batch('train')
        # backward pass, with gradient scaling if training in fp16
        scaler.scale(loss).backward()
    # clip the gradient
    if grad_clip != 0.0:
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
    # step the optimizer and scaler if training in fp16
    scaler.step(optimizer)
    scaler.update()
    # flush the gradients as soon as we can, no need for this memory anymore
    optimizer.zero_grad(set_to_none=True)

    # timing and logging
    t1 = time.time()
    dt = t1 - t0
    t0 = t1
    if iter_num % log_interval == 0 and master_process:
        # get loss as float. note: this is a CPU-GPU sync point
        # scale up to undo the division above, approximating the true total loss (exact would have been a sum)
        lossf = loss.item() * gradient_accumulation_steps
        if local_iter_num >= 5: # let the training loop settle a bit
            mfu = raw_model.estimate_mfu(batch_size * gradient_accumulation_steps, dt)
            running_mfu = mfu if running_mfu == -1.0 else 0.9*running_mfu + 0.1*mfu
        print(f"iter {iter_num}: loss {lossf:.4f}, time {dt*1000:.2f}ms, mfu {running_mfu*100:.2f}%")

        if (estimate_B_crit or scale_D == 'Chinchilla') and wandb_log:
            wandb.log({
                "iter": iter_num,
                "iter_loss": lossf,
                "lr": lr,
            })
    iter_num += 1
    local_iter_num += 1

    # termination conditions
    if iter_num > max_iters:
        break

if ddp:
    destroy_process_group()
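The log above reports `train loss nan` already at step 0, which comes from the evaluation pass that runs before any optimizer step. A forward-only check may therefore help narrow down which layer first emits a non-finite value. The following is a minimal sketch using PyTorch forward hooks. `find_first_nonfinite` and the toy `DivideByZero` module are hypothetical names introduced for illustration; they are not part of train.py or BitBLAS.

```python
import torch
import torch.nn as nn


def find_first_nonfinite(model: nn.Module, *args, **kwargs):
    """Run one forward pass; return the name of the first leaf module
    whose output contains NaN or Inf, or None if all outputs are finite."""
    offenders = []
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            tensors = output if isinstance(output, (tuple, list)) else (output,)
            for t in tensors:
                if torch.is_tensor(t) and t.is_floating_point() and not torch.isfinite(t).all():
                    offenders.append(name)
                    break
        return hook

    # Hook leaf modules only, so the report points at a concrete layer.
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(*args, **kwargs)
    finally:
        for h in handles:
            h.remove()
    # Hooks fire in execution order, so the first entry is the earliest offender.
    return offenders[0] if offenders else None


class DivideByZero(nn.Module):
    """Toy module (hypothetical) that deliberately produces NaN."""
    def forward(self, x):
        return x / 0.0 * 0.0


toy_bad = nn.Sequential(nn.Linear(4, 4), DivideByZero(), nn.Linear(4, 2))
toy_ok = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
print(find_first_nonfinite(toy_bad, torch.ones(1, 4)))
print(find_first_nonfinite(toy_ok, torch.ones(1, 4)))
```

For the script above, the call would presumably look like `find_first_nonfinite(model, input_ids=X, labels=Y)` on a single batch, though that has not been run against the BitNet model here.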

robotzheng commented 1 month ago

Please, help me! Thanks.

robotzheng commented 1 month ago

We use the GPT-2 tokenizer; could this lead to NaN? Or could FlashAttention-2 be the cause?
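One cheap way to rule the tokenizer in or out is to confirm that every token ID in the data fits the model's vocabulary. The GPT-2 tokenizer has 50257 entries, and the script falls back to `vocab_size = 50304`, so GPT-2 IDs should fit. Out-of-range IDs would typically surface as an indexing error in the embedding lookup rather than as NaN, but the check costs little. The helper name `out_of_range_ids` below is hypothetical.

```python
def out_of_range_ids(token_ids, vocab_size):
    """Return the sorted distinct token IDs that fall outside [0, vocab_size)."""
    return sorted({int(t) for t in token_ids if not 0 <= int(t) < vocab_size})


# 50256 is the largest GPT-2 token ID; 50304 is the script's default vocab size.
print(out_of_range_ids([0, 1024, 50256], 50304))
print(out_of_range_ids([5, 50304, 60000, 50304], 50304))
```

In the training script the same function could be applied to a slice of `train_data`, for example the first few million tokens, before training starts.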

xysmlx commented 1 month ago

Hi, currently, BitBLAS does not implement the backward computation, and cannot be used in training if there is no additional backward implementation. Therefore, gradients will not be processed and may result in NaN. You may try torch.nn.Linear or torch.matmul to see training results.
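As a rough illustration of the suggestion to fall back to standard PyTorch ops for training, here is a minimal sketch of a linear layer that quantizes its weights in the forward pass but lets autograd compute gradients through `torch.nn.functional.linear`. It uses a straight-through estimator with per-tensor absmean ternary quantization. Both choices are assumptions made for this example, and the class name `TernaryLinearSTE` is hypothetical. This is not the BitBLAS API and not necessarily how the BitNet model code in this thread quantizes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TernaryLinearSTE(nn.Module):
    """Hypothetical training-time stand-in: ternary weights in the forward
    pass, full-precision gradients in the backward pass."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, x):
        w = self.weight
        # Per-tensor absmean scale (assumed scheme), clamped to avoid division by zero.
        scale = w.abs().mean().clamp(min=1e-5)
        # Quantized weights take values in {-scale, 0, +scale}.
        w_q = (w / scale).round().clamp(-1, 1) * scale
        # Straight-through estimator: forward uses w_q, gradient flows to w.
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste)


layer = TernaryLinearSTE(8, 4)
x = torch.randn(2, 8)
out = layer(x)
out.sum().backward()
print(out.shape, torch.isfinite(layer.weight.grad).all().item())
```

Because the matmul goes through a standard PyTorch op, the backward pass is defined. A BitBLAS kernel could then be swapped in for inference once training is done.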

robotzheng commented 1 month ago

Thanks a lot.

LeiWang1999 commented 3 weeks ago

closed as the question has been answered.