ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

Need more error info #3791

Closed yogeshhk closed 1 year ago

yogeshhk commented 1 year ago

Describe the bug
For many use cases, such as fine-tuning an LLM, Ludwig errors out without giving any error information:

Process finished with exit code -1073741819 (0xC0000005)

To Reproduce
The code is:

import logging

import pandas as pd
from ludwig.api import LudwigModel

qna_tuning_config_dict = {
    "input_features": [
        {
            "name": "Question",
            "type": "text",
            "encoder": {
                "type": "auto_transformer",
                "pretrained_model_name_or_path": "meta-llama/Llama-2-7b-hf",
                "trainable": False
            },
            "preprocessing": {
                "cache_encoder_embeddings": True
            }
        }
    ],
    "output_features": [
        {
            "name": "Answer",
            "type": "text"
        }
    ]
}

df = pd.read_csv('./data/cbic-gst_gov_in_fgaq.csv', encoding='cp1252')
model = LudwigModel(config=qna_tuning_config_dict, logging_level=logging.DEBUG)
_ = model.train(dataset=df, output_directory="results")
model_dir = "./models/gst_qna"
model.save(model_dir)

test_df = pd.DataFrame([
    {
        "Question": "What is GST?"
    },
    {
        "Question": "Does aggregate turnover include value of inward supplies received on which RCM is payable?"
    },
])
model = LudwigModel.load(model_dir)
results = model.predict(dataset=test_df)
print(results)

cbic-gst_gov_in_fgaq.csv

The intent of this code is to fine-tune Llama 2 on a question-answer dataset.

Expected behavior
At the logging.DEBUG level, the log should include detailed information about where the error actually occurred.


Additional context
The full error log is below:


╒════════════════════════╕
│ EXPERIMENT DESCRIPTION │
╘════════════════════════╛

╒══════════════════╤══════════════════════════════════════════════════════════════════════════════════════╕
│ Experiment name  │ api_experiment                                                                       │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ Model name       │ run                                                                                  │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ Output directory │ D:\Yogesh\GitHub\Sarvadnya\src\ludwig\results\api_experiment_run_12                  │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ ludwig_version   │ '0.8.6'                                                                              │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ command          │ 'D:\\Yogesh\\GitHub\\Sarvadnya\\src\\ludwig\\finetuning_gst_llm.py'                  │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ commit_hash      │ '49ca4a03e17f'                                                                       │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ random_seed      │ 42                                                                                   │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ data_format      │ "<class 'pandas.core.frame.DataFrame'>"                                              │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ torch_version    │ '2.1.1+cu118'                                                                        │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ compute          │ {   'arch_list': [   'sm_37',                                                        │
│                  │                      'sm_50',                                                        │
│                  │                      'sm_60',                                                        │
│                  │                      'sm_61',                                                        │
│                  │                      'sm_70',                                                        │
│                  │                      'sm_75',                                                        │
│                  │                      'sm_80',                                                        │
│                  │                      'sm_86',                                                        │
│                  │                      'sm_90',                                                        │
│                  │                      'compute_37'],                                                  │
│                  │     'devices': {   0: {   'device_capability': (8, 6),                               │
│                  │                           'device_properties': "_CudaDeviceProperties(name='NVIDIA " │
│                  │                                                "GeForce MX570 A', major=8, "         │
│                  │                                                'minor=6, total_memory=2047MB, '      │
│                  │                                                'multi_processor_count=16)',          │
│                  │                           'gpu_type': 'NVIDIA GeForce MX570 A'}},                    │
│                  │     'gencode_flags': '-gencode compute=compute_37,code=sm_37 -gencode '              │
│                  │                      'compute=compute_50,code=sm_50 -gencode '                       │
│                  │                      'compute=compute_60,code=sm_60 -gencode '                       │
│                  │                      'compute=compute_61,code=sm_61 -gencode '                       │
│                  │                      'compute=compute_70,code=sm_70 -gencode '                       │
│                  │                      'compute=compute_75,code=sm_75 -gencode '                       │
│                  │                      'compute=compute_80,code=sm_80 -gencode '                       │
│                  │                      'compute=compute_86,code=sm_86 -gencode '                       │
│                  │                      'compute=compute_90,code=sm_90 -gencode '                       │
│                  │                      'compute=compute_37,code=compute_37',                           │
│                  │     'gpus_per_node': 1,                                                              │
│                  │     'num_nodes': 1}                                                                  │
╘══════════════════╧══════════════════════════════════════════════════════════════════════════════════════╛

╒═══════════════╕
│ LUDWIG CONFIG │
╘═══════════════╛

User-specified config (with upgrades):

{   'input_features': [   {   'encoder': {   'pretrained_model_name_or_path': 'meta-llama/Llama-2-7b-hf',
                                             'trainable': False,
                                             'type': 'auto_transformer'},
                              'name': 'Question',
                              'preprocessing': {   'cache_encoder_embeddings': True},
                              'type': 'text'}],
    'ludwig_version': '0.8.6',
    'output_features': [{'name': 'Answer', 'type': 'text'}]}

Full config saved to:
D:\Yogesh\GitHub\Sarvadnya\src\ludwig\results\api_experiment_run_12/api_experiment/model/model_hyperparameters.json

╒═══════════════╕
│ PREPROCESSING │
╘═══════════════╛

No cached dataset found at D:\Yogesh\GitHub\Sarvadnya\src\ludwig\ff5c5e2a8dd211eea4a298597a3a7c43.training.hdf5. Preprocessing the dataset.
Using full dataframe
Building dataset (it may take a while)
handle text features with prompt parameters
build preprocessing parameters
handle missing values
cast columns
build metadata
Loaded HuggingFace implementation of meta-llama/Llama-2-7b-hf tokenizer
No padding token id found. Using eos_token as pad_token.
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Max length of feature 'Question': 61 (without start and stop symbols)
Setting max length using dataset: 63 (including start and stop symbols)
max sequence length is 63 for feature 'Question'
Max length of feature 'Answer': 74 (without start and stop symbols)
Setting max length using dataset: 76 (including start and stop symbols)
max sequence length is 76 for feature 'Answer'
build data
Loaded HuggingFace implementation of meta-llama/Llama-2-7b-hf tokenizer
No padding token id found. Using eos_token as pad_token.
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
max length of <built-in function format>: 61 < limit: 63
Cache encoder embeddings for features: ['Question']
Input text feature Question
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Process finished with exit code -1073741819 (0xC0000005)
justinxzhao commented 1 year ago

Hi @yogeshhk,

For LLM-based text generation models, the ECD architecture should not be used.

Please refer to https://ludwig.ai/latest/examples/llms/llm_text_generation/ for training/fine-tuning LLMs.
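To illustrate the suggested direction, here is a minimal sketch of what the config might look like using Ludwig's LLM model type instead of an ECD text encoder, based on the linked text-generation example. This is an assumption-laden sketch, not the confirmed fix for this issue: the `adapter` setting is an illustrative choice for parameter-efficient fine-tuning, and exact field names should be checked against the Ludwig docs for your installed version.

```python
# Hypothetical LLM-type config for the Question/Answer fine-tuning task above.
# "model_type": "llm" selects the LLM architecture rather than ECD; the LoRA
# adapter is an illustrative assumption for fine-tuning on limited GPU memory.
qna_llm_config_dict = {
    "model_type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",
    "input_features": [{"name": "Question", "type": "text"}],
    "output_features": [{"name": "Answer", "type": "text"}],
    "adapter": {"type": "lora"},        # parameter-efficient fine-tuning
    "trainer": {"type": "finetune"},
}
```

This dict could then be passed to `LudwigModel(config=qna_llm_config_dict, ...)` in place of the ECD config shown in the reproduction code.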