ngruver / llmtime
MIT License

text-davinci-003 has been deprecated && the results of demo are not good #30

Open Lins-01 opened 6 months ago

Lins-01 commented 6 months ago

Hi! Thank you for releasing the code! This is a very interesting piece of work. Congratsssss on the NeurIPS acceptance! 🎉

I ran into a problem when using your code. Running demo.ipynb directly produces the following error:

Sampling with best hyper... defaultdict(<class 'dict'>, {'model': 'text-davinci-003', 'temp': 0.7, 'alpha': 0.95, 'beta': 0.3, 'basic': False, 'settings': SerializerSettings(base=10, prec=3, signed=True, fixed_length=False, max_val=10000000.0, time_sep=' ,', bit_sep=' ', plus_sign='', minus_sign=' -', half_bin_correction=True, decimal_point='', missing_str=' Nan'), 'dataset_name': 'AirPassengersDataset'}) 
 with NLL inf
  0%|          | 0/1 [00:00<?, ?it/s]
InvalidRequestError                       Traceback (most recent call last)
Cell In[3], line 11
      9 hypers = list(grid_iter(model_hypers[model]))
     10 num_samples = 10
---> 11 pred_dict = get_autotuned_predictions_data(train, test, hypers, num_samples, model_predict_fns[model], verbose=False, parallel=False)
     12 out[model] = pred_dict
     13 plot_preds(train, test, pred_dict, model, show_samples=True)

File e:\Document\CodeSpace\OpenProject\llmtime-main\models\, in get_autotuned_predictions_data(train, test, hypers, num_samples, get_predictions_fn, verbose, parallel, n_train, n_val)
    117     best_val_nll = float('inf')
    118 print(f'Sampling with best hyper... {best_hyper} \n with NLL {best_val_nll:3f}')
--> 119 out = get_predictions_fn(train, test, **best_hyper, num_samples=num_samples, n_train=n_train, parallel=parallel)
    120 out['best_hyper']=convert_to_dict(best_hyper)
    121 return out

File e:\Document\CodeSpace\OpenProject\llmtime-main\models\, in get_llmtime_predictions_data(train, test, model, settings, num_samples, temp, alpha, beta, basic, parallel, **kwargs)
    226 completions_list = None
    227 if num_samples > 0:
--> 228     preds, completions_list, input_strs = generate_predictions(completion_fn, input_strs, steps, settings, scalers,
    229                                                                 num_samples=num_samples, temp=temp, 
    230                                                                 parallel=parallel, **kwargs)
    231     samples = [pd.DataFrame(preds[i], columns=test[i].index) for i in range(len(preds))]
    232     medians = [sample.median(axis=0) for sample in samples]
    776         rbody, rcode,, rheaders, stream_error=stream_error
    777     )
    778 return resp

InvalidRequestError: The model `text-davinci-003` has been deprecated, learn more here:
Output is truncated.

After checking the URL from OpenAI's error message, I changed this code:

model_predict_fns = {
    'LLMTime GPT-3': get_llmtime_predictions_data,
    'LLMTime GPT-4': get_llmtime_predictions_data,
    'PromptCast GPT-3': get_promptcast_predictions_data,
    'ARIMA': get_arima_predictions_data,
}

to:

model_predict_fns = {
    'LLMTime GPT-3.5': get_llmtime_predictions_data,
    # 'LLMTime GPT-4': get_llmtime_predictions_data,
    # 'PromptCast GPT-3': get_promptcast_predictions_data,
    'ARIMA': get_arima_predictions_data,
}
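
The traceback above looks up both `model_hypers[model]` and `model_predict_fns[model]` with the same key, so a renamed entry such as 'LLMTime GPT-3.5' presumably has to exist in both dictionaries. A minimal sketch of a consistency check follows. The two functions are stand-in stubs, the hyperparameter values are placeholders, and the helper `missing_hypers` is hypothetical; none of this is taken from the repository:

```python
# Stand-ins for the repository's prediction functions (hypothetical stubs,
# used only so this sketch runs without the llmtime code or an API key).
def get_llmtime_predictions_data(*args, **kwargs):
    raise NotImplementedError("stub")


def get_arima_predictions_data(*args, **kwargs):
    raise NotImplementedError("stub")


model_predict_fns = {
    'LLMTime GPT-3.5': get_llmtime_predictions_data,
    'ARIMA': get_arima_predictions_data,
}

# Placeholder hyperparameters; the real notebook defines its own values.
model_hypers = {
    'LLMTime GPT-3.5': {'model': 'gpt-3.5-turbo-instruct'},
    'ARIMA': {},
}


def missing_hypers(predict_fns, hypers):
    """Return model names that have a prediction function but no hyperparameters."""
    return sorted(set(predict_fns) - set(hypers))


# An empty list means every model in model_predict_fns can be looked up
# in model_hypers without a KeyError.
missing = missing_hypers(model_predict_fns, model_hypers)
```

Running a check like this before the sampling loop would surface a mismatched key early, before any API call is made.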

Here is the result I get (plots below). It does not seem to be better than ARIMA: the bold purple line is farther from the actual values. Is gpt-3.5-turbo-instruct the reason? Could you please update the demo for the new API, or tell me how to improve the performance? Or can the plots in your paper only be reproduced with text-davinci-003 or llama-70B? Sorry for taking up your time. I would appreciate any help when you have a free moment.

[Attached plots: gpt3, ARIMA1, ARIMA2]

shikaiqiu commented 4 months ago

Hi Changling,

It's indeed unfortunate that OpenAI has deprecated text-davinci-003. As mentioned in the README, we found gpt-3.5-turbo-instruct to perform worse than text-davinci-003. Using a lower temperature (e.g. 0.3) improved performance slightly, but it still did not match text-davinci-003. Therefore, we do not recommend using gpt-3.5-turbo-instruct as a drop-in replacement. Using other models such as LLaMA 2 will work much better.
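
The lower-temperature suggestion can be sketched as an override on the hyperparameters printed in the log earlier in this thread. This is only an illustration: the helper name `with_overrides` is hypothetical, the `settings` entry is left out because `SerializerSettings` lives in the repository, and how the notebook actually consumes these values is an assumption.

```python
from copy import deepcopy

# Values copied from the "Sampling with best hyper..." log line in the issue.
# The 'settings' entry is omitted here (it needs the repository's
# SerializerSettings class).
davinci_hypers = {
    'model': 'text-davinci-003',
    'temp': 0.7,
    'alpha': 0.95,
    'beta': 0.3,
    'basic': False,
}


def with_overrides(hypers, **overrides):
    """Return a copy of `hypers` with the given keys replaced (hypothetical helper)."""
    updated = deepcopy(hypers)
    updated.update(overrides)
    return updated


# Swap in the replacement model and the lower temperature mentioned above;
# the original dictionary is left unchanged.
gpt35_hypers = with_overrides(
    davinci_hypers,
    model='gpt-3.5-turbo-instruct',
    temp=0.3,
)

print(gpt35_hypers)
```

Keeping the original dictionary untouched makes it easy to compare the two configurations side by side.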
