oneapi-src / oneAPI-samples

Samples for Intel® oneAPI Toolkits
https://oneapi-src.github.io/oneAPI-samples/
MIT License

Failing to test "IntelTensorFlow_for_LLMs" sample in CI #2445

Open Ankur-singh opened 2 months ago

Ankur-singh commented 2 months ago

Summary


The "IntelTensorFlow_for_LLMs" sample takes about 5 hours to run, so it times out in CI.

Environment

OS: Linux

Observed behavior

The sample shows how to fine-tune a 6B-parameter model, which takes about 5 hours on CPU. That is too long for a CI job, so the sample cannot currently be tested there.

Expected behavior

Ideally, the sample should take no more than a few minutes to run. We can use an environment variable to detect whether the sample is running in CI and, if so, train for only a few batches. That would be enough to verify that the sample code runs correctly.
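
A minimal sketch of the idea, not the sample's actual code. It assumes the CI system sets a variable named `CI` to a truthy value (GitHub Actions does this by default; other runners may need it set explicitly). The helper names `running_in_ci` and `limit_batches` and the default of 5 batches are hypothetical and only for illustration.

```python
import os
from itertools import islice


def running_in_ci(var_name="CI"):
    """Return True when the assumed CI flag variable holds a truthy value."""
    return os.environ.get(var_name, "").strip().lower() in {"1", "true", "yes"}


def limit_batches(batches, max_ci_batches=5, var_name="CI"):
    """Return an iterator over all batches, or only the first few when in CI."""
    if running_in_ci(var_name):
        # In CI: stop after a handful of batches so the job finishes in minutes.
        return islice(batches, max_ci_batches)
    # Outside CI: iterate over the full dataset as usual.
    return iter(batches)
```

In the training loop this would wrap whatever iterable yields batches, for example `for batch in limit_batches(train_batches): ...`. With Keras-style training, the same flag could instead select a small `steps_per_epoch` value; which option fits best depends on how the sample's notebook drives training.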