philschmid / llm-sagemaker-sample

Apache License 2.0

not enforcing datasets version leads to conflicts #13

Open · AnisZakari opened this issue 4 months ago

AnisZakari commented 4 months ago

Hey Philipp, thank you for your amazing work!

In the `train-deploy-llm.ipynb` notebook, I noticed that when running `huggingface_estimator.fit(data, wait=True)`, leaving the datasets version unpinned leads to a dependency conflict while `requirements.txt` is being installed:

```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tokenizers 0.14.1 requires huggingface_hub<0.18,>=0.16.4, but you have huggingface-hub 0.20.3 which is incompatible.
```
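The log names tokenizers and huggingface-hub rather than datasets. Presumably an unpinned datasets install pulls in a huggingface-hub release newer than the `<0.18` bound that tokenizers 0.14.1 declares. The minimal stdlib-only sketch below reproduces that check. It is not pip's resolver: `satisfies` and `parse_version` are hypothetical helpers, and they handle only plain numeric versions and the operators `==`, `!=`, `>=`, `<=`, `>`, `<`.

```python
# Minimal stdlib-only sketch of a pip-style version check. It is NOT pip's
# resolver: it handles plain numeric versions ("0.20.3") and the operators
# ==, !=, >=, <=, >, < only (no pre-releases, wildcards, "~=", or "===").
import operator

# Insertion order matters: two-character operators must be tried before
# their one-character prefixes (">=" before ">", "<=" before "<").
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.20.3' into (0, 20, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Pad with zeros so that '0.18' and '0.18.0' compare as equal."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version: str, specifier: str) -> bool:
    """Return True if `version` meets every comma-separated clause of `specifier`."""
    current = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        if not clause:
            continue  # an empty clause places no constraint
        for symbol, compare in _OPS.items():
            if clause.startswith(symbol):
                left, right = _pad(current, parse_version(clause[len(symbol):]))
                if not compare(left, right):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# The conflict from the log: tokenizers 0.14.1 wants huggingface_hub<0.18,>=0.16.4.
print(satisfies("0.20.3", "<0.18,>=0.16.4"))  # False: 0.20.3 is too new
print(satisfies("0.17.3", "<0.18,>=0.16.4"))  # True
```

For anything beyond a quick check, `packaging.specifiers.SpecifierSet` from the `packaging` library (already listed in the requirements) implements the full PEP 440 rules. An example is `"0.20.3" in SpecifierSet(">=0.16.4,<0.18")`.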

I suggest pinning the datasets version in `requirements.txt`:

```
# Requirements.txt
transformers==4.34.0
datasets==2.14.0
peft==0.4.0
accelerate==0.23.0
bitsandbytes==0.41.1
safetensors>=0.3.3
packaging
ninja
```

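A similar problem can be caught before a training job starts. The following sketch uses a hypothetical helper, `unpinned_requirements`, which is not part of the repository. It lists every entry in a requirements file that is not pinned to an exact version with `==`. It is stdlib-only and understands only simple `name[extras]<specifier>` lines. When run on the file above, it reports the entries that still float.

```python
# Hypothetical helper: list requirements that are not pinned to an exact
# version with "==". Stdlib only; it understands simple "name[extras]<spec>"
# lines and skips comments, blank lines, and pip options such as "-r".
import re

# A requirement line starts with a package name, optionally followed by extras.
_REQ = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?")


def unpinned_requirements(text: str) -> list[str]:
    """Return the names of requirements whose specifier does not start with '=='."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline and full-line comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = _REQ.match(line)
        if match is None:
            continue
        if not line[match.end():].strip().startswith("=="):
            loose.append(match.group(1))
    return loose


proposed = """\
# Requirements.txt
transformers==4.34.0
datasets==2.14.0
peft==0.4.0
accelerate==0.23.0
bitsandbytes==0.41.1
safetensors>=0.3.3
packaging
ninja
"""

print(unpinned_requirements(proposed))  # ['safetensors', 'packaging', 'ninja']
```

Whether `safetensors`, `packaging` and `ninja` should also be pinned is a judgment call. Exact pins make the SageMaker install reproducible, but they have to be bumped by hand.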
philschmid commented 4 months ago

Do you want to open a PR that updates the datasets version?

AnisZakari commented 4 months ago

With pleasure!