hpcaitech / ColossalAI-Examples

Examples of training models with hybrid parallelism using ColossalAI
Apache License 2.0

BERT Data Preprocessing #172

Open JizeZhangCS opened 1 year ago

JizeZhangCS commented 1 year ago

🐛 Describe the bug

NVIDIA DeepLearningExamples removed LDDL from its Tools directory on Aug 16, 2022. As a result, the guide at https://github.com/hpcaitech/ColossalAI-Examples/tree/main/language/bert/preprocessing no longer works, for the following reasons:

  1. pip install git+https://github.com/NVIDIA/DeepLearningExamples.git#subdirectory=Tools/lddl no longer works. Possible fixes: install from the new URL, i.e. pip install git+https://github.com/NVIDIA/lddl.git, or install lddl from a historical version of the repository, https://github.com/NVIDIA/DeepLearningExamples/tree/29f5b7ab059025e4ead512e54037eddbdf740f19.
  2. After installing lddl, running pip install boto3 causes a dependency version conflict, and its effect on the rest of the process is unknown.
  3. In the preprocessing step, both phase 1 and phase 2 fail. Details will follow.
  4. Installing lddl from the historical version instead of the new URL does not solve problem 3, and skipping the boto3 install does not help either.
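The two installation routes in item 1 can be sketched as shell functions. This is a sketch under assumptions: the `run`, `install_lddl_new` and `install_lddl_pinned` helpers and the `DRY_RUN` switch are hypothetical, and the pinned route relies on pip's `git+URL@ref#subdirectory=` form pointing at the historical commit linked above; whether that commit installs cleanly in this environment has not been verified here. Per item 4, neither route fixes problem 3.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Historical DeepLearningExamples commit linked in this issue, where lddl
# is assumed to still live under Tools/lddl.
LDDL_COMMIT=29f5b7ab059025e4ead512e54037eddbdf740f19

# Option 1 (hypothetical helper): install lddl from its new standalone repository.
install_lddl_new() {
  run pip install "git+https://github.com/NVIDIA/lddl.git"
}

# Option 2 (hypothetical helper): install lddl from Tools/lddl at the pinned commit.
install_lddl_pinned() {
  run pip install "git+https://github.com/NVIDIA/DeepLearningExamples.git@${LDDL_COMMIT}#subdirectory=Tools/lddl"
}
```

Calling `install_lddl_new` or `install_lddl_pinned` performs the install; setting `DRY_RUN=1` first only prints the pip command, which is handy for checking the URL before running it.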

Environment

python=3.8 pytorch=1.12.1 cudatoolkit=10.2.89 cuda=10.2

JizeZhangCS commented 1 year ago

More details about problem 2. Installing lddl first and then boto3 gives:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.22.55 requires botocore==1.24.0, but you have botocore 1.27.61 which is incompatible.
awscli 1.22.55 requires s3transfer<0.6.0,>=0.5.0, but you have s3transfer 0.6.0 which is incompatible.

Installing boto3 and then reinstalling lddl gives:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
boto3 1.24.61 requires botocore<1.28.0,>=1.27.61, but you have botocore 1.24.0 which is incompatible.
boto3 1.24.61 requires s3transfer<0.7.0,>=0.6.0, but you have s3transfer 0.6.0 which is incompatible.
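Read together, the two error messages show why the versions flip back and forth: awscli 1.22.55 pins botocore==1.24.0 and s3transfer<0.6.0, while boto3 1.24.61 needs botocore>=1.27.61 and s3transfer>=0.6.0, so no single version of either package satisfies both. A small check of that, as a sketch assuming GNU `sort -V` for version ordering (the `version_le`/`version_lt` helpers and variable names are hypothetical):

```shell
# Hypothetical helpers: compare dotted versions using GNU sort -V.
version_le() {  # succeeds if $1 <= $2
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
version_lt() {  # succeeds if $1 < $2
  [ "$1" != "$2" ] && version_le "$1" "$2"
}

# Requirement bounds copied from the pip errors in this issue.
AWSCLI_BOTOCORE_PIN="1.24.0"      # awscli 1.22.55: botocore==1.24.0
BOTO3_BOTOCORE_MIN="1.27.61"      # boto3 1.24.61: botocore>=1.27.61,<1.28.0
AWSCLI_S3TRANSFER_MAX="0.6.0"     # awscli 1.22.55: s3transfer<0.6.0 (exclusive)
BOTO3_S3TRANSFER_MIN="0.6.0"      # boto3 1.24.61: s3transfer>=0.6.0

# botocore: awscli's pinned version lies below boto3's minimum.
if version_lt "$AWSCLI_BOTOCORE_PIN" "$BOTO3_BOTOCORE_MIN"; then
  echo "botocore: no version satisfies both awscli and boto3"
fi

# s3transfer: awscli's range ends (exclusive) at or before boto3's range begins.
if version_le "$AWSCLI_S3TRANSFER_MAX" "$BOTO3_S3TRANSFER_MIN"; then
  echo "s3transfer: no version satisfies both awscli and boto3"
fi
```

On an actual environment, `pip check` reports the same kind of unsatisfied requirements for whatever is currently installed.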

JizeZhangCS commented 1 year ago

More details about problem 3: output.txt

FrankLeeeee commented 1 year ago

You can try pip install with the --no-dependencies flag to ignore the dependency conflict.
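The suggestion above can be sketched as follows. With `--no-dependencies`, pip installs only the named package and leaves the already-installed botocore and s3transfer untouched; it does not make the two requirement sets compatible, so running `pip check` afterwards is a cheap way to see what is still unsatisfied. Which package to install this way is an assumption (the comment does not say), and the `run` and `install_without_deps` helpers and the `DRY_RUN` switch are hypothetical.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: install the given package(s) without their dependencies,
# then list any requirements that remain unsatisfied.
install_without_deps() {
  run pip install --no-dependencies "$@"
  run pip check
}
```

For example, `install_without_deps boto3`. Note that `pip check` exits non-zero while any conflict remains, so the function does too.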