huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

tests/test_examples.py failures with neuron 2.15 #366

Open dtsengAmazon opened 9 months ago

dtsengAmazon commented 9 months ago

I ran `RUN_SLOW=true COVERAGE=high RUN_TINY=true USE_VENV=false pytest tests/test_examples.py -v` on neuron 2.15 and ON 2.15 and ran into 2 problems.

  1. The llama and mistral tests both hang. I left each of them running for >2 hours and they stayed stuck. The rest of the test suite took <2 hours in total, so this is not just a slow test. Previously, in https://github.com/huggingface/optimum-neuron/issues/272, I had the llama model working without tp, but before that it was stuck in a similar way, so this is most likely some kind of regression.
  2. Many tp tests are still failing, as described in the ON issue linked above. Specifically, the results look like this:

     FAILED tests/test_examples.py::TextClassificationExampleTester::test_run_glue_bert_with_tp - assert 1 == 0
     FAILED tests/test_examples.py::TokenClassificationExampleTester::test_run_ner_bert_with_tp - assert 1 == 0
     FAILED tests/test_examples.py::MultipleChoiceExampleTester::test_run_swag_bert_with_tp - assert 1 == 0
     FAILED tests/test_examples.py::QuestionAnsweringExampleTester::test_run_qa_bert_with_tp - assert 1 == 0
     FAILED tests/test_examples.py::SummarizationExampleTester::test_run_summarization_t5_with_tp - assert 1 == 0
     FAILED tests/test_examples.py::TranslationExampleTester::test_run_translation_t5_with_tp - assert 1 == 0
     FAILED tests/test_examples.py::ImageClassificationExampleTester::test_run_image_classification_vit - assert 1 == 0

     The only TP test that passed was the first one (neo gpt). The result is 7 failed, 8 passed, excluding the 4 tests commented out (llama and mistral, each with and without tp).

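For anyone reproducing the hang in item 1, here is a minimal sketch of a wrapper that gives up on a test command after a fixed time, so a stuck run is reported instead of blocking the whole session. The helper name `run_with_timeout`, the `-k llama` selector, and the 2-hour limit are assumptions for illustration and are not part of the test suite. It relies only on the standard `subprocess.run(..., timeout=...)` behaviour.

```python
import os
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, extra_env=None):
    """Run cmd and return (status, returncode).

    status is "finished" if the process exited on its own, or "timeout"
    if it was still running after timeout_s seconds (returncode is None then).
    """
    # Start from the current environment and layer the extra variables on top.
    env = {**os.environ, **(extra_env or {})}
    try:
        proc = subprocess.run(cmd, timeout=timeout_s, env=env)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising; processes
        # spawned by that child may need separate cleanup.
        return ("timeout", None)
    return ("finished", proc.returncode)


# Hypothetical usage (the -k expression is an assumed way to select the
# llama tests; the environment variables are the ones from the command above):
#
# status, rc = run_with_timeout(
#     [sys.executable, "-m", "pytest", "tests/test_examples.py", "-v", "-k", "llama"],
#     timeout_s=2 * 60 * 60,
#     extra_env={"RUN_SLOW": "true", "COVERAGE": "high",
#                "RUN_TINY": "true", "USE_VENV": "false"},
# )
# print(status, rc)
```

A "timeout" status would correspond to the hang described above, while "finished" with a non-zero return code would correspond to an ordinary test failure such as the `assert 1 == 0` cases.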
dacorvo commented 8 months ago

@michaelbenayoun can you take a look?

HuggingFaceDocBuilderDev commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Thank you!
