triton-inference-server / pytriton

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
https://triton-inference-server.github.io/pytriton/
Apache License 2.0

Update megatron example so that it would support latest changes in NeMo #33

Closed PeganovAnton closed 9 months ago

PeganovAnton commented 10 months ago

What does this PR do?

The modifications affect only the Megatron multinode example.

  1. Add the end_strings parameter.
  2. Increase the default min_length to 20 to avoid empty responses.
  3. Add an extra dimension to the tensor shape if an input is a list (specifically, for the end_strings case).
  4. Add support for unpacked .nemo checkpoints.
  5. Use NLPDDPStrategy instead of NLPDDPPlugin.
  6. Add support for passing TritonConfig into the NeMo Megatron multinode example.
  7. Add the --model-name parameter to the NeMo Megatron multinode example.
  8. Add the --workspace parameter to the NeMo Megatron multinode example.
  9. Add the --model-path parameter for loading local checkpoints instead of HuggingFace checkpoints.
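
For readers unfamiliar with the example's launcher, the three new command-line flags could be declared roughly as follows. This is a minimal sketch using only the standard library: the flag names come from the list above, but the `build_parser` helper, the defaults, the help strings, and the sample values are assumptions for illustration and are not taken from the PR's actual code.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only the three flag names come from the PR description."""
    parser = argparse.ArgumentParser(description="Sketch of a Megatron multinode server launcher")
    # Name under which the model would be exposed (default is an assumption).
    parser.add_argument("--model-name", default="GPT", help="Name of the served model")
    # Directory for server artifacts (None here means "let the server choose").
    parser.add_argument("--workspace", type=Path, default=None, help="Workspace directory")
    # Local checkpoint to load instead of downloading one.
    parser.add_argument("--model-path", type=Path, default=None, help="Path to a local checkpoint")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--model-name", "my-gpt", "--workspace", "/tmp/ws", "--model-path", "/models/ckpt.nemo"]
)
print(args.model_name, args.workspace, args.model_path)
```

argparse converts the dashes in flag names to underscores, so `--model-name` is read back as `args.model_name`.
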
github-actions[bot] commented 9 months ago

This PR is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 7 days.

pziecina-nv commented 9 months ago

Hi Anton, I'm closing this PR as your changes are now on the main branch. Thanks for taking a look at this example and for your contribution.