When I try to embed with input_length>512, I get the following error in the NREM docker logs:
level": "ERROR", "file_path": "/usr/local/lib/python3.10/dist-packages/nemollm_embedding/service/service.py", "line_number": 225, "time": "2024-04-11 22:38:03,012012", "message": "Got error from Triton: in ensemble 'NV-Embed-QA_ensemble', Failed to process the request(s) for model instance 'NV-Embed-QA_tokenizer_0_3', message: RequestValidationError: Input length 548 exceeds maximum allowed token size 512
Rather than raising an error, it might be a better approach to truncate over-long input chunks and emit a warning to the user.
Thank you!
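For illustration, a minimal sketch of the truncate-and-warn behavior suggested above. The function name, the stand-in whitespace tokenizer, and the example text are all hypothetical; the real service would apply this after running the NV-Embed-QA tokenizer, which splits text differently:

```python
import warnings

MAX_TOKENS = 512  # the model's maximum allowed input length


def truncate_with_warning(tokens, max_tokens=MAX_TOKENS):
    """Truncate a token sequence to max_tokens, warning instead of raising."""
    if len(tokens) > max_tokens:
        warnings.warn(
            f"Input length {len(tokens)} exceeds maximum allowed token size "
            f"{max_tokens}; truncating."
        )
        return tokens[:max_tokens]
    return tokens


# Example with a stand-in whitespace tokenizer (hypothetical):
tokens = ("word " * 548).split()
safe_tokens = truncate_with_warning(tokens)  # warns, returns 512 tokens
```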