Not returning anything for out-of-vocabulary text during batch inference with a TF-IDF ONNX vectorizer model

microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

MIT License


Describe the issue

A scikit-learn TF-IDF vectorizer model was converted to ONNX. During batch prediction it should return a zero vector for each out-of-vocabulary text. Instead, it returns nothing for those texts and drops them from the output altogether.

Example: suppose the batch contains 10 documents, and the documents at indices 3, 5, and 8 are out-of-vocabulary texts. The TF-IDF ONNX vectorizer model returns only 7 vectors, dropping the out-of-vocabulary texts from the output entirely.
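For reference, the behaviour expected above (exactly one output row per input document, in input order, with an all-zero row for a document that contains no known tokens) can be illustrated with a minimal pure-Python sketch. This is not the ONNX Runtime or scikit-learn implementation: the `tfidf_transform` helper, the vocabulary, and the idf weights are hypothetical. The tokenizer only approximates scikit-learn's default `token_pattern` with lowercasing, and the sketch assumes raw term counts and L2 normalisation.

```python
import math
import re
from typing import Dict, List

# Approximation of scikit-learn's default token_pattern (2+ word characters).
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tfidf_transform(docs: List[str], vocab: Dict[str, int], idf: List[float]) -> List[List[float]]:
    """Return one L2-normalised TF-IDF row per input document (hypothetical helper).

    A document with no in-vocabulary tokens yields an all-zero row, so the
    result always has len(docs) rows in the same order as the input.
    """
    rows: List[List[float]] = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok in TOKEN_RE.findall(doc.lower()):
            idx = vocab.get(tok)
            if idx is not None:  # out-of-vocabulary tokens are ignored
                row[idx] += 1.0
        # Weight raw term counts by their idf values.
        row = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in row))
        if norm > 0:  # skip normalisation for all-zero (OOV-only) rows
            row = [v / norm for v in row]
        rows.append(row)  # always append, even when the row is all zeros
    return rows


if __name__ == "__main__":
    vocab = {"onnx": 0, "runtime": 1, "model": 2}  # hypothetical vocabulary
    idf = [1.0, 1.5, 2.0]                          # hypothetical idf weights
    docs = ["ONNX runtime", "completely unknown words", "model model"]
    out = tfidf_transform(docs, vocab, idf)
    print(len(out))  # 3: one row per document
    print(out[1])    # [0.0, 0.0, 0.0] for the out-of-vocabulary document
```

The same row-count invariant (`len(output) == batch size`) is a simple check to apply to the ONNX model's output when reproducing the problem.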

To reproduce

Urgency

No response

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

None

Execution Provider

Other / Unknown
