[FR] Entities Distinction for Token Classification Tasks

ghyadav commented 10 months ago

Willingness to contribute

Yes. I can contribute this feature independently.

Proposal Summary

Hi,

In the current flow for "token-classification" tasks, the output returned is hard to interpret.

For example, for an NER task (as opposed to POS tagging) and a given input sentence, the predict method returns only a list of named-entity labels.

This makes it difficult to map the predicted entities to the original words/tokens in the input string. For the input string "What is your name", the predict method returns ["I-Misc"] as output. There is no way to directly map this output to any token/word in the input string.

Can we please modify the results so that each predicted entity can be traced back to the token or word it belongs to?
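To make the request concrete, here is a minimal sketch of the kind of mapping being asked for. It is not MLflow's API: `label_words` is a hypothetical helper, and it assumes each prediction carries character offsets in `start` and `end` fields plus an `entity` label (the raw Hugging Face token-classification pipeline output can include fields with these names). The sample prediction is made up for illustration, including the choice of which word receives the label.

```python
import re


def label_words(text, token_predictions, outside_label="O"):
    """Pair every whitespace-delimited word in `text` with an entity label.

    Hypothetical helper: `token_predictions` is assumed to be a list of dicts
    with character offsets ("start", "end") and a label ("entity").
    Words that no prediction overlaps receive `outside_label`.
    """
    labeled = []
    for match in re.finditer(r"\S+", text):
        label = outside_label
        for pred in token_predictions:
            # A prediction belongs to this word if their character spans overlap.
            if pred["start"] < match.end() and pred["end"] > match.start():
                label = pred["entity"]
                break
        labeled.append(
            {
                "word": match.group(),
                "start": match.start(),
                "end": match.end(),
                "entity": label,
            }
        )
    return labeled


# Made-up prediction for illustration only; not real model output.
text = "What is your name"
predictions = [{"entity": "I-Misc", "start": 13, "end": 17}]

result = label_words(text, predictions)
for item in result:
    print(item["word"], item["entity"])
```

With an output shaped like this, every label is tied to a word and its position in the input, which is the information the bare list of labels lacks.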

Thanks, Ghanshyam

Motivation

What is the use case for this feature?

It will help map the predicted entities back to the original words.

Why is this use case valuable to support for MLflow users in general?

Without this support, the predicted result might not be useful.

Why is this use case valuable to support for your project(s) or organization?

Without this support, the predicted result might not be useful.

Why is it currently difficult to achieve this use case?

There is no way to map the predicted entities back to the original words.

Details

No response

What component(s) does this bug affect?

[ ] area/artifacts: Artifact stores and artifact logging
[ ] area/build: Build and test infrastructure for MLflow
[ ] area/docs: MLflow documentation pages
[ ] area/examples: Example code
[ ] area/gateway: AI Gateway service, Gateway client APIs, third-party Gateway integrations
[ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
[X] area/models: MLmodel format, model serialization/deserialization, flavors
[ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
[ ] area/projects: MLproject format, project running backends
[X] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
[ ] area/server-infra: MLflow Tracking server backend
[ ] area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

[ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
[ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
[ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
[ ] area/windows: Windows support

What language(s) does this bug affect?

[ ] language/r: R APIs and clients
[ ] language/java: Java APIs and clients
[ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

[ ] integrations/azure: Azure and Azure ML integrations
[ ] integrations/sagemaker: SageMaker integrations
[ ] integrations/databricks: Databricks integrations

github-actions[bot] commented 10 months ago

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.

Drij77 commented 5 months ago

I ran into a similar bug. Is there any update on this?
