torvalds-dev / llama_index

LlamaIndex (formerly GPT Index) is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Feature Request]: Remove PII masking feature #5

Open torvalds-dev opened 8 months ago

torvalds-dev commented 8 months ago

Feature Description

Remove PII masking feature

Reason

No response

Value of Feature

No response

torvalds-dev commented 8 months ago

torvalds.dev is analyzing the ticket

torvalds-dev commented 8 months ago

The issue is a feature request to remove the PII (Personally Identifiable Information) masking feature.

The PII masking feature is implemented in the llama_index/indices/postprocessor/pii.py file. This file contains two classes, PIINodePostprocessor and NERPIINodePostprocessor, which are responsible for masking PII in the text.

The PIINodePostprocessor class uses a local model to mask PII in the text. It replaces each piece of PII with a tag and returns the masked text along with a JSON mapping between the original values and their tags.

The NERPIINodePostprocessor class uses a Hugging Face transformers model to mask PII in the text. Like PIINodePostprocessor, it replaces each piece of PII with a tag and returns the masked text along with a JSON mapping between the original values and their tags.
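To make the tag-and-mapping behaviour described above concrete, here is a minimal sketch of the general idea. It is not the code in pii.py: the `mask_pii` function, the `EMAIL_RE` regex detector, and the `[EMAIL_n]` tag format are all hypothetical stand-ins, since the real classes rely on a model rather than a regex. The mapping is keyed by tag here, which is also an assumption of this sketch.

```python
import json
import re
from typing import Dict, Tuple

# Hypothetical stand-in detector: a simple e-mail regex.
# The real postprocessors use a model (local LLM or NER), not a regex.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_pii(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each detected e-mail with a numbered tag.

    Returns the masked text and a mapping of tag -> original value.
    """
    mapping: Dict[str, str] = {}

    def _replace(match: "re.Match[str]") -> str:
        # Tags are numbered in order of appearance.
        tag = f"[EMAIL_{len(mapping) + 1}]"
        mapping[tag] = match.group(0)
        return tag

    masked = EMAIL_RE.sub(_replace, text)
    return masked, mapping


masked_text, pii_mapping = mask_pii("Contact alice@example.com or bob@example.org.")
print(masked_text)
print(json.dumps(pii_mapping, indent=2))
```

Anything downstream that reads such a mapping from node metadata would also need attention if the feature is removed.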

The llama_index/query_engine/flare/__init__.py, llama_index/finetuning/embeddings/__init__.py, and llama_index/embeddings/huggingface.py files may also be relevant, although this is not confirmed: they are part of the same project and should be checked for any use of the PII masking feature.

To address this issue, you could:

  1. Remove the PIINodePostprocessor and NERPIINodePostprocessor classes from the llama_index/indices/postprocessor/pii.py file.
  2. Search for any usage of these classes in the project and remove or replace them as necessary.
  3. Test the project to ensure that it still works as expected without the PII masking feature.
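Step 2 above can be partly automated. The following is a rough sketch, assuming it is run against a local checkout of the repository; the `find_usages` helper and the list of file suffixes are hypothetical choices for illustration, not part of the project.

```python
import re
from pathlib import Path
from typing import List, Sequence, Tuple

# The two class names taken from the analysis above.
CLASS_NAMES = ("PIINodePostprocessor", "NERPIINodePostprocessor")
PATTERN = re.compile(r"\b(?:" + "|".join(CLASS_NAMES) + r")\b")


def find_usages(
    root: Path,
    suffixes: Sequence[str] = (".py", ".md", ".rst", ".ipynb"),
) -> List[Tuple[Path, int, str]]:
    """Return (path, line number, stripped line) for every line that
    mentions one of the PII postprocessor classes under `root`."""
    hits: List[Tuple[Path, int, str]] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            # Skip unreadable or non-UTF-8 files rather than failing.
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((path, lineno, line.strip()))
    return hits
```

Calling `find_usages(Path("."))` from the repository root and printing each hit gives a checklist of imports, docs, and notebooks to review. The definitions in pii.py itself will appear in the results as well.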