Phileas is a Java library to deidentify text and redact PII, PHI, and other sensitive information from text. Given text or documents (PDF), Phileas analyzes the text searching for sensitive information such as persons' names, ages, addresses, and many other types of information. Phileas is highly configurable through its settings and policies.
When sensitive information is identified, Phileas can manipulate it in a variety of ways. The information can be replaced, encrypted, anonymized, and more. The user chooses how to manipulate each type of sensitive information. We refer to all of these methods collectively as "redaction."
Information can be redacted based on the content of the information and other attributes. For example, you can redact only certain persons' names, only zip codes meeting some qualification, or only IP addresses that match a given pattern.
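To make the idea of condition-based redaction concrete, here is a minimal sketch in plain Java. It does not use the Phileas API. The class name ConditionalZipRedactor, the simplified five-digit pattern, and the prefix condition are all assumptions made for illustration only.

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: not part of Phileas. Redacts a 5-digit zip code
// only when it satisfies a caller-supplied condition.
final class ConditionalZipRedactor {

    // Simplified pattern (an assumption): five digits on word boundaries.
    private static final Pattern ZIP = Pattern.compile("\\b\\d{5}\\b");

    static String redact(String text, Predicate<String> condition, String replacement) {
        Matcher matcher = ZIP.matcher(text);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String zip = matcher.group();
            // Keep the original value when the condition is not met.
            String substitute = condition.test(zip) ? replacement : zip;
            matcher.appendReplacement(out, Matcher.quoteReplacement(substitute));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Ship to 90210 and 10001.";
        // Redact only zip codes that start with "9".
        System.out.println(redact(text, zip -> zip.startsWith("9"), "{{{REDACTED}}}"));
        // prints: Ship to {{{REDACTED}}} and 10001.
    }
}
```

In Phileas itself, conditions like this are expressed in the policy rather than in code; the sketch only shows the general shape of the decision.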
AI models for identifying PII and PHI in text are available at https://github.com/philterd/pii-models. These models can be used by both Phileas and Philter.
This list might be outdated. Please check the individual filter classes for details.
After cloning, run git lfs pull to download models needed for unit tests. Phileas can then be built with mvn clean install.
Phileas snapshots and releases are available in our Maven repositories. Add the following to your Maven configuration:
<repositories>
  <repository>
    <id>philterd-repository-releases</id>
    <url>https://artifacts.philterd.ai/releases</url>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
  <repository>
    <id>philterd-repository-snapshots</id>
    <url>https://artifacts.philterd.ai/snapshots</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>
Next, add the Phileas dependency to your project:
<dependency>
  <groupId>ai.philterd</groupId>
  <artifactId>phileas-core</artifactId>
  <version>2.7.1</version>
</dependency>
Create a FilterService using a PhileasConfiguration, and call filter() on the service:
// Build the configuration from (here, empty) properties and create the service.
Properties properties = new Properties();
PhileasConfiguration phileasConfiguration = new PhileasConfiguration(properties);
FilterService filterService = new PhileasFilterService(phileasConfiguration);
// Filter the plain-text body using the given policies.
FilterResponse response = filterService.filter(policies, context, documentId, body, MimeType.TEXT_PLAIN);
The policies argument is a list of Policy objects. (See below for more about policies.) The context and documentId are arbitrary values you can use to uniquely identify the text being filtered. The body is the text you are filtering. Lastly, MimeType.TEXT_PLAIN specifies that the data is plain text.
The response contains information about the identified sensitive information along with the filtered text.
The PhileasFilterServiceTest and EndToEndTests test classes have examples of how to configure Phileas and filter text.
To redact a PDF document, create a FilterService using a PhileasConfiguration, and call filter() on the service:
PhileasConfiguration phileasConfiguration = ConfigFactory.create(PhileasConfiguration.class);
FilterService filterService = new PhileasFilterService(phileasConfiguration);
BinaryDocumentFilterResponse response = filterService.filter(policies, context, documentId, body, MimeType.APPLICATION_PDF, MimeType.IMAGE_JPEG);
The policies argument is a list of Policy objects, which are created by deserializing a policy from JSON. (See below for more about policies.) The context and documentId are arbitrary values you can use to uniquely identify the document being filtered. The body is the content of the PDF document you are filtering. Lastly, the two MimeType arguments specify that the input is a PDF document and that the redacted output is produced as JPEG images.
The response contains a zip file of the images generated by redacting the PDF document.
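Once you have the zip file as a byte array, the standard java.util.zip classes can read it. The following is a minimal sketch under stated assumptions: how the bytes are obtained from the response is not shown here, the class name ZipEntryLister is hypothetical, and the entry names page-1.jpg and page-2.jpg are made up so that the example runs without Phileas.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustrative only: not part of Phileas.
final class ZipEntryLister {

    // Returns the names of the entries in a zip archive held in memory.
    static List<String> listEntries(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Builds a small stand-in archive (hypothetical entry names and contents).
    static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"page-1.jpg", "page-2.jpg"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(new byte[] {1, 2, 3}); // placeholder bytes, not a real image
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(listEntries(sampleZip()));
        // prints: [page-1.jpg, page-2.jpg]
    }
}
```

To save the images instead of listing them, read each entry's bytes from the ZipInputStream inside the loop and write them to a file.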
A policy is an instance of the Policy class that tells Phileas the types of sensitive information to identify and what to do with the sensitive information when found. A policy describes the entire filtering process, from which filters to apply to which terms to ignore, and everything in between. Phileas can apply one or more policies when filter() is called. The policies are applied in the order in which they were added to the list.
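Because policies are applied in list order, the order can change the result. The sketch below illustrates why, using plain Java rather than the Phileas API. Modeling each policy as a simple text transformation is an assumption made for illustration, and the class name OrderedPolicySketch and both regular expressions are hypothetical.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative only: not part of Phileas. Each "policy" here is
// modeled as a function from text to text.
final class OrderedPolicySketch {

    // Applies each transformation to the output of the previous one.
    static String applyInOrder(List<UnaryOperator<String>> policies, String text) {
        String result = text;
        for (UnaryOperator<String> policy : policies) {
            result = policy.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        UnaryOperator<String> ssn = t -> t.replaceAll("\\b\\d{3}-\\d{2}-\\d{4}\\b", "[SSN]");
        UnaryOperator<String> number = t -> t.replaceAll("\\b\\d+\\b", "[NUMBER]");
        String text = "SSN 123-45-6789, age 42";

        System.out.println(applyInOrder(List.of(ssn, number), text));
        // prints: SSN [SSN], age [NUMBER]

        System.out.println(applyInOrder(List.of(number, ssn), text));
        // prints: SSN [NUMBER]-[NUMBER]-[NUMBER], age [NUMBER]
    }
}
```

In the second ordering, the broader rule consumes the digits before the more specific rule can match, so it is generally safer to place more specific policies first.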
For examples of creating a policy, look at EndToEndTestsHelper. The PhileasFilterServiceTest and EndToEndTests test classes have examples of how to configure Phileas and filter text.
Policies can be de/serialized to JSON. Here is a basic (but valid) policy that identifies and redacts ages:
{
  "name": "default",
  "ignored": [],
  "identifiers": {
    "age": {
      "ageFilterStrategies": [{
        "strategy": "REDACT",
        "redactionFormat": "{{{REDACTED-%t}}}"
      }]
    }
  }
}
There is a long list of identifiers that can be applied, and each identifier has several possible strategy values. In this case, when an age is found, it is redacted by being replaced with the text {{{REDACTED-age}}}. The %t is a placeholder for the type of filter. In this case, it is the literal text age.
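The placeholder substitution described above can be pictured with a few lines of plain Java. This is only a sketch of the idea, not Phileas's implementation; the class name RedactionFormatSketch and the render method are hypothetical.

```java
// Illustrative only: not part of Phileas.
final class RedactionFormatSketch {

    // Replaces every %t in the format with the filter type.
    static String render(String redactionFormat, String filterType) {
        return redactionFormat.replace("%t", filterType);
    }

    public static void main(String[] args) {
        System.out.println(render("{{{REDACTED-%t}}}", "age"));
        // prints: {{{REDACTED-age}}}
    }
}
```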
Phileas is the core of Philter, a turnkey text redaction engine that is built on top of Phileas and provides an API for redacting text. Philter runs entirely within your cloud and never transmits data outside of your cloud. Custom AI models are available for domains like healthcare, legal, and news. Philter is also open source.
Phileas also powers Airlock, an AI policy layer to prevent the disclosure of sensitive information, such as PII and PHI, in your AI applications.
As of Phileas 2.2.1, Phileas is licensed under the Apache License, version 2.0. Previous versions were under a proprietary license.
Copyright 2024 Philterd, LLC. Copyright 2018-2023 Mountain Fog, Inc.