To make the review of the intelligence database easier, we could implement an "acceptability" rating system for non-classified assets. This would allow the tool to prioritise the assets that will be presented to the user for manual inspection and classification.
The acceptability rating would be based on how similar a non-classified asset is to assets that are already classified. For example, if classified registrant names tend to contain "Coca Cola", then non-classified registrant names that contain "Coca Cola" MIGHT belong to the customer, e.g. THE COCA-COLA COMPANY, COCA COLA NETWORK REDIRECT, COCA-COLA BOTTLING COMPANY OF MINDEN, etc.
Approaches may include clustering machine learning algorithms, or a simple bag-of-words approach.
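As a rough illustration of the bag-of-words option, here is a minimal sketch that scores each non-classified registrant name by how often its tokens occur in classified names, then sorts by that score. The function names (`tokenize`, `build_vocabulary`, `acceptability`), the scoring formula, and the sample non-classified names are assumptions made for illustration, not part of the existing tool.

```python
import re
from collections import Counter


def tokenize(name):
    """Lowercase a name and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", name.lower())


def build_vocabulary(classified_names):
    """Count how many classified names contain each token."""
    counts = Counter()
    for name in classified_names:
        counts.update(set(tokenize(name)))
    return counts


def acceptability(name, vocabulary, total_classified):
    """Hypothetical score in [0, 1]: the average, over the name's tokens,
    of the share of classified names that contain that token."""
    tokens = set(tokenize(name))
    if not tokens or total_classified == 0:
        return 0.0
    hits = sum(vocabulary[token] for token in tokens)
    return hits / (len(tokens) * total_classified)


# Classified names taken from the example above.
classified = [
    "THE COCA-COLA COMPANY",
    "COCA COLA NETWORK REDIRECT",
    "COCA-COLA BOTTLING COMPANY OF MINDEN",
]
# Made-up non-classified names, for illustration only.
unclassified = ["ACME HOSTING LTD", "COCA COLA ENTERPRISES"]

vocab = build_vocabulary(classified)
# Highest-scoring assets first, so they are reviewed first.
ranked = sorted(
    unclassified,
    key=lambda n: acceptability(n, vocab, len(classified)),
    reverse=True,
)
```

A score like this only orders the review queue; the user would still inspect and classify each asset manually. Common tokens such as "COMPANY" or "THE" would inflate scores in practice, so some weighting or a stop-word list would probably be needed.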
See https://www.analyticsvidhya.com/blog/2015/08/common-machine-learning-algorithms/ . Might be useful when you actually decide to do this.