openphacts / GLOBAL

Global project issues [private for now. owner lee harland]

Allow fuzzy matching in classification parameter #349

Open madgpap opened 8 years ago

madgpap commented 8 years ago

Calls like patents/byTarget allow filtering by patent classification code, which is handy.

It would be much more useful if the parameter were matched in a fuzzy manner, i.e. as a prefix: when I specify "C07D" as my input parameter, the system could do a classification LIKE "C07D%" match and retrieve patents with classification codes such as "C07D 239/36".
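To make the requested behaviour concrete, here is a minimal Python sketch of prefix matching on classification codes. It is only an illustration: the patent records, their ids, and the function names are hypothetical, and it says nothing about how the API would actually implement the filter.

```python
def matches_prefix(code: str, prefix: str) -> bool:
    """Return True if a classification code starts with the given prefix."""
    # Normalise case and surrounding whitespace before comparing.
    return code.strip().upper().startswith(prefix.strip().upper())


def filter_by_classification(patents, prefix):
    """Keep patents that have at least one classification code under the prefix."""
    return [
        patent
        for patent in patents
        if any(matches_prefix(code, prefix) for code in patent["classifications"])
    ]


# Hypothetical records, for illustration only.
patents = [
    {"id": "P1", "classifications": ["C07D 239/36"]},
    {"id": "P2", "classifications": ["A61K 31/00"]},
    {"id": "P3", "classifications": ["C07D 401/12", "A61P 35/00"]},
]

hits = filter_by_classification(patents, "C07D")
hit_ids = [patent["id"] for patent in hits]
```

With this sample data, `hit_ids` contains the two patents that carry a code beginning with "C07D".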

antonisloizou commented 8 years ago

We don't currently have a mechanism to do substring searches directly on the RDF store. In addition, such operations are very expensive in SPARQL.

Typically, the way we would handle this particular use case is to find or create a patent classification code ontology. For example, the ontology would state that "C07D 239/00" has the label "Heterocyclic compounds containing 1,3-diazine or hydrogenated 1,3-diazine rings", and that "C07D 239/36" is one of its subclasses. We could then have a "Patent class members" call that retrieves patents classified with a user-specified code or any of its 'children'.
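The idea of a "class members" lookup can be sketched in plain Python, with dictionaries standing in for the ontology and the RDF store. Everything here is an assumption made for illustration: the patent ids are invented, the function names are hypothetical, and "C07D 239/36" is treated as a direct child of "C07D 239/00" for simplicity.

```python
from collections import defaultdict

# Child -> parent links standing in for the ontology's subclass relations.
# Treating "C07D 239/36" as a direct child is a simplification.
PARENT_OF = {
    "C07D 239/00": "C07D",
    "C07D 239/36": "C07D 239/00",
}

# Hypothetical patent annotations: classification code -> patent ids.
PATENTS_BY_CODE = {
    "C07D 239/00": ["P1"],
    "C07D 239/36": ["P2", "P3"],
    "A61K 31/00": ["P4"],
}


def build_children(parent_of):
    """Invert child -> parent links into parent -> [children]."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    return children


def with_descendants(code, children):
    """Return the code plus every code below it in the hierarchy."""
    seen = {code}
    stack = [code]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def class_members(code, children, patents_by_code):
    """Patents classified with the code or any of its children, sorted by id."""
    members = set()
    for current in with_descendants(code, children):
        members.update(patents_by_code.get(current, []))
    return sorted(members)


CHILDREN = build_children(PARENT_OF)
members = class_members("C07D 239/00", CHILDREN, PATENTS_BY_CODE)
```

Because the lookup walks explicit parent/child links rather than comparing strings, it does not rely on substring matching against the stored codes.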

Looking at http://www.cooperativepatentclassification.org/cpc/scheme/C/scheme-C07D.pdf, there are multiple levels to this hierarchy, but the structure is not 100% clear to me.

Given the work required to achieve this, it will need to be postponed to a later release.

madgpap commented 8 years ago

So I've come up with a file that describes the pruned hierarchical entries (just the first 3 levels) of the IPC classification system. The hierarchy is letter (A to H) --> 2-digit number (01 to 99) --> letter. E.g., for B01B:

B   PERFORMING OPERATIONS-TRANSPORTING
B01 PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
B01B    BOILING; BOILING APPARATUS
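As a rough sketch of how such entries could be turned into parent/child links, the Python below parses lines shaped like the three above and derives each code's parent from its length. The file layout (code, whitespace, label) is assumed from this excerpt only, and the function names are hypothetical.

```python
import re

# Sample lines copied from the excerpt above; the real file layout is assumed
# to follow the same "code <whitespace> label" shape.
LINES = """\
B   PERFORMING OPERATIONS-TRANSPORTING
B01 PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
B01B    BOILING; BOILING APPARATUS
""".splitlines()

# Letter A-H, optionally followed by two digits, optionally followed by a letter.
CODE_RE = re.compile(r"^[A-H](\d{2}([A-Z])?)?$")


def parent_of(code):
    """Return the code one level up, or None for a top-level letter."""
    if not CODE_RE.match(code):
        raise ValueError(f"not a 3-level IPC code: {code!r}")
    if len(code) == 4:   # e.g. "B01B" -> "B01"
        return code[:3]
    if len(code) == 3:   # e.g. "B01" -> "B"
        return code[:1]
    return None          # e.g. "B"


def parse_entries(lines):
    """Map each code to its label and parent code."""
    entries = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        code, label = line.split(None, 1)
        entries[code] = {"label": label.strip(), "parent": parent_of(code)}
    return entries


entries = parse_entries(LINES)
```

Each resulting entry carries a label and a parent, which is roughly the information a subclass-style ontology would need for these three levels.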

@antonisloizou, can we do anything with this in the semantic world? We could convert it to a proper ontology and map it to our patent documents, for example.

File attached: IPC_Antonis.txt

antonisloizou commented 8 years ago

Yes, we should make an ontology out of it, and then we can have the "members" methods.

madgpap commented 8 years ago

@antonisloizou any estimate on the timeframe for that?