spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License

Location-Aware Named Entity Disambiguation System #1028

Closed: MarketingPip closed this issue 1 year ago

MarketingPip commented 1 year ago

I don't know of a current solution, but I think compromise should implement a location-aware named entity disambiguation system.

I know we have switches such as Person|Place, but that is not ideal: a switch only marks the word as ambiguous instead of resolving it.

Example: "Kobe is a city in Japan. And we are lucky enough to have Kobe Bryant a famous basketball player visit." Here the first "Kobe" is a place, and the second is part of a person's name.

I suggest reading a paper here, though it doesn't offer a practical solution for compromise. It might spark some ideas.

Though it would be a very heavy function, we could possibly train some kind of model on phrases containing names that are also places, and then match new text against them by similarity. (Again, not ideal, unless we want to add LOTS of data to compromise.)
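A minimal sketch of the "match via similarity" idea, in plain JavaScript since compromise is a JavaScript library. This is not compromise's API: `PROFILES`, `disambiguate`, and the cue-word lists are hypothetical and hand-picked for this example rather than learned. Each mention of the ambiguous name gets a bag of nearby words (within the same sentence), which is compared by cosine similarity against each profile; when neither profile wins, it falls back to the Person|Place switch.

```javascript
// Hypothetical cue-word profiles, hand-picked for this example only.
// A real system would learn these from many labeled sentences.
const PROFILES = {
  Place: new Set(['city', 'town', 'country', 'capital', 'in', 'japan', 'located', 'region']),
  Person: new Set(['player', 'famous', 'basketball', 'born', 'he', 'she', 'said', 'actor']),
};

// Lowercase word tokens; only runs of a-z are kept, so punctuation and digits are dropped.
function tokenize(sentence) {
  return sentence.toLowerCase().match(/[a-z]+/g) || [];
}

// Cosine similarity between two binary bag-of-words vectors stored as Sets.
function cosine(a, b) {
  if (a.size === 0 || b.size === 0) return 0;
  let overlap = 0;
  for (const word of a) if (b.has(word)) overlap++;
  return overlap / Math.sqrt(a.size * b.size);
}

// Label each occurrence of `name` from the words within `radius` tokens of it,
// never looking across a sentence boundary.
function disambiguate(text, name, radius = 5) {
  const target = name.toLowerCase();
  const results = [];
  for (const sentence of text.split(/(?<=[.!?])\s+/)) {
    const tokens = tokenize(sentence);
    tokens.forEach((token, i) => {
      if (token !== target) return;
      const context = new Set([
        ...tokens.slice(Math.max(0, i - radius), i),
        ...tokens.slice(i + 1, i + 1 + radius),
      ]);
      const scores = Object.entries(PROFILES)
        .map(([label, cues]) => [label, cosine(context, cues)])
        .sort((x, y) => y[1] - x[1]);
      // No clear winner: keep the ambiguous switch instead of guessing.
      const label = scores[0][1] === scores[1][1] ? 'Person|Place' : scores[0][0];
      results.push({ sentence, index: i, label });
    });
  }
  return results;
}

const text =
  'Kobe is a city in Japan. And we are lucky enough to have Kobe Bryant a famous basketball player visit.';
console.log(disambiguate(text, 'Kobe').map((r) => r.label)); // [ 'Place', 'Person' ]
```

The profiles are the weak point: with only a few cue words, most real sentences would fall through to the Person|Place fallback, which is where the "LOTS of data" trade-off comes in.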

Again, I have no current idea how to solve this on my end, but maybe someone else has a good idea!

MarketingPip commented 1 year ago

Not that this is ideal for the size of compromise, but we should be building a neural network to detect things like this.

As much as I love compromise, right now it takes a one-shot guess at whether a word is a place or a person, and often ends up tagging it as both.

It should be tagged as ONE entity, rather than by literally checking a huge list of data, which is not really NLP at its finest. That is just brute force, and it returns multiple entities/tags, which is far from ideal when dealing with NER tasks.

MarketingPip commented 1 year ago

@spencermountain, I was reading a paper where they used a Naive Bayes classifier trained on hand-written, predetermined part-of-speech labels, written like "word #TAG": each tag was placed beside the word it followed, and the classifier compared these pairs.

This technique could possibly help improve the part-of-speech tagger, by breaking text down into clauses and running each clause through the classifier.
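A rough sketch of the Naive Bayes idea applied to the Person|Place case, assuming training text written in the "word #TAG" style described above. Everything here is hypothetical: the tiny corpus, the feature set (the word plus its previous and next word), and the `NaiveBayesTagger` class; it is not what the paper or compromise does. Each candidate tag is scored as log P(tag) plus the summed log P(feature | tag) with add-one smoothing, and `predict` can be restricted to the tags of a switch so it picks exactly one of them.

```javascript
// Parse hypothetical training lines written as "word #Tag word #Tag ...".
function parseLabeled(line) {
  const pairs = [];
  const re = /(\S+)\s+#(\w+)/g;
  let m;
  while ((m = re.exec(line)) !== null) pairs.push([m[1].toLowerCase(), m[2]]);
  return pairs;
}

// Features for the word at position i: the word itself and its immediate neighbours.
function features(words, i) {
  return [
    `w=${words[i]}`,
    `prev=${i > 0 ? words[i - 1] : '<s>'}`,
    `next=${i < words.length - 1 ? words[i + 1] : '</s>'}`,
  ];
}

class NaiveBayesTagger {
  constructor() {
    this.tagCounts = new Map(); // tag -> number of training tokens with that tag
    this.featCounts = new Map(); // tag -> Map(feature -> count)
    this.vocab = new Set(); // every feature seen in training
    this.total = 0; // number of training tokens
  }

  train(lines) {
    for (const line of lines) {
      const pairs = parseLabeled(line);
      const words = pairs.map(([word]) => word);
      pairs.forEach(([, tag], i) => {
        this.total++;
        this.tagCounts.set(tag, (this.tagCounts.get(tag) || 0) + 1);
        if (!this.featCounts.has(tag)) this.featCounts.set(tag, new Map());
        const counts = this.featCounts.get(tag);
        for (const f of features(words, i)) {
          counts.set(f, (counts.get(f) || 0) + 1);
          this.vocab.add(f);
        }
      });
    }
    return this;
  }

  // Pick the candidate tag maximising log P(tag) + sum of log P(feature | tag),
  // using add-one (Laplace) smoothing for unseen features.
  predict(words, i, candidates = [...this.tagCounts.keys()]) {
    let best = null;
    let bestScore = -Infinity;
    for (const tag of candidates) {
      const counts = this.featCounts.get(tag);
      if (!counts) continue; // tag never seen in training
      let featTotal = 0;
      for (const c of counts.values()) featTotal += c;
      let score = Math.log(this.tagCounts.get(tag) / this.total);
      for (const f of features(words, i)) {
        score += Math.log(((counts.get(f) || 0) + 1) / (featTotal + this.vocab.size));
      }
      if (score > bestScore) {
        bestScore = score;
        best = tag;
      }
    }
    return best;
  }
}

// Tiny hand-labeled corpus (hypothetical, far too small for real use).
const tagger = new NaiveBayesTagger().train([
  'Kobe #Place is #Verb a #Determiner city #Noun',
  'we #Pronoun flew #Verb to #Preposition Kobe #Place',
  'Kobe #Person Bryant #Person scored #Verb again #Adverb',
  'Paris #Person Hilton #Person smiled #Verb',
]);

// Resolve a Person|Place switch to a single tag from local context.
const words = 'kobe is a big city'.split(' ');
console.log(tagger.predict(words, 0, ['Person', 'Place'])); // Place
```

Restricting `candidates` keeps the classifier as a tie-breaker on top of the existing lexicon rather than a replacement for it, which fits compromise's size constraints better than tagging every word this way.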

MarketingPip commented 1 year ago

Closing this, as Spencer doesn't seem interested in this idea.