spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License

Parsing of middle-names #328

Open allanyuill opened 7 years ago

allanyuill commented 7 years ago

Version 7.0.28

I am getting the following strange behaviour when parsing a three-part name. I noticed a comment that you were looking at the name parsing and thought this might be a good example. name.term().length is also returning 1. Very strange!

const nlp = require('compromise');

// Trailing comments show the output actually observed on 7.0.28
const txt = "Hi, my name is Marion Bruce Wallace and I have a complaint about the plumbing at 21, Dean Street, Kilmarnock";
const name = nlp(txt).people(0);
console.log('Term 0: ' + name.term(0).out()); // Marion
console.log('Term 1: ' + name.term(1).out()); // Bruce
console.log('Term 2: ' + name.term(2).out()); // Wallace
console.log('First Name: ' + name.list[0].firstName.out()); // Marion Wallace
console.log('Middle Name: ' + name.list[0].middleName.out()); // Empty
console.log('Last Name: ' + name.list[0].lastName.out()); // Bruce
console.log('Term() Length: ' + name.term().length); // 1

spencermountain commented 7 years ago

hey @allanyuill, yeah, thanks. This is weird and needs a bit of care. The first-name rules I used for v6 were America/Euro-centric, and I tried stepping back, but maybe went too far. You can see how simple it is right now.

If it helps, it's easier to see the parsed name with nlp(txt).people(0).data()

Will keep this open till it gets fleshed out.

Cheers

fmacpro commented 6 years ago

Hi guys! Wanted to jump in on this discussion a bit. I've noticed issues here as well:

  1. Three-part names (first, middle, last) and names with a hyphenated part, e.g. Jacob Rees-Mogg, are not parsed correctly.
  2. Case is lost in some Scottish names where it is important, for example MacDonald, Macdonald and McDonald. These are all considered valid variants of the same clan name, but the casing matters historically because it gives meaning to the family tree and its evolution over time.

A further, more general thought: it might be useful to let anyone using this module pass custom lists of names, places, etc. into the various dictionaries, so they can train the module without having to fork it, if that makes sense.
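
Roughly what I have in mind, as a sketch only: defaultLexicon, buildLexicon and tagWord are made-up names for illustration, not anything compromise exposes today. The idea is that user-supplied words get merged over the built-in list, and the original casing is kept in the output:

```javascript
// Hypothetical sketch, not compromise's internals.
// A tiny built-in word list, keyed by lowercased word.
const defaultLexicon = new Map([
  ['jacob', 'FirstName'],
  ['marion', 'FirstName'],
]);

// Merge user-supplied { word: tag } pairs over the defaults.
// The user's entries win when a word appears in both.
function buildLexicon(custom = {}) {
  const merged = new Map(defaultLexicon);
  for (const [word, tag] of Object.entries(custom)) {
    merged.set(word.toLowerCase(), tag);
  }
  return merged;
}

// Only the look-up key is lowercased; the returned text keeps its casing.
function tagWord(word, lexicon) {
  return { text: word, tag: lexicon.get(word.toLowerCase()) || 'Unknown' };
}

const lexicon = buildLexicon({ 'Rees-Mogg': 'LastName', MacDonald: 'LastName' });
console.log(tagWord('MacDonald', lexicon));
console.log(tagWord('Kilmarnock', lexicon));
```

That way MacDonald and Macdonald would both be found, but each would come back exactly as it was written.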

spencermountain commented 6 years ago

yah! you can pass-in a plugin right now. v12 is gonna support extending classes, to add new methods and things.

what should Jacob Rees-Mogg output?

fmacpro commented 6 years ago

Not sure if I'm understanding the question, so apologies in advance if I've misunderstood, but currently it comes back as "jacob rees" whereas it should be "Jacob Rees-Mogg".

first name: Jacob
last name: Rees-Mogg

His full name is Jacob William Rees-Mogg, so William would be his middle name.
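
To make the expected output concrete, here is a rough sketch of a purely positional split. It assumes the first token is the first name, the last token is the last name, and anything in between is the middle name. splitName is a hypothetical helper, not part of compromise, and it splits on whitespace only so that hyphens and the original casing survive:

```javascript
// Hypothetical helper, not compromise's API: a naive positional split.
// Splitting on whitespace only keeps "Rees-Mogg" as one token and
// leaves the casing of every part untouched.
function splitName(fullName) {
  const parts = fullName.trim().split(/\s+/).filter(Boolean);
  if (parts.length === 0) {
    return { firstName: '', middleName: '', lastName: '' };
  }
  if (parts.length === 1) {
    return { firstName: parts[0], middleName: '', lastName: '' };
  }
  return {
    firstName: parts[0],
    middleName: parts.slice(1, -1).join(' '), // empty for two-part names
    lastName: parts[parts.length - 1],
  };
}

console.log(splitName('Jacob Rees-Mogg'));
console.log(splitName('Jacob William Rees-Mogg'));
console.log(splitName('Marion Bruce Wallace'));
```

This is only meant to show the output I'd expect. A positional rule like this would get unhyphenated multi-word surnames wrong, so it is not a suggestion for how the library should actually do it.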