spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License

more than one slash inserts "null" #745

Closed. AurielleP closed this issue 4 years ago.

AurielleP commented 4 years ago
const nlp = require('compromise');
const doc = nlp(`Ramazanoğlu Mah. Mahsus Sk. No:1 Pendik / İSTANBUL / TÜRKİYE`);
doc.debug(); // print each term with its tags

// results in:

"No:1 Pendik / İSTANBUL null/ TÜRKİYE"
'No:1'    -  Noun, Singular
'Pendik / İSTANBUL'  -  ProperNoun, Noun, Singular
'null/ TÜRKİYE'  -  Noun, Singular
spencermountain commented 4 years ago

Are "Sk." and "Mah." i18n abbreviations?

AurielleP commented 4 years ago

No, they are just abbreviations for parts of a street address:

mah. - mahallesi (district)
mh. - mahallesi (district)
blv. - bulvarı (boulevard)
cad. - caddesi (road)
cd. - caddesi (road)
sk. - sokak (alley)
ap. - apartmanı (apartment)
kat - floor (Kat 1 is the 2nd floor in American numbering)

They were tagged fine; I just left them out of the example output to show only the buggy part (the slash parsing).
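As a rough illustration only (this is not part of compromise), the abbreviations listed above could be kept in a plain lookup table and expanded before the text is handed to `nlp()`. The names `TR_ADDRESS_ABBREVIATIONS` and `expandAddressAbbreviations` are hypothetical, and "kat" is left out because it is a full word rather than an abbreviation.

```javascript
// Hypothetical lookup table built from the abbreviation list above.
// Keys are lowercase and include the trailing period, as written in the list.
const TR_ADDRESS_ABBREVIATIONS = {
  'mah.': 'mahallesi', // district
  'mh.': 'mahallesi', // district
  'blv.': 'bulvarı', // boulevard
  'cad.': 'caddesi', // road
  'cd.': 'caddesi', // road
  'sk.': 'sokak', // alley
  'ap.': 'apartmanı', // apartment
}

// Hypothetical helper: replace every whitespace-separated token that matches an
// abbreviation (case-insensitively) with its full Turkish form. The keys contain
// no dotted/dotless i, so plain toLowerCase() is enough for the lookup.
function expandAddressAbbreviations(text) {
  return text
    .trim()
    .split(/\s+/)
    .map((token) => TR_ADDRESS_ABBREVIATIONS[token.toLowerCase()] ?? token)
    .join(' ')
}

console.log(expandAddressAbbreviations('Ramazanoğlu Mah. Mahsus Sk. No:1'))
// → Ramazanoğlu mahallesi Mahsus sokak No:1
```

Expanding before tagging trades the original surface text for unambiguous words; whether that is preferable to teaching the tagger the abbreviations directly depends on whether the original spelling must be preserved in the output.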

spencermountain commented 4 years ago

Hey, this appears to be fixed now, in 13.2.0:

const nlp = require('compromise')
const doc = nlp(`Ramazanoğlu Mah. Mahsus Sk. No:1 Pendik / İSTANBUL / TÜRKİYE`)
console.log(doc.terms().out('array')) // one string per term
[ 'Ramazanoğlu',
  'Mah.',
  'Mahsus',
  'Sk.',
  'No:1',
  'Pendik',
  '/',
  'İSTANBUL',
  '/',
  'TÜRKİYE' ]

Maybe we should try to join lonesome slashes in the future, but for now I think this is the desired behaviour. Let me know if I'm wrong. Cheers
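One possible reading of "join lonesome slashes", sketched here as a post-processing step on the term array shown above. This is not a compromise feature; `joinLonesomeSlashes` is a hypothetical helper that merges each standalone `/` term with the terms on either side, so `Pendik / İSTANBUL / TÜRKİYE` ends up as a single string.

```javascript
// Hypothetical post-processing step: merge each standalone '/' term with its
// neighbours, e.g. ['Pendik', '/', 'İSTANBUL'] -> ['Pendik / İSTANBUL'].
function joinLonesomeSlashes(terms) {
  const out = []
  for (let i = 0; i < terms.length; i++) {
    const term = terms[i]
    const next = terms[i + 1]
    // Only join when the slash sits between two terms; a leading or
    // trailing slash is kept as its own term.
    if (term === '/' && out.length > 0 && next !== undefined) {
      out[out.length - 1] = `${out[out.length - 1]} / ${next}`
      i++ // skip the term that was just merged in
    } else {
      out.push(term)
    }
  }
  return out
}

// The array printed by compromise 13.2.0 in the comment above.
const terms = ['Ramazanoğlu', 'Mah.', 'Mahsus', 'Sk.', 'No:1', 'Pendik', '/', 'İSTANBUL', '/', 'TÜRKİYE']
console.log(joinLonesomeSlashes(terms))
// → [ 'Ramazanoğlu', 'Mah.', 'Mahsus', 'Sk.', 'No:1', 'Pendik / İSTANBUL / TÜRKİYE' ]
```

Working on the plain string array leaves the tokenizer untouched, but any per-term tags are lost for the merged strings, so a real fix would more likely belong in the library itself.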