spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License

past-participle conjugation improvements #568

Open moberemk opened 5 years ago

moberemk commented 5 years ago

So, I'm trying to figure out how to normalize user data to use as part of our interface. One case I'm having particular trouble with is when the user enters "take a shower" as their text.

Ideally, I want to convert this to a question ("Have you taken a shower?"), but I'm not clear on how to convert "take" to its past participle "taken" in this context. Some practical examples:

> nlp('take a shower').sentences().prepend('have you').append('today').toPastTense().toQuestion().out('normal')
'had you take a shower today?'

> nlp('take a shower').sentences().prepend('did you').append('today').toPastTense().toQuestion().out('normal')
'did you take a shower today?'

The second example, prefixing the sentence with "did you", works as expected, but the first example both fails to convert "take" to the right form and incorrectly changes "have" to "had".

I can make this work, but looking at the published API I don't see how to convert "take" to the proper form manually, and none of the functions I'd expect to do it for me automagically (toQuestion, toPastTense) are doing it correctly, at least in the call order I'm using.
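For comparison, here is a minimal sketch of the target transformation done with plain string handling, assuming the caller already has the participle in hand (`toPerfectQuestion` is a hypothetical helper, not part of compromise). It illustrates why tensing the whole sentence goes wrong: in "Have you taken a shower?", only the main verb changes form, while the auxiliary "have" stays in the present tense, so running toPastTense() over the whole sentence also rewrites "have" as "had".

```javascript
// Hypothetical helper (not a compromise API): builds a present-perfect question
// from an imperative phrase such as "take a shower", given the main verb's
// past participle. Assumes the phrase starts with its main verb.
function toPerfectQuestion(phrase, participle, suffix = 'today') {
  const words = phrase.trim().split(/\s+/);
  // Replace only the main verb; the auxiliary "have" is never touched.
  words[0] = participle;
  // filter(Boolean) drops an empty suffix so no stray space is left behind.
  const body = [...words, suffix].filter(Boolean).join(' ');
  return `Have you ${body}?`;
}

console.log(toPerfectQuestion('take a shower', 'taken'));
// Have you taken a shower today?
console.log(toPerfectQuestion('eat a cookie', 'eaten', ''));
// Have you eaten a cookie?
```

The hard part, producing "taken" from "take", is deliberately left to the caller here; that is exactly the piece the thread is about.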

(If this is out-of-scope for the library or if I'm using the API wrong, please let me know!)

spencermountain commented 5 years ago

hey mark, sounds like a neat project. yeah, this is cool, and should be doable with compromise.

yeah - toQuestion is a little nutty. I remember that really doing my head in. The good news is that we already kind of conjugate participles; it looks like we're missing the "take" exception. this is how i'd do it -

const nlp = require('compromise');

let examples = [
  'eat a cookie',
  'take a shower',
  'watch the movie',
  'smell a flower',
  'join a club',
  'beat an egg' //participle working
];

const toParticiple = function(str) {
  let doc = nlp(str);
  let verb = doc.verbs(0);
  let conjugations = verb.conjugate()[0];
  //use the Participle, or the PastTense
  let form = conjugations.Participle || conjugations.PastTense;
  doc = doc.replace(verb, form);
  return doc.text();
};

console.log(examples.map(toParticiple));

(if you add "taken" to the conjugation data and then run npm run pack, the 'take' example will work)
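To picture why a single missing entry matters, here is a small self-contained sketch of the general pattern: an exception table consulted first, with a regular rule as the fallback. The table and the crude -ed rule are hypothetical stand-ins, not compromise's actual conjugation data or logic (the toParticiple code above, for instance, falls back to the PastTense instead). The point is only that a verb absent from the table silently gets a wrong form.

```javascript
// Hypothetical exception table, standing in for real conjugation data.
const PARTICIPLE_EXCEPTIONS = { eat: 'eaten', beat: 'beaten', take: 'taken' };

// Crude regular rule: for regular verbs the "-ed" past tense doubles as the
// participle. Ignores consonant doubling ("stop" -> "stopped") among others.
function regularParticiple(verb) {
  if (verb.endsWith('e')) return verb + 'd'; // "smile" -> "smiled"
  if (/[^aeiou]y$/.test(verb)) return verb.slice(0, -1) + 'ied'; // "carry" -> "carried"
  return verb + 'ed'; // "watch" -> "watched"
}

// Exceptions win; everything else falls through to the regular rule.
function participleOf(verb, exceptions = PARTICIPLE_EXCEPTIONS) {
  if (Object.prototype.hasOwnProperty.call(exceptions, verb)) {
    return exceptions[verb];
  }
  return regularParticiple(verb);
}

console.log(['eat', 'take', 'watch', 'smell', 'join'].map((v) => participleOf(v)));
// eaten, taken, watched, smelled, joined

// With no "take" entry, the regular rule produces the wrong form:
console.log(participleOf('take', {})); // taked
```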

yeah, this is a little weird. i'll keep this issue open so it gets improved

moberemk commented 5 years ago

Oh that's awesome! I'll definitely play around with it, though I do worry that it only applies to sentences with those specific "[verb] [article] [noun]" or "[verb] [...phrase] [noun]" patterns. I might need to write a guard so it's only called in those cases, to make sure it works reliably.
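A guard like that could start as simply as the sketch below. It is a naive, hypothetical heuristic (the function name and the article list are assumptions, not anything from compromise) that only checks the "[verb] [article] [noun]" shape by looking for an article in second position. It does not verify that the first word is actually a verb; that is the kind of check a part-of-speech tagger is better suited for.

```javascript
// Hypothetical guard: accepts phrases shaped like "[verb] [article] [noun...]".
// Purely lexical; it assumes the first word is the main verb.
const ARTICLES = new Set(['a', 'an', 'the']);

function looksLikeVerbArticleNoun(phrase) {
  const words = phrase.trim().toLowerCase().split(/\s+/);
  // Need at least verb + article + noun, with the article second.
  return words.length >= 3 && ARTICLES.has(words[1]);
}

for (const p of ['take a shower', 'watch the movie', 'go to sleep']) {
  console.log(p, '->', looksLikeVerbArticleNoun(p));
}
// take a shower -> true
// watch the movie -> true
// go to sleep -> false
```

Phrases that fail the check could skip the participle conversion and fall back to a different question template, such as the "did you" form that already works.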

Appreciate the quick help with this! Does the change to the issue title suggest that this is behavior the library intends to improve, though?

spencermountain commented 5 years ago

yeah, I was wondering last night why we don't have a toParticiple() method like the other conjugations. I guess it's because so few verbs require it. I'm also not sure why 'take' didn't have that form. Maybe I've forgotten something.

if you can figure out a way to improve toQuestion, or find more missing participle forms, please do a PR! That's right in this project's wheelhouse. cheers

moberemk commented 5 years ago

I actually might do that for "taken", since that's one gap I'm running into now. Is that kind of change just adding a new form in the conjugations file, around here?

spencermountain commented 5 years ago

yup!