spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License
11.44k stars, 656 forks

Subject of a sentence. #388

Open sarahyj opened 7 years ago

sarahyj commented 7 years ago

Is it possible to detect the subject of the sentence using compromise? :)

videophonegeek commented 7 years ago

The subject of a sentence is a person, place, or thing. You can screen for those in compromise with .people().data(), .places().data(), and .nouns().data().

The subject could also be a pronoun, which is harder to screen for. Find the main verb first. Verbs preceded by "to" are infinitives, and infinitives are not main verbs; watch for split infinitives too. I've noticed compromise tends to label verbs as infinitives when they are not. Then skip potential subjects that are part of the main verb's verb phrase and look for those that are part of a noun phrase.
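
The heuristic above could be sketched roughly as follows. This is only an illustration over pre-tagged tokens: the `{ text, tag }` token shape, the tag names, and the functions `findMainVerbIndex` and `guessSubject` are all assumptions made for the example, not part of compromise.

```javascript
// Hypothetical token shape: { text, tag }, where tag is 'Noun', 'Pronoun', 'Verb', or 'Other'.
// These tags are illustrative and are not compromise's own tag names.

// Return the index of the first verb that is not part of a "to + verb" infinitive.
// One 'Other' word between "to" and the verb is allowed, to cover split infinitives.
function findMainVerbIndex(tokens) {
  return tokens.findIndex((t, i) => {
    if (t.tag !== 'Verb') return false
    const prev = tokens[i - 1]
    const prev2 = tokens[i - 2]
    const directTo = prev !== undefined && prev.text.toLowerCase() === 'to'
    const splitTo =
      prev2 !== undefined && prev2.text.toLowerCase() === 'to' && prev.tag === 'Other'
    return !directTo && !splitTo
  })
}

// Walk left from the main verb and return the nearest noun or pronoun, or null.
function guessSubject(tokens) {
  const verbIndex = findMainVerbIndex(tokens)
  for (let i = verbIndex - 1; i >= 0; i--) {
    if (tokens[i].tag === 'Noun' || tokens[i].tag === 'Pronoun') return tokens[i].text
  }
  return null
}
```

For "She wants to quickly leave", tagged by hand, this picks "wants" as the main verb and "She" as the subject. It will miss gerund and clause subjects entirely, so treat it as a starting point only.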

sebilasse commented 7 years ago

@videophonegeek I would recommend using .topics(), which also covers Organization. Couldn't the subject also be a gerund, as in "Driving makes sebi tired.", or an infinitive construction with a gerund, as in "Making everybody happy is not easy"?

[Native German speaker here, not sure if it's the same in English ...]

videophonegeek commented 7 years ago

(groan) "Making everybody happy is not easy." It is correct English grammar and the grammar police will not come after you.

spencermountain commented 7 years ago

yeah @sebilasse good point. This is really something I want the library to start doing.

he was walking really fast -> GerundVerb

walking is really fun -> GerundNoun

I think there's a task for that somewhere, .. ah here.

...I have no idea how it would work, and would love some help.
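
One very rough way to separate the two cases above is to look at the neighbouring words. The sketch below is an assumption-heavy illustration, not how compromise works: `classifyGerund` and `BE_FORMS` are hypothetical names, the input is a plain array of lowercase words, and the labels are simply the ones used in this comment.

```javascript
// Forms of "be" that can act as the auxiliary of a progressive verb ("was walking").
const BE_FORMS = new Set(['am', 'is', 'are', 'was', 'were', 'be', 'been', 'being'])

// Classify the -ing word at position `index` in a list of lowercase words.
// 'GerundVerb' and 'GerundNoun' are the labels used in the comment above;
// they are not necessarily tags that compromise emits.
function classifyGerund(words, index) {
  const word = words[index]
  if (!word || !word.endsWith('ing')) return null
  const prev = words[index - 1]
  const next = words[index + 1]
  // "was walking": a preceding form of "be" marks a progressive verb.
  if (prev !== undefined && BE_FORMS.has(prev)) return 'GerundVerb'
  // "walking is fun": an -ing word directly followed by a form of "be" is acting as the subject.
  if (next !== undefined && BE_FORMS.has(next)) return 'GerundNoun'
  return null // undecided; a real tagger would need more context
}
```

This handles the two example sentences but stays undecided on "Driving makes sebi tired", and it would mislabel ordinary nouns ending in -ing such as "thing", so it is only a sketch of the idea.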

IntegerMan commented 6 years ago

I'm somewhat doing this. I restrict my interpreter to imperative sentences with a single verb, so I look at any noun to the left of the verb and treat it as the subject of the sentence. I also infer "I" as the subject of most sentences without an explicit subject. My usage is fairly narrow and covers text-based games from the 1980s and 1990s.

Here's a portion of my TypeScript code for working with subject identification. Note that Command is basically a wrapper object I have representing the sentence and CommandToken is a similar wrapper built around compromise terms.

  private identifySentenceNouns(command: Command, tokens: CommandToken[]): void {

    let indexOfVerb: number = -1;
    if (command.verb) {
      indexOfVerb = tokens.indexOf(command.verb);
    }

    // Grab the nouns and stick them into the sentence as the objects
    const nouns: CommandToken[] = tokens.filter(t => SentenceParserService.isNounLike(t));
    for (const noun of nouns) {

      // When no verbs are present and the first noun is a direction, interpret it as a 'Go' verb.
      if (!command.verb && noun.classification === TokenClassification.Direction) {
        command.verb = this.buildGoToken();
      }

      // If this noun comes before the verb, we're going to use it as a subject instead of as an object, but only for the first noun
      if (!command.subject && indexOfVerb > tokens.indexOf(noun)) {
        command.subject = noun;
      } else {
        command.objects.push(noun);
      }
    }
  }
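
For readers without the Command and CommandToken wrappers, the same idea can be sketched as a standalone function. Everything here is a simplification made for illustration: the `{ text, kind }` token shape and the name `assignRoles` are hypothetical, and the implicit "I" is applied to every sentence without a subject, which is broader than the "most sentences" described above.

```javascript
// Hypothetical simplified tokens: { text, kind }, with kind 'noun', 'verb', or 'other'.
function assignRoles(tokens) {
  const verbIndex = tokens.findIndex(t => t.kind === 'verb')
  const result = {
    subject: null,
    verb: verbIndex >= 0 ? tokens[verbIndex].text : null,
    objects: [],
  }
  tokens.forEach((t, i) => {
    if (t.kind !== 'noun') return
    // Only the first noun before the verb becomes the subject; every other noun is an object.
    if (result.subject === null && i < verbIndex) {
      result.subject = t.text
    } else {
      result.objects.push(t.text)
    }
  })
  // Imperative commands such as "open the door" have no explicit subject: assume the player.
  if (result.subject === null) result.subject = 'I'
  return result
}
```

With hand-tagged tokens for "open the door", this yields the verb "open", the object "door", and the inferred subject "I".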

My full project is available at https://gitlab.com/IntegerMan/angularIF. The tests aren't implemented yet and there's still a ton of documentation and other work to be done, but if it's helpful, check it out.

https://gitlab.com/IntegerMan/angularIF/blob/master/src/app/engine/parser/sentence-parser.service.ts in particular may be interesting to you.

@spencermountain our conversation earlier encouraged me to open up the source for you to take a peek at if you're interested.

spencermountain commented 4 years ago

hey, there is now a .subjects() method in compromise-sentences. I'm not sure how well it is working, and would love some eyes on it.

Will keep this open.

MilkyDeveloper commented 1 year ago

It looks like .subjects() is no longer included in compromise-sentences yet is still documented:

doc = nlp("Ecological rule that states that no two species can occupy the same exact niche in the same habitat at the same time.").sentences().subjects()
// Uncaught TypeError: nlp(...).sentences(...).subjects is not a function

The subject of a sentence is now calculated under the hood and exposed through .json():

doc = nlp("Ecological rule that states that no two species can occupy the same exact niche in the same habitat at the same time.").sentences().json()[0].sentence.subject
// 'ecological rule that'

I would never have thought this was possible without massive overhead. Kudos to the maintainers!
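
To collect the subject of every sentence rather than only the first, the JSON output can be mapped over defensively. This is a minimal sketch that assumes each entry has the `sentence.subject` shape seen in the snippet above; that shape may differ between compromise versions, and `extractSubjects` is a hypothetical helper name.

```javascript
// `sentences` is assumed to be the array returned by nlp(text).sentences().json(),
// where each entry may carry a `sentence.subject` string as in the snippet above.
function extractSubjects(sentences) {
  return sentences
    .map(s => (s && s.sentence ? s.sentence.subject : undefined))
    // Drop entries where no subject was found or the field is missing.
    .filter(subject => typeof subject === 'string' && subject.length > 0)
}
```

Entries without a `sentence` field, or with an empty subject, are skipped instead of throwing.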

spencermountain commented 1 year ago

hey @MilkyDeveloper sorry about that. I'm not sure what happened to sentences().subjects(). I recommend using verbs().subjects() like this: https://runkit.com/spencermountain/649460605ec8ad0009f65e68

The only difference is that sentences().subjects() did some pretty weak analysis to find the 'main' verb in the sentence, whatever that means. Depending on your context, you may want more control over which verbs you get the subject of.
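
Since the method has come and gone between releases, a caller may want to check for it before using it. The sketch below is an assumption about how one might do that: `subjectsOf` is a hypothetical helper, `nlp` is passed in as a parameter, and the use of `.out('array')` to get plain strings is assumed to behave the same in your installed version.

```javascript
// `nlp` is passed in so the helper does not assume how compromise is loaded.
// Returns an array of subject strings, or null when verbs().subjects() is unavailable.
function subjectsOf(nlp, text) {
  const verbs = nlp(text).verbs()
  if (typeof verbs.subjects !== 'function') return null
  return verbs.subjects().out('array')
}
```

A null result signals that the installed version does not expose the method, which avoids the "is not a function" TypeError shown earlier in the thread.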

I can look at bringing back the function, but it still needs a smarter solution. I will remove it from the readme for now. Any help is welcome. Cheers.