spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License

Update main clause determination #1072

Closed ryancasburn-KAI closed 6 months ago

ryancasburn-KAI commented 6 months ago

Fixes #1071

I wasn't sure if you would want this pull request on master or dev. Let me know if you want me to change it.

This has two main updates:

  1. Change subordinating conjunctions to only match at the beginning of the clause.

"He collaborated with the university to develop comprehensive academic proposals throughout the entire campus, encompassing academic priorities in every department at the university and course structure typologies for each academic program." The word "throughout" appears in the middle of the first clause, so it should not eliminate that clause from main-clause determination.

  2. Add a check for clauses starting with a gerund.

"Taking diligent notes throughout the entire class, the students remained focused during the lecture." Both clauses contain a verb, and nothing else rules either of them out. However, the first clause starts with a gerund, which should eliminate it.
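
To make the two rules concrete, here is a rough standalone sketch. It is not the code in this pull request and it does not call compromise: clauses are split on commas, the subordinator list is a made-up sample, and "gerund" is approximated by an "-ing" suffix test on the first word. The names `SUBORDINATORS`, `startsWithSubordinator`, `startsWithGerund`, and `pickMainClause` are all hypothetical.

```javascript
// Hypothetical sketch only: plain string heuristics, not compromise's tagger.

// Illustrative sample list (an assumption), including "throughout" to mirror the first example.
const SUBORDINATORS = new Set(['although', 'because', 'since', 'throughout', 'unless', 'while'])

// Lower-cased first word of a clause ('' for an empty clause).
const firstWord = (clause) => clause.trim().split(/\s+/)[0].toLowerCase()

// Rule 1: a subordinator disqualifies a clause only when it is the first word.
const startsWithSubordinator = (clause) => SUBORDINATORS.has(firstWord(clause))

// Rule 2: crude gerund test on the first word. A real part-of-speech tag is needed
// in practice, since this also matches words like "during" or "nothing".
const startsWithGerund = (clause) => /^[a-z]+ing$/.test(firstWord(clause))

// Return the first clause that survives both rules, falling back to the first clause.
function pickMainClause(sentence) {
  const clauses = sentence.replace(/[.!?]+$/, '').split(/,\s*/)
  const candidates = clauses.filter((c) => !startsWithSubordinator(c) && !startsWithGerund(c))
  return candidates.length > 0 ? candidates[0] : clauses[0]
}
```

With this sketch, a mid-clause "throughout" no longer disqualifies the first clause of the first example, and the second example resolves to "the students remained focused during the lecture".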

ryancasburn-KAI commented 6 months ago

I'm wondering whether everything in the subordinate (and maybe relative) list should also be constrained to match at the start of a clause only.

spencermountain commented 6 months ago

beauty! Thank you Ryan, I've been hoping that this method would get some love, at some point. Please take the wheel, and make any changes you'd like. I'm all in!

Also, I love the idea of debugging via GPT. Such a fine use of an LLM, to test assumptions like that.

If mainClause ever starts working reliably well, the features we could add to .sentence() would be incredible. cheers

ryancasburn-KAI commented 6 months ago

Great, glad I can help! My main use case at the moment is identifying if a sentence is written in present tense, so identifying the correct verb is important. I will continue to explore how this can be made better 😃