usc-isi-i2 / etk

Extraction Toolkit
https://etk.readthedocs.io/en/development/
MIT License

date extraction #411

Open yishairasowsky opened 4 years ago

yishairasowsky commented 4 years ago

Is there a way to write a rule-based system to catch things like start/end dates in contract text? Here are a few real examples. The date entities I want spaCy to detect automatically are the start and end dates in each one. If you have ideas other than spaCy, that is also OK!

  1. The initial term of this Lease shall be for a period of Five (5) years commencing on February 1, 2012, (the “Lease Commencement Date”) and expiring on January 31, 2017 (the “Initial Lease Term”).

  2. Term: One (1) year commencing January 1, 2007 ("Commencement Date") and ending December 31, 2007 ("Expiration Date").

  3. This Lease Agreement is entered into for term of 15 years, beginning January 1, 2014 and ending on December 31, 2028.
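A purely rule-based approach can handle these examples without spaCy. It anchors on the cue verbs they share ("commencing", "beginning", "expiring", "ending") and captures the "Month D, YYYY" date that follows each one. What follows is a minimal sketch that uses only Python's `re` and `datetime` modules. It is not ETK's extractor. The cue-word lists and the function names `parse_date` and `extract_term_dates` are hypothetical, and other phrasings (e.g. "the 1st of February" or numeric dates) would need more patterns.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Full month names, as they appear in the examples ("February 1, 2012").
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# "Month D, YYYY"; the rf-string doubles braces so {1,2} and {4} survive.
DATE = rf"(?:{MONTHS})\s+\d{{1,2}},\s*\d{{4}}"

# Hypothetical cue words that introduce a start date or an end date.
START_CUES = r"(?:commencing|beginning|starting)(?:\s+on)?"
END_CUES = r"(?:expiring|ending|terminating)(?:\s+on)?"

START_RE = re.compile(rf"\b{START_CUES}\s+({DATE})", re.IGNORECASE)
END_RE = re.compile(rf"\b{END_CUES}\s+({DATE})", re.IGNORECASE)


def parse_date(raw: str) -> date:
    """Parse 'Month D, YYYY', tolerating extra spaces around the comma."""
    cleaned = re.sub(r"\s*,\s*", ", ", re.sub(r"\s+", " ", raw.strip()))
    return datetime.strptime(cleaned, "%B %d, %Y").date()


def extract_term_dates(text: str) -> Tuple[Optional[date], Optional[date]]:
    """Return (start, end) for the first start cue and first end cue found."""
    start = START_RE.search(text)
    end = END_RE.search(text)
    return (parse_date(start.group(1)) if start else None,
            parse_date(end.group(1)) if end else None)


EXAMPLES = [
    "The initial term of this Lease shall be for a period of Five (5) years "
    "commencing on February 1, 2012, (the “Lease Commencement Date”) and "
    "expiring on January 31, 2017 (the “Initial Lease Term”).",
    'Term: One (1) year commencing January 1, 2007 ("Commencement Date") and '
    'ending December 31, 2007 ("Expiration Date").',
    "This Lease Agreement is entered into for term of 15 years, beginning "
    "January 1, 2014 and ending on December 31, 2028.",
]

for sentence in EXAMPLES:
    start, end = extract_term_dates(sentence)
    print(start, end)  # first line printed: 2012-02-01 2017-01-31
```

Keying on cue words, instead of collecting every date, preserves which date is the start and which is the end. A generic DATE entity recognizer would not assign those roles on its own.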

szeke commented 4 years ago

I think ETK has a rule extractor using spaCy that scans the text for dates. It is slow, because spaCy 2 became very slow. @GreatYYX, please confirm and post a pointer to example code, if we have it.

yishairasowsky commented 4 years ago

Can I share a link to my question on Stack Overflow? I hope it clarifies things a little more.

-- Yishai Rasowsky 054.848.2245 Visit my Shiurim https://torahdownloads.com/s-437-rabbi-yishai-rasowsky.html | Thesis https://www.amherst.edu/media/view/58703/original/jesse_thesis.pdf | Workplace https://www.smrtflow.com/ | Github https://github.com/yishairasowsky/info_about_your_location | Linked-In https://www.linkedin.com/in/yishai-rasowsky-a28189164/

szeke commented 4 years ago

Sure, please do.

yishairasowsky commented 4 years ago

thx! https://stackoverflow.com/questions/59344316/detect-dates-in-spacy

GreatYYX commented 4 years ago

The code for the date extractor is here: https://github.com/usc-isi-i2/etk/blob/master/etk/extractors/date_extractor.py#L141

It also supports extracting dates in user-defined formats: https://github.com/usc-isi-i2/etk/blob/master/examples/date_extractor/date_example.py#L34
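The general idea of user-defined formats can also be shown without ETK. Each format is paired with a regex that locates candidates in free text, and every candidate is then validated with `datetime.strptime`. This is a stdlib-only sketch, not ETK's `DateExtractor` API. The name `extract_custom_dates` and both sample formats are hypothetical, and the `%m/%d/%Y` entry assumes US month-first order.

```python
import re
from datetime import date, datetime
from typing import List, Sequence, Tuple

# Hypothetical user-defined formats: (regex that finds candidates, strptime format).
CUSTOM_FORMATS: Sequence[Tuple[str, str]] = (
    (r"\b\d{2}\.\d{2}\.\d{4}\b", "%d.%m.%Y"),     # e.g. 01.02.2012 (day first)
    (r"\b\d{1,2}/\d{1,2}/\d{4}\b", "%m/%d/%Y"),   # e.g. 1/31/2017 (month first)
)


def extract_custom_dates(
    text: str, formats: Sequence[Tuple[str, str]] = CUSTOM_FORMATS
) -> List[Tuple[str, date]]:
    """Return (matched text, parsed date) pairs in order of appearance."""
    found = []
    for pattern, fmt in formats:
        for m in re.finditer(pattern, text):
            try:
                parsed = datetime.strptime(m.group(0), fmt).date()
            except ValueError:
                # Right shape but not a real calendar date (e.g. 02/30/2020).
                continue
            found.append((m.start(), m.group(0), parsed))
    found.sort()  # order by position in the text
    return [(raw, parsed) for _, raw, parsed in found]


print(extract_custom_dates("Term runs from 01.02.2012 to 1/31/2017; 02/30/2020 is ignored."))
```

Keeping the locating regex separate from the parsing format means that a string which only looks like a date, such as 02/30/2020, is dropped rather than silently accepted.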