sina-al / pynlp

A pythonic wrapper for Stanford CoreNLP.
MIT License
SUTime functionality #16

Open jqiao2 opened 6 years ago

jqiao2 commented 6 years ago

According to Stanford's website, SUTime is provided automatically in CoreNLP. Is it included in this wrapper as well? If so, is there any documentation, or can anyone provide an example of how to use it (specifically, how to go from tagged entities to storing or printing a TIMEX3 object)?

sina-al commented 6 years ago

This is not currently supported in the wrapper, but I could add it. Will keep you updated on this thread.

sina-al commented 6 years ago

Hey,

Install pynlp==0.4.2

pip3 install pynlp --upgrade

To get all the timex objects:

doc = nlp(text)
timexs = {entity.timex for entity in doc.entities if entity.timex}

To print their representations:

print(timexs)

Each timex also exposes the attributes tid, value, type and text:

for timex in timexs:
    print(timex, timex.tid)  # for example

Note that I used a Python set to collect the timexs. This is because of how CoreNLP works: "Saturday" and "afternoon" in "Saturday afternoon" both give the same timex.
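The deduplication described above can be tried out without a running CoreNLP server. This is only a minimal sketch: `FakeTimex` and `FakeEntity` are hypothetical stand-ins, not pynlp classes, and only the attribute names (`tid`, `value`, `type`, `text`, `entity.timex`) are taken from this thread. It keys on `tid` instead of relying on set membership, on the assumption that `tid` is unique within one document.

```python
from collections import namedtuple

# Hypothetical stand-ins for pynlp objects; only the attribute names
# (tid, value, type, text, entity.timex) come from this thread.
FakeTimex = namedtuple("FakeTimex", ["tid", "value", "type", "text"])
FakeEntity = namedtuple("FakeEntity", ["text", "timex"])


def unique_timexes(entities):
    """Return one timex per tid, in first-seen order, skipping entities without one."""
    by_tid = {}
    for entity in entities:
        timex = entity.timex
        if timex and timex.tid not in by_tid:
            by_tid[timex.tid] = timex
    return list(by_tid.values())


# "Saturday" and "afternoon" share one timex, as described above.
# The value string is a placeholder, not real server output.
shared = FakeTimex("t1", "<placeholder>", "TIME", "Saturday afternoon")
entities = [
    FakeEntity("Saturday", shared),
    FakeEntity("afternoon", shared),
    FakeEntity("Reagan", None),  # an entity with no timex attached
]

for timex in unique_timexes(entities):
    print(timex.tid, timex.text)
```

With real pynlp output, `doc.entities` would take the place of the hand-built `entities` list.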

Any questions or bugs, please let me know. Will update the docs soon.

jqiao2 commented 6 years ago

Edit: The original comment is further down. I solved that problem; I just hadn't started the server properly with the java.se.ee modules.

Right now, some timex objects don't have anything stored in value, specifically relative times such as "today", "the last decade", or "14,050 years ago". Here is an excerpt of timex.text followed by timex.value for some entities (a blank indicates no SUTime equivalent, or at least that is how I interpret it):

24 March :  XXXX-03-24
5 October :     XXXX-10-05
the last decade :   
1987 :  1987
1991 :  1991
1992 :  1992
1993 :  1993
1996 :  1996
1999 :  1999
1988 :  1988
1999 :  1999
recently :  PAST_REF
1995 :  1995
1997 :  1997
1999 :  1999
1999 :  1999
1999 :  1999
current :   PRESENT_REF
1997 :  1997
2000 :  2000
1999 :  1999
2000 :  2000
9230 to 10,400  years :     
now :   PRESENT_REF
1987 :  1987
1993 :  1993
1987 :  1987
present :   PRESENT_REF
today :     
Recently :  PAST_REF
1999 :  1999
today :     
present :   PRESENT_REF
1998 :  1998
1989 :  1989
Present :   PRESENT_REF
previously :    PAST_REF
present :   PRESENT_REF
13,000 years ago :  
6500 years ago :    
4000 years ago :    
previously :    PAST_REF
past :  PAST_REF
recently :  PAST_REF
past :  PAST_REF
present :   PRESENT_REF
past :  PAST_REF
winter :    XXXX-WI
summer :    XXXX-SU
1994 :  1994
2002 :  2002
2003 :  2003
6500 years ago :    
6500 to 5800 years ago :    
6500 to 5800 years ago :    
4000 years ago :    
June :  XXXX-06
4000 years ago :    
spring :    XXXX-SP
previously :    PAST_REF
previously :    PAST_REF
beginning in the 1990s :    199X
14,250 years ago :  
14,050 years ago :  
13,690 to 13,700 years ago :    
16,000 to 17,000 years ago :    
once :  PAST_REF
1994 :  1994
about 20,000 years ago :    
present :   PRESENT_REF

Is there somewhere I'm supposed to pass in a "current" time, or any other way to let CoreNLP know what time to base its relative time calculations on? Thanks

Original comment

Thanks for adding this!

Sorry this has taken so long; I've been messing around with timex but all the timex values I'm getting are blank. Following your code above exactly and with the text "In 1985, Reagan was in office.", "1985" has a non-null timex. However, when I print the entity.timex on "1985", I get:

{<Timex: [tid: , value: , type: ]>}

I'm thinking it has to do with the fact that I set ner.useSUTime to False, but with it set to True I can't run my code without getting a pynlp.exceptions.CoreNLPServerError. Is there an alternative to this, or is it another issue?

sina-al commented 6 years ago

Could you show me the same excerpt but with repr(timex) instead of timex.text? Thanks

jqiao2 commented 6 years ago

Here is the repr(timex) (I removed some of the timexs because they were just years.):

<Timex: [tid: t1, value: XXXX-03-24, type: DATE]>
<Timex: [tid: t2, value: XXXX-10-05, type: DATE]>
<Timex: [tid: t13, value: , type: DATE]>
<Timex: [tid: t14, value: 1987, type: DATE]>
<Timex: [tid: t15, value: 1991, type: DATE]>
<Timex: [tid: t16, value: 1992, type: DATE]>
<Timex: [tid: t17, value: 1993, type: DATE]>
<Timex: [tid: t18, value: 1996, type: DATE]>
<Timex: [tid: t19, value: 1999, type: DATE]>
<Timex: [tid: t21, value: 1988, type: DATE]>
<Timex: [tid: t22, value: 1999, type: DATE]>
<Timex: [tid: t23, value: PAST_REF, type: DATE]>
<Timex: [tid: t24, value: 1995, type: DATE]>
<Timex: [tid: t25, value: 1997, type: DATE]>
<Timex: [tid: t26, value: 1999, type: DATE]>
<Timex: [tid: t27, value: 1999, type: DATE]>
<Timex: [tid: t28, value: 1999, type: DATE]>
<Timex: [tid: t29, value: PRESENT_REF, type: DATE]>
<Timex: [tid: t30, value: 1997, type: DATE]>
<Timex: [tid: t31, value: 2000, type: DATE]>
<Timex: [tid: t32, value: 1999, type: DATE]>
<Timex: [tid: t33, value: 2000, type: DATE]>
<Timex: [tid: t34, value: , type: DURATION]>
<Timex: [tid: t64, value: PRESENT_REF, type: DATE]>
<Timex: [tid: t65, value: 1987, type: DATE]>
<Timex: [tid: t66, value: 1993, type: DATE]>
<Timex: [tid: t69, value: 1987, type: DATE]>
<Timex: [tid: t71, value: PRESENT_REF, type: DATE]>
<Timex: [tid: t40, value: , type: DATE]>
<Timex: [tid: t72, value: PAST_REF, type: DATE]>
<Timex: [tid: t73, value: 1999, type: DATE]>
<Timex: [tid: t40, value: , type: DATE]>
<Timex: [tid: t74, value: PRESENT_REF, type: DATE]>
<Timex: [tid: t75, value: 1998, type: DATE]>
<Timex: [tid: t76, value: 1989, type: DATE]>
<Timex: [tid: t77, value: PRESENT_REF, type: DATE]>
<Timex: [tid: t78, value: PAST_REF, type: DATE]>
<Timex: [tid: t79, value: PRESENT_REF, type: DATE]>
<Timex: [tid: t1, value: , type: DATE]>
<Timex: [tid: , value: , type: ]>
<Timex: [tid: t2, value: , type: DATE]>
<Timex: [tid: t3, value: PAST_REF, type: DATE]>
<Timex: [tid: t4, value: PAST_REF, type: DATE]>
<Timex: [tid: t6, value: PAST_REF, type: DATE]>
<Timex: [tid: t8, value: PAST_REF, type: DATE]>
<Timex: [tid: t9, value: PRESENT_REF, type: DATE]>
<Timex: [tid: t11, value: PAST_REF, type: DATE]>
<Timex: [tid: t12, value: XXXX-WI, type: DATE]>
<Timex: [tid: t13, value: XXXX-SU, type: DATE]>
<Timex: [tid: t14, value: 1994, type: DATE]>
<Timex: [tid: t15, value: 2002, type: DATE]>
<Timex: [tid: t16, value: 2003, type: DATE]>
<Timex: [tid: t17, value: , type: DATE]>
<Timex: [tid: t18, value: , type: DATE]>
<Timex: [tid: t18, value: , type: DATE]>
<Timex: [tid: t19, value: , type: DATE]>
<Timex: [tid: t21, value: XXXX-06, type: DATE]>
<Timex: [tid: t23, value: , type: DATE]>
<Timex: [tid: t24, value: XXXX-SP, type: DATE]>
<Timex: [tid: t25, value: PAST_REF, type: DATE]>
<Timex: [tid: t26, value: PAST_REF, type: DATE]>
<Timex: [tid: t27, value: 199X, type: DATE]>
<Timex: [tid: t23, value: , type: DATE]>
<Timex: [tid: t24, value: , type: DATE]>
<Timex: [tid: t25, value: , type: DATE]>
<Timex: [tid: t26, value: , type: DATE]>
<Timex: [tid: t27, value: PAST_REF, type: DATE]>
<Timex: [tid: t29, value: 1994, type: DATE]>
<Timex: [tid: t30, value: , type: DATE]>
<Timex: [tid: t32, value: PRESENT_REF, type: DATE]>
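To see at a glance how many of these timexes came back without a value, the repr lines can be parsed back into fields. The sketch below assumes the repr format is exactly the one shown in this thread (`<Timex: [tid: ..., value: ..., type: ...]>`) and that values contain no commas; `parse_timex_repr` and `summarize` are hypothetical helper names, not part of pynlp.

```python
import re
from collections import Counter

# Matches lines shaped like the repr output in this thread, e.g.
# <Timex: [tid: t1, value: XXXX-03-24, type: DATE]>
TIMEX_REPR = re.compile(
    r"<Timex: \[tid: (?P<tid>[^,]*), value: (?P<value>[^,]*), type: (?P<type>[^\]]*)\]>"
)


def parse_timex_repr(line):
    """Return a dict with tid, value and type, or None if the line does not match."""
    match = TIMEX_REPR.search(line)
    if match is None:
        return None
    return {key: text.strip() for key, text in match.groupdict().items()}


def summarize(lines):
    """Count parsed timexes by whether their value field is blank."""
    counts = Counter()
    for line in lines:
        fields = parse_timex_repr(line)
        if fields is None:
            continue  # not a Timex repr line
        counts["resolved" if fields["value"] else "blank"] += 1
    return counts


# A few lines copied from the output above.
sample = [
    "<Timex: [tid: t1, value: XXXX-03-24, type: DATE]>",
    "<Timex: [tid: t13, value: , type: DATE]>",
    "<Timex: [tid: t34, value: , type: DURATION]>",
    "<Timex: [tid: , value: , type: ]>",
]
print(summarize(sample))
```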

I went through some of your source code here on GitHub and found an altValue field. When I print timex._timex, it shows up:

altValue: "PREV_IMMEDIATE 2500"
text: "the last 2500"
type: "DATE"
tid: "t23"

compared to a year timex:

value: "1300"
text: "1300"
type: "DATE"
tid: "t4"

However, running print(timex.altValue) in the try/except block prints nothing, as if the attribute were not there.

Furthermore, running print(entity.normalized_ner) returns the altValue when there is one and the value otherwise. I've tried passing a reference time for NER to use via ner.providedDocDate, and also setting ner.usePresentDateForDocDate to true, but neither converted the altValues to an absolute date. Should CoreNLP convert the offsets to an absolute date, or should I be doing that myself?
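If the conversion does have to happen on the client side, one option is to resolve the simple "N years ago" phrases directly from timex.text. The following is a rough sketch of that workaround, not something CoreNLP or pynlp is known to provide: `years_ago_to_year` is a hypothetical helper, the reference year (2018 here, chosen arbitrarily) must be supplied by the caller, and only the phrase shapes that appear in the excerpts above are handled.

```python
import re

# Numbers such as "6500", "14,050" or "20,000".
NUMBER = r"\d{1,3}(?:,\d{3})+|\d+"
# "N years ago" or "N to M years ago", possibly preceded by other words.
YEARS_AGO = re.compile(
    rf"({NUMBER})(?:\s+to\s+({NUMBER}))?\s+years\s+ago", re.IGNORECASE
)


def years_ago_to_year(text, reference_year):
    """Resolve 'N years ago' or 'N to M years ago' against reference_year.

    Returns a tuple with one year per number found (plain subtraction, so
    results before year 1 come out as zero or negative), or None when the
    text contains no such phrase.
    """
    match = YEARS_AGO.search(text)
    if match is None:
        return None
    offsets = [int(group.replace(",", "")) for group in match.groups() if group]
    return tuple(reference_year - offset for offset in offsets)


# Phrases taken from the timex.text excerpt earlier in this thread.
phrases = ["6500 years ago", "6500 to 5800 years ago", "about 20,000 years ago", "today"]
for phrase in phrases:
    print(phrase, "->", years_ago_to_year(phrase, reference_year=2018))
```

Phrases such as "today" or "the last decade" return None here and would need their own handling.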