yzhangcs / parser

:rocket: State-of-the-art parsers for natural language.
https://parser.yzhang.site/
MIT License

Calling predict() on tokenized list doesn't work as expected [vi-sdp-roberta-en] #110

Closed (creatorrr closed this issue 1 year ago)

creatorrr commented 1 year ago

Current:

>>> from supar import Parser
>>> parser = Parser.load("vi-sdp-roberta-en")
>>> input = "Sarah met John at the mall today"
>>> tokenized = ['Sarah', 'met', 'John', 'at', 'the', 'mall', 'today', '.']
>>>
>>> result = parser.predict(tokenized, lang="en", prob=True, verbose=False)
>>> result
Dataset(n_sentences=8, n_batches=1, n_buckets=1)
>>> result[0]
1       Sarah   _       _       _       _       _       _       0:root  _

Expected:

>>> result = parser.predict(input, lang="en", prob=True, verbose=False)
>>> result
Dataset(n_sentences=1, n_batches=1, n_buckets=1)
>>> result[0]
1       Sarah   _       _       _       _       _       _       2:ARG1  _
2       met     _       _       _       _       _       _       0:root|4:ARG1|7:loc     _
3       John    _       _       _       _       _       _       2:ARG2  _
4       at      _       _       _       _       _       _       _       _
5       the     _       _       _       _       _       _       _       _
6       mall    _       _       _       _       _       _       4:ARG2|5:BV     _
7       today   _       _       _       _       _       _       _       _

yzhangcs commented 1 year ago

Hi, you don’t need to pass the lang argument if the input has already been tokenized.
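
For anyone landing here later, the n_sentences=8 result is consistent with the following reading: when lang is passed, the input is treated as raw text to be tokenized, so each string in the list becomes its own one-word sentence. The sketch below is only a toy model of that dispatch, not the library's actual code. The names to_sentences and whitespace_tokenize are hypothetical, and the whitespace split is a stand-in for a real language-specific tokenizer.

```python
from typing import List, Optional, Sequence, Union

Tokens = List[str]


def whitespace_tokenize(text: str) -> Tokens:
    # Stand-in tokenizer (assumption): a real parser would use a
    # language-specific tokenizer selected by `lang`.
    return text.split()


def to_sentences(
    data: Union[str, Sequence[str], Sequence[Sequence[str]]],
    lang: Optional[str] = None,
) -> List[Tokens]:
    """Hypothetical model of how predict() input might be split into sentences."""
    if lang is not None:
        # `lang` given: the input is raw text, so every string is treated
        # as one untokenized sentence and tokenized on its own.
        raw = [data] if isinstance(data, str) else list(data)
        return [whitespace_tokenize(sentence) for sentence in raw]
    # No `lang`: the input is assumed to be tokenized already.
    if isinstance(data, str):
        raise ValueError("this sketch expects token lists when lang is None")
    if data and isinstance(data[0], str):
        # A flat list of strings is a single tokenized sentence.
        return [list(data)]
    # A list of lists is a batch of tokenized sentences.
    return [list(sentence) for sentence in data]


tokenized = ['Sarah', 'met', 'John', 'at', 'the', 'mall', 'today', '.']

# With lang: each token is treated as a separate sentence.
print(len(to_sentences(tokenized, lang="en")))  # 8

# Without lang: the whole list is one sentence.
print(len(to_sentences(tokenized)))  # 1
```

Under that reading, the call from the issue with lang dropped, parser.predict(tokenized, prob=True, verbose=False), should treat the list as one sentence.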

Yu Zhang Soochow University


creatorrr commented 1 year ago

Aha! Thank you, that makes sense. Closing this issue for now.