Closed (slavaGanzin closed this issue 4 years ago)
This is tricky, but maybe this answer helps. textX is greedy, so placing the more specific rule (Date) in front of the more general one is a good idea.
Then you can use the fact that multiple assignments to the same attribute produce a list (https://textx.github.io/textX/stable/grammar/#multiple-assignment-to-the-same-attribute) and use the following:
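import textx
# Date comes first in the ordered choice, so it is tried before the catch-all OOV.
g = r'''
Model: (date=Date|oov=OOV)*;
Date: date=/\d{1,2}[\/\.]\d{1,2}[\/\.]\d{2}(\d{2})?/;
OOV: /\S+/;
'''
mm = textx.metamodel_from_str(g)
m = mm.model_from_str(r'''
01.01.2020 out of vocabulary 02.02.2020
''')
# Date contains an assignment, so m.date holds Date objects; the matched text is in each .date.
print([d.date for d in m.date])
# OOV is a plain match rule, so m.oov is a list of strings.
print(m.oov)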
However, this will also allow an empty model or a model without oov...
A related issue is #119
> However, this will also allow an empty model or a model without oov...
I'm parsing free-grammar text, so it's expected that some parts will be missing.
@goto40 Thanks for your help. Really appreciate it!
Hello.
I'm working on NER for free-grammar texts (news articles), and it works great until an out-of-vocabulary word occurs in the target text. I understand that PEG parsing is designed for strict languages and that this is a bit of a stretch, but it is what it is.
For example, having this meta-model:
parses:
01.01.2020 out of vocabulary 02.02.2020
as:
{'date': ['01.01.2020'], 'oov': ['out', 'of', 'vocabulary', '02.02.2020']}
while the result I expected would be:
{'date': ['01.01.2020', '02.02.2020'], 'oov': ['out', 'of', 'vocabulary']}
In my understanding, the parser should try specific rules first and then fall back to broader ones. Instead, it clearly just keeps repeating itself. I've read the documentation and spent a lot of time playing with skipws, ws, and eol, with no acceptable result. I'm really desperate, so I'll ask you to set aside your rigorous mathematical mind and help with practical advice.