yzhangcs / parser

:rocket: State-of-the-art parsers for natural language.
https://parser.yzhang.site/
MIT License
837 stars 143 forks

How to get [('ROOT', 0, 2) phrase for Chinese? #101

Closed chenbogithub closed 2 years ago

chenbogithub commented 2 years ago

Hello, thank you for your hard work! I read some of the usage examples and tried to get output like [(0, 5, 'TOP'), (0, 5, 'S'), (0, 1, 'NP'), (1, 4, 'VP'), (2, 4, 'S'), (2, 4, 'VP'), (3, 4, 'NP')], but I failed. My code:

    par = Parser.load('biaffine-dep-electra-zh')
    tree = par.predict("我看猫和老鼠。")[0].trees
    supar.utils.Tree.factorize(tree)

However, I get an "AttributeError". Looking forward to your answer, thank you very much!

yzhangcs commented 2 years ago

@chenbogithub Hi, to get constituency trees, you need to load a CRF constituency parser rather than a dependency parser, i.e., Parser.load('crf-con-zh').
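
To make the span format from the question concrete, here is a minimal sketch in plain Python that turns a bracketed constituency tree into (start, end, label) triples. It is an independent illustration, not the library's own implementation: the helper names read_tree and spans are made up, and the English example tree is an assumption chosen because it yields the same spans as the list quoted in the question. Nodes that sit directly above a single word (the POS level) are skipped.

```python
def read_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # skip the closing bracket
        return (label, children)

    return parse()


def spans(tree):
    """Collect (start, end, label) for every node above the POS level, in pre-order."""
    result = []

    def visit(node, start):
        label, children = node
        # A POS-level node has exactly one child, and that child is a word.
        if len(children) == 1 and isinstance(children[0], str):
            return start + 1
        slot = len(result)
        result.append(None)  # reserve this node's pre-order position
        end = start
        for child in children:
            end = end + 1 if isinstance(child, str) else visit(child, end)
        result[slot] = (start, end, label)
        return end

    visit(tree, 0)
    return result


# Hypothetical example tree; "_" stands in for the POS tags.
tree = read_tree(
    "(TOP (S (NP (_ She)) (VP (_ enjoys) (S (VP (_ playing) (NP (_ tennis))))) (_ .)))"
)
print(spans(tree))
```

Each triple covers the words from index start up to, but not including, index end, so (1, 4, 'VP') spans the second through fourth words.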

chenbogithub commented 2 years ago

@chenbogithub Hi, to get constituency trees, you need to load a CRF constituency parser rather than a dependency parser, i.e., Parser.load('crf-con-zh').

Thank you very much for your fast and accurate answer! Besides, I want to get dependency parses and made some attempts. My code:

    par = Parser.load('biaffine-dep-electra-zh')
    dataset = par.predict('我看猫和老鼠。', lang='zh', prob=True, verbose=False)
    print(f"arcs: {dataset.arcs[0]}\n"
          f"rels: {dataset.rels[0]}\n"
          f"probs: {dataset.probs[0].gather(1, torch.tensor(dataset.arcs[0]).unsqueeze(1)).squeeze(-1)}")

The output:

    arcs: [2, 0, 5, 5, 2, 2]
    rels: ['nsubj', 'root', 'conj', 'cc', 'dobj', 'punct']
    probs: tensor([1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000])

What's the meaning of "arcs: [2, 0, 5, 5, 2, 2]"? Thank you!

yzhangcs commented 2 years ago

@chenbogithub Hi, to do dependency parsing on Chinese, you first need to tokenize the sentence into words.

What's the meaning of "arcs: [2, 0, 5, 5, 2, 2]"?

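This list contains the predicted head of each word. For example, the 1st number, 2, means the 1st word takes the 2nd word as its head, and the 2nd number, 0, means the 2nd word takes the 0th position (the root) as its head, which matches its 'root' relation.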
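
As a small sketch of how to read these lists, the snippet below pairs each word position with its head and relation, using the arcs and rels values from the output above. It assumes the usual convention that word positions are 1-based and that a head of 0 stands for the artificial root; the helper name describe_heads is made up for illustration.

```python
arcs = [2, 0, 5, 5, 2, 2]
rels = ['nsubj', 'root', 'conj', 'cc', 'dobj', 'punct']


def describe_heads(arcs, rels):
    """Return one readable line per word: its position, its head, and the relation."""
    lines = []
    # Word positions start at 1 because index 0 is reserved for the root.
    for position, (head, rel) in enumerate(zip(arcs, rels), start=1):
        target = "ROOT" if head == 0 else f"word {head}"
        lines.append(f"word {position}: head = {target} ({rel})")
    return lines


for line in describe_heads(arcs, rels):
    print(line)
```

Read this way, the 2nd word is the root of the sentence, and the 1st, 5th, and 6th words all attach to it.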