vdobrovolskii / wl-coref

This repository contains the code for the EMNLP 2021 paper "Word-Level Coreference Resolution"
MIT License

How to modify to Chinese data set #45

Open cuikai-ai opened 8 months ago

cuikai-ai commented 8 months ago

Hello. OntoNotes contains a Chinese dataset in addition to the English one, but when I switch the conversion to Chinese, it fails with the following error:

development: 0% 0/172 [00:00<?, ?docs/s]
Exception in thread "main" java.lang.IllegalArgumentException: No head rule defined for DNP using class edu.stanford.nlp.trees.SemanticHeadFinder in DNP-27
    at edu.stanford.nlp.trees.AbstractCollinsHeadFinder.determineNonTrivialHead(AbstractCollinsHeadFinder.java:222)
    at edu.stanford.nlp.trees.SemanticHeadFinder.determineNonTrivialHead(SemanticHeadFinder.java:348)
    at edu.stanford.nlp.trees.AbstractCollinsHeadFinder.determineHead(AbstractCollinsHeadFinder.java:179)
    at edu.stanford.nlp.trees.TreeGraphNode.percolateHeads(TreeGraphNode.java:476)
    at edu.stanford.nlp.trees.TreeGraphNode.percolateHeads(TreeGraphNode.java:474)
    at edu.stanford.nlp.trees.TreeGraphNode.percolateHeads(TreeGraphNode.java:474)
    at edu.stanford.nlp.trees.TreeGraphNode.percolateHeads(TreeGraphNode.java:474)
    at edu.stanford.nlp.trees.GrammaticalStructure.<init>(GrammaticalStructure.java:94)
    at edu.stanford.nlp.trees.EnglishGrammaticalStructure.<init>(EnglishGrammaticalStructure.java:86)
    at edu.stanford.nlp.trees.EnglishGrammaticalStructure.<init>(EnglishGrammaticalStructure.java:66)
    at edu.stanford.nlp.parser.lexparser.EnglishTreebankParserParams.getGrammaticalStructure(EnglishTreebankParserParams.java:2271)
    at edu.stanford.nlp.trees.GrammaticalStructure$TreeBankGrammaticalStructureWrapper$GsIterator.primeGs(GrammaticalStructure.java:1361)
    at edu.stanford.nlp.trees.GrammaticalStructure$TreeBankGrammaticalStructureWrapper$GsIterator.next(GrammaticalStructure.java:1386)
    at edu.stanford.nlp.trees.GrammaticalStructure$TreeBankGrammaticalStructureWrapper$GsIterator.next(GrammaticalStructure.java:1333)
    at edu.stanford.nlp.trees.GrammaticalStructure.main(GrammaticalStructure.java:1604)
development: 0% 0/172 [00:00<?, ?docs/s]
Traceback (most recent call last):
  File "F:\PycharmProject\wl-coref\convert_to_jsonlines.py", line 394, in <module>
    convert_con_to_dep(args.tmp_dir, conll_filenames)
  File "F:\PycharmProject\wl-coref\convert_to_jsonlines.py", line 196, in convert_con_to_dep
    subprocess.run(cmd, check=True, stdout=out)
  File "F:\anaconda3\envs\spanbert\lib\subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['java', '-cp', 'downloads/stanford-parser.jar', 'edu.stanford.nlp.trees.EnglishGrammaticalStructure', '-basic', '-keepPunct', '-conllx', '-treeFile', 'temp\data/conll-2012/v4\data\development\data\chinese\annotations\bc\cctv\00\cctv_0000.v4_gold_conll']' returned non-zero exit status 1.

I think this may be because stanford-parser.jar does not include support for Chinese grammatical analysis, but I couldn't find a relevant jar. Could you give me some suggestions or help with the necessary changes?

vdobrovolskii commented 8 months ago

Hi. This might be relevant: https://github.com/vdobrovolskii/wl-coref/issues/20
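
For context, the traceback in the question shows the conversion step calling edu.stanford.nlp.trees.EnglishGrammaticalStructure on a Chinese tree file, so English head rules are applied to Chinese constituents such as DNP. Below is a minimal sketch of how that command could be chosen per language. It is not the repository's code: build_dep_command and GS_CLASS_BY_LANGUAGE are hypothetical names, and it assumes that the jar ships the class edu.stanford.nlp.trees.international.pennchinese.ChineseGrammaticalStructure and that this class accepts the same command-line flags as the English one, neither of which has been verified here.

```python
from typing import Dict, List

# Class name taken from the traceback in the question (English only).
ENGLISH_GS = "edu.stanford.nlp.trees.EnglishGrammaticalStructure"

# ASSUMPTION: Chinese counterpart of the class above. Whether it is present
# in downloads/stanford-parser.jar and accepts the same flags is unverified.
CHINESE_GS = (
    "edu.stanford.nlp.trees.international.pennchinese."
    "ChineseGrammaticalStructure"
)

# Hypothetical lookup table: language name -> converter class.
GS_CLASS_BY_LANGUAGE: Dict[str, str] = {
    "english": ENGLISH_GS,
    "chinese": CHINESE_GS,
}


def build_dep_command(
    tree_file: str,
    language: str = "english",
    jar: str = "downloads/stanford-parser.jar",
) -> List[str]:
    """Build the java command that converts constituency trees to dependencies.

    The flags are copied from the command in the traceback; they are known
    to be used with the English class only.
    """
    try:
        gs_class = GS_CLASS_BY_LANGUAGE[language]
    except KeyError:
        raise ValueError(f"No converter class configured for {language!r}")
    return [
        "java", "-cp", jar, gs_class,
        "-basic", "-keepPunct", "-conllx",
        "-treeFile", tree_file,
    ]
```

The resulting list could be passed to subprocess.run(cmd, check=True, stdout=out), as in the traceback. Even if the Chinese class runs, it would presumably emit a different dependency label set, so later steps that rely on English labels may also need changes.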