Closed: lfzhagn closed this issue 5 years ago
For the neural pipeline, unfortunately this is not an option yet.
For the CoreNLP client, if you are able to specify this dictionary file through CoreNLP properties files, you should be able to do the same with the client.
@qipeng I now use a Python dict, `properties`, to specify parameters for the CoreNLP client. My code looks like this:
```python
# code in Python
properties = {
    ...
    "tokenize.language": "zh",
    "segment.model": "edu/stanford/nlp/models/segmenter/chinese/ctb.gz",
    "segment.sighanCorporaDict": "edu/stanford/nlp/models/segmenter/chinese",
    "segment.serDictionary": "edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz",
    "segment.sighanPostProcessing": "true",
    ...
}
with CoreNLPClient(annotators=['ner'], timeout=90000, memory='16G', properties=properties) as client:
    annotated = client.annotate(text)
    ...
```
My custom dictionaries are files like:
- chinese_medicine_name.txt
- chinese_person_name.txt
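For reference, a segmenter dictionary file is plain text with one term per line (the entries below are made up purely for illustration):

```
# chinese_medicine_name.txt (illustrative entries, one term per line)
板蓝根
金银花
```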
It seems that `segment.serDictionary` only allows a single value. Can I specify multiple dictionaries from Python code?
You suggest modifying the properties files. Is my understanding correct: unpack stanford-chinese-corenlp-2018-10-05-models.jar, add my custom dict path to StanfordCoreNLP-chinese.properties, and then re-compress it?
Very grateful for your reply :)
@J38 probably knows more about setting custom dictionaries.
But to use your own, it should suffice to pack your dictionary files, under some identifiable path, into a jar file, add that jar to the classpath used to run the CoreNLP server, and load the dictionary files from that path.
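A minimal sketch of that suggestion might look like the following (the jar name, directory layout, and server classpath are illustrative assumptions, not anything fixed by CoreNLP):

```shell
# Put the custom dictionary files under an identifiable path and pack them into a jar.
mkdir -p edu/stanford/nlp/models/segmenter/chinese/custom
cp chinese_medicine_name.txt chinese_person_name.txt \
   edu/stanford/nlp/models/segmenter/chinese/custom/
jar cf my-custom-dicts.jar edu

# Then start the CoreNLP server with the extra jar on the classpath
# (jar names below are illustrative; adjust to your CoreNLP version):
# java -cp "stanford-corenlp-3.9.2.jar:stanford-chinese-corenlp-2018-10-05-models.jar:my-custom-dicts.jar" \
#      edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
```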
Thanks a lot! Your explanation is quite clear :)
Thank you very much! That really helps a lot.👍
Li,Yang (notifications@github.com) wrote on Thursday, December 19, 2019 at 2:26 PM:
This is my solution:
- Download stanford-segmenter-2018-10-16.zip from the official site: https://nlp.stanford.edu/software/segmenter.shtml
- Unzip it to get the ChineseDictionary tool (`edu.stanford.nlp.wordseg.ChineseDictionary`).
- Add custom dictionary files (e.g. places.txt) with one place name per line, each no more than 6 words.
- Extend the existing dictionary at data/dict-chris6.ser.gz with the command: `java edu.stanford.nlp.wordseg.ChineseDictionary -inputDicts data/dict-chris6.ser.gz,places.txt -output dict-chris6.ser.gz`
- Replace `edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz` in the models jar with this file.
- Re-pack the models jar: `jar cvf stanford-chinese-corenlp-2018-10-05-models.jar *`
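The steps above can be consolidated into a script roughly like this (the segmenter jar name inside the zip and the `-cp` classpath flag are my assumptions; the command as quoted omits a classpath, which you will likely need):

```shell
# 1. Get the segmenter distribution that ships the ChineseDictionary tool.
unzip stanford-segmenter-2018-10-16.zip
cd stanford-segmenter-2018-10-16

# 2. Merge the stock dictionary with a custom one (one term per line).
#    The segmenter jar name below is an assumption; adjust to your download.
java -cp stanford-segmenter-3.9.2.jar edu.stanford.nlp.wordseg.ChineseDictionary \
     -inputDicts data/dict-chris6.ser.gz,places.txt -output dict-chris6.ser.gz

# 3. Replace the dictionary inside the Chinese models jar and re-pack it.
mkdir repacked && cd repacked
jar xf ../../stanford-chinese-corenlp-2018-10-05-models.jar
cp ../dict-chris6.ser.gz edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz
jar cf ../../stanford-chinese-corenlp-2018-10-05-models.jar *
```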
Hi! I have some questions about using a custom dictionary in the stanfordnlp pipeline. stanfordnlp provides a Python wrapper for the Java Stanford CoreNLP server, which can help me extract named entities. Can I add my custom dictionary when I use the CoreNLPClient? I know this can be done in Java code using Stanford CoreNLP; I want to know whether I can achieve this in Python using stanfordnlp. Here, by a custom dictionary, I mean something like this:

```
#my_dictionary.txt
```
Thanks a lot if you can give me some answers. :)