Question about Classical Japanese

kylepjohnson commented 8 years ago

Hello,

Thank you for your wonderful library.

I run an open source project, the Classical Language Toolkit, which helps researchers do NLP in ancient and classical languages.

One of our contributors found your software and is interested in porting some of it for our users.

Because I do not know Japanese, I would like to learn whether jProcessing is suitable for old Japanese texts (say, up to AD 1600).

Thanks again for sharing your software with the world. Feel free to be in touch with me directly at kyle@kyle-p-johnson.com if you prefer!

kevincobain2000 commented 8 years ago

Basic algorithms can of course be used, since WSD is not applied in them (it can optionally be applied):
- Similarity between two sentences
- Longest common string, etc.
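The reply does not say how jProcessing implements these two features. The sketch below is a generic illustration only, not jProcessing's code. Both measures can be computed directly on raw character strings, which avoids word segmentation for older texts.

- Longest common substring uses a standard dynamic-programming table.
- Similarity is swapped in as a Jaccard overlap of character bigrams.

The function names and the bigram size are assumptions.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b.

    Classic dynamic programming: O(len(a) * len(b)) time, O(len(b)) memory.
    """
    best_len, best_end = 0, 0
    # At the start of step i, prev[j] is the length of the longest common
    # suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def char_ngrams(text: str, n: int = 2) -> set:
    """Character n-grams; works on unsegmented Japanese without a tokenizer."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 2) -> float:
    """Sentence similarity as |A & B| / |A | B| over character n-gram sets."""
    sa, sb = char_ngrams(a, n), char_ngrams(b, n)
    if not sa and not sb:
        return 1.0  # two empty strings are treated as identical
    return len(sa & sb) / len(sa | sb)
```

Example: comparing "春はあけぼの" (the opening line of Makura no Sōshi) with "夏はあけぼの" gives the common substring "はあけぼの". The two strings share 4 of 6 distinct bigrams, so their similarity is about 0.67. Character-level measures need no dictionary, which may matter for vocabulary a modern analyzer does not know.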
Morphological Analyzer
- jProcessing uses CaboCha. If your target is ancient Japanese text, you should be able to train CaboCha separately on your own training data and then call it as-is via this Python library.
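Once CaboCha has been trained on classical data, one way to call it from Python is to run the `cabocha` command-line tool with `-f1` (lattice output) and parse what it prints. This is a minimal sketch, not how jProcessing itself wraps CaboCha. It assumes:

- `cabocha` is on `PATH` and already configured with whatever model you trained.
- Chunk header lines look like `* <id> <head>D ...`.
- Token lines are `surface<TAB>features`.

`Chunk`, `parse_lattice`, and `run_cabocha` are hypothetical names.

```python
import subprocess
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    """One bunsetsu (phrase chunk) from CaboCha's lattice (-f1) output."""
    chunk_id: int
    head: int  # id of the chunk this one depends on; -1 marks the root
    tokens: List[str] = field(default_factory=list)    # surface forms
    features: List[str] = field(default_factory=list)  # raw rest of each token line


def parse_lattice(output: str) -> List[Chunk]:
    """Parse lattice-format text into chunks, stopping at the first EOS."""
    chunks: List[Chunk] = []
    for line in output.splitlines():
        if line == "EOS":
            break
        if line.startswith("* "):
            # Chunk header such as "* 0 1D 0/1 0.000000": id, then head + "D".
            parts = line.split()
            chunks.append(Chunk(int(parts[1]), int(parts[2].rstrip("D"))))
        elif line.strip() and chunks:
            # Token line: surface form, a tab, then the analyzer's features.
            surface, _, feats = line.partition("\t")
            chunks[-1].tokens.append(surface)
            chunks[-1].features.append(feats)
    return chunks


def run_cabocha(sentence: str) -> List[Chunk]:
    """Run the cabocha CLI (assumed to be installed) in lattice mode and parse it."""
    result = subprocess.run(
        ["cabocha", "-f1"], input=sentence,
        capture_output=True, text=True, check=True,
    )
    return parse_lattice(result.stdout)
```

Keeping the parser separate from the subprocess call lets you re-read saved parser output without running CaboCha again. The layout of the feature string depends on the dictionary CaboCha is configured with.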
Finding parallel example sentences from Edict
- There is no Japanese SentiWordNet.
- So I used the English SentiWordNet, mapped WordNet IDs, and then prepared polarity scores for the Japanese lexicon (and for entries in Edict).
- https://raw.githubusercontent.com/kevincobain2000/jProcessing/master/src/jNlp/data/JapaneseSentiWordNet.txt
- Base forms are used, so at least for most words, you should be able to get enough hits in the dictionary.
- I am not sure about low-frequency ancient words.
- I found Edict's data sufficient when I evaluated baseline classifiers on a newspaper (Mainichi Shimbun) corpus.
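The mapping described above can be pictured as a join from Japanese lemmas to WordNet synsets to SentiWordNet scores. This is a hypothetical illustration, not the procedure that produced JapaneseSentiWordNet.txt, whose file format is not described here. The following are all assumptions:

- the synset IDs and scores, which are toy values;
- the lemma-to-synset table;
- the rule of averaging (positive minus negative) over every synset of a lemma.

The averaging rule stands in for the missing sense disambiguation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy inputs: these IDs and scores are made up, not SentiWordNet data.
SYNSET_SCORES: Dict[str, Tuple[float, float]] = {
    "syn-001": (0.75, 0.0),    # (positive, negative)
    "syn-002": (0.25, 0.125),
    "syn-003": (0.0, 0.625),
}
LEMMA_TO_SYNSETS: Dict[str, List[str]] = {
    "美しい": ["syn-001", "syn-002"],
    "悲しい": ["syn-003"],
}


def build_polarity_lexicon(
    lemma_to_synsets: Dict[str, List[str]],
    synset_scores: Dict[str, Tuple[float, float]],
) -> Dict[str, float]:
    """Map each lemma to mean(pos - neg) over all its synsets (no WSD)."""
    lexicon: Dict[str, float] = {}
    for lemma, synsets in lemma_to_synsets.items():
        scores = [synset_scores[s][0] - synset_scores[s][1]
                  for s in synsets if s in synset_scores]
        if scores:  # lemmas with no scored synset are left out
            lexicon[lemma] = sum(scores) / len(scores)
    return lexicon
```

With the toy values, 美しい averages 0.75 and 0.125 to 0.4375, and 悲しい gets -0.625. Lemmas whose synsets have no scores simply drop out of the lexicon. That is one reason rare or archaic words may get no hits.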
Sentiment Classification via WSD on Japanese text
- Senses in Japanese text are NOT disambiguated at the moment anyway.
- Just using a Japanese word's base form and looking up its score in SentiWordNet is enough for a baseline classifier.
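A baseline of that kind can be sketched as summing the polarity of each base form and reading off the sign. This is a minimal illustration, not jProcessing's classifier. It assumes:

- the base forms have already been extracted by a morphological analyzer;
- the lexicon is a plain `dict` from base form to score;
- unknown words count as 0;
- `classify`, `lexicon_coverage`, and the `threshold` parameter are hypothetical.

The coverage helper reports how many tokens the lexicon recognizes, which bears on the concern about low-frequency ancient words.

```python
from typing import Dict, Iterable


def classify(base_forms: Iterable[str], lexicon: Dict[str, float],
             threshold: float = 0.0) -> str:
    """Label text by the summed polarity of its base forms; unknown words add 0."""
    total = sum(lexicon.get(w, 0.0) for w in base_forms)
    if total > threshold:
        return "positive"
    if total < -threshold:
        return "negative"
    return "neutral"


def lexicon_coverage(base_forms: Iterable[str], lexicon: Dict[str, float]) -> float:
    """Fraction of tokens that have an entry in the lexicon (0.0 for no tokens)."""
    forms = list(base_forms)
    if not forms:
        return 0.0
    return sum(w in lexicon for w in forms) / len(forms)
```

On a classical corpus, low coverage would signal that the lexicon, not the classifier, is the bottleneck.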

kylepjohnson commented 8 years ago

Thank you, this is really helpful.

We cannot use the sentiment analysis work (though it does look interesting). CaboCha is interesting; however, do you know of any treebanks for Classical Japanese?

(kevincobain2000/jProcessing, issue #10)