opencog / relex

English Dependency Relationship Extractor
http://wiki.opencog.org/w/RelEx
Apache License 2.0

Feature/263 load relex resources #268

Closed stellarspot closed 6 years ago

stellarspot commented 6 years ago

file_properties.xml and en-sent.bin are now loaded from the jar file. I have not found any usage of the EnglishSD.bin.gz file. It seems that it can be loaded from outside the relex module.

I have some questions. Maybe the resource loading order should be: check the system property first, then the default directory, and the jar last?
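The lookup order proposed above could be sketched roughly as follows. This is only an illustration, not the code in this pull request: the class name `ResourceLookup`, the property name `relex.sentence.model`, and the directory `data/sentence-detector` are hypothetical placeholders, and only the file name en-sent.bin comes from the discussion.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of relex: reports where a resource would be found.
public class ResourceLookup {

    /** The place a resource was found, in order of priority. */
    public enum Source { SYSTEM_PROPERTY, DEFAULT_DIR, CLASSPATH, NOT_FOUND }

    public static Source locate(String propertyName, String defaultDir, String resourceName) {
        // 1. An explicit override (-D<propertyName>=/path/to/file) wins if the file exists.
        String override = System.getProperty(propertyName);
        if (override != null && Files.isRegularFile(Paths.get(override))) {
            return Source.SYSTEM_PROPERTY;
        }

        // 2. Otherwise look for the file in the default directory.
        Path inDefaultDir = Paths.get(defaultDir, resourceName);
        if (Files.isRegularFile(inDefaultDir)) {
            return Source.DEFAULT_DIR;
        }

        // 3. Finally fall back to the copy bundled on the classpath (for example, inside the jar).
        if (ResourceLookup.class.getClassLoader().getResource(resourceName) != null) {
            return Source.CLASSPATH;
        }
        return Source.NOT_FOUND;
    }

    public static void main(String[] args) {
        // Property and directory names below are assumptions made for this example.
        System.out.println(locate("relex.sentence.model", "data/sentence-detector", "en-sent.bin"));
    }
}
```

With this order, a user can override the bundled model without rebuilding the jar, and the jar copy serves as the last-resort default.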

DocSplitterOpenNLP15Impl contains a private variable englishModelFilename and two public methods, setEnglishModelFilename(...) and getEnglishModelFilename(). The get/set methods seem useless, because the englishModelFilename field is only used during object construction and never afterwards. It seems to make sense to remove them if they are not used by other modules (otherwise removing them would break binary compatibility).

linas commented 6 years ago

I'm confused about the comments. The doc splitter needs to have a model to know how to split docs. That's what EnglishSD is supposed to provide. So, for example, in English, "Mr. Smith is late." should not split on the period after "Mr." In French, it would be different: "Mssr. Clochard est trop tard." If you don't load a model file, the sentence splitter won't work.

linas commented 6 years ago

Oh, I see. I think that EnglishSD.bin.gz is the old name for what is now called en-sent.bin. I'm mostly sure that this is what it is, so everything seems OK, I guess. I'll merge.

linas commented 6 years ago

thanks for reviewing @vsbogd

stellarspot commented 6 years ago

I looked at the EnglishSD.bin.gz file usage, and it was removed from DocSplitterOpenNLP14Impl.java by fix 5a5188ca03ed0f06efbf57af0f7141288038a7ef. It seems it is not used in the project now. At least, I removed it from the data directory and from the built jar, and running

relex.RelationExtractor -n 4 -l -t -f -s 'Alice wrote a book about dinosaurs for the University of California in Berkeley.'

completed without errors or warnings.

Does it make sense to remove EnglishSD.bin.gz from the repository completely (the data directory and the Maven/pom resource copying)?

linas commented 6 years ago

Yes, I guess that EnglishSD.bin.gz can be removed.