simongray / datalinguist

Stanford CoreNLP in idiomatic Clojure.
GNU General Public License v3.0
114 stars 5 forks source link

adapt CFRClassifier to dataset and `tech.ml` #6

Closed behrica closed 3 years ago

simongray commented 3 years ago

Thank you for making a PR.

I do have some comments:

I can perhaps help you make some of the changes (props conversion and any final touches on formatting), but you will need to work a little bit more on it yourself first, before I am ready to accept it. Hope that is okay with you. If that's too bothersome, just put it in your own repo instead :)

behrica commented 3 years ago

Thanks for review and suggestions.

I addressed all of them, except the Code formatting, as I don't use Cursive, but emacs.

I made the properties function in datalinguist.clj public for the moment and it does work. Maybe you want to move this to a different ns. It works for me.

simongray commented 3 years ago

I took me a while to check out the changes. I just got a kid (he'll be 2 months tomorrow) so a lot of my sparetime is spent on him at the moment.

I made a lot of code style changes to make the code better fit in with the rest of the repo. The only major functional change is an introduction of a define-model! function, since I noticed that your code didn't provide a way to redefine the behaviour of the model using the options map (... without reaching into the source code).

I hope you agree with the changes. If you're ok with it, I'm ready to merge.

behrica commented 3 years ago

It is as well foreseen to add specific metadata, which would point to: https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/crf/CRFClassifier.html and to a "user guide", if existing.

I have done this only for one model so far:

:smile.classification/gradient-tree-boost

https://github.com/scicloj/scicloj.ml.smile/blob/49a753c8ce6ae991b591a563f6a723c660ddb776/src/scicloj/ml/smile/classification.clj#L150

I would therefore propose to do the same here. The doc-string of the "train" method can maybe contain "pointers" to this as well

simongray commented 3 years ago

It is as well foreseen to add specific metadata, which would point to: https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/crf/CRFClassifier.html and to a "user guide", if existing.

I have done this only for one model so far:

:smile.classification/gradient-tree-boost

https://github.com/scicloj/scicloj.ml.smile/blob/49a753c8ce6ae991b591a563f6a723c660ddb776/src/scicloj/ml/smile/classification.clj#L150

I would therefore propose to do the same here. The doc-string of the "train" method can maybe contain "pointers" to this as well

Sure, this all sounds good to me. Feel free to do it in a follow-up PR once this has been merged.

behrica commented 3 years ago

Seems all good to me. Please revert the "define.model!" changes and leave the todo for adding the options. I will do this soon.

simongray commented 3 years ago

I added a todo at the top of the crf namespace. Not sure what to write so I just referenced this PR.

I have reverted the change as you requested, although I did keep my renaming of the model name itself. Aside from :standford-nlp/crf being misspelled, the name of the library being wrapped is in fact Stanford CoreNLP, while StanfordNLP is the old name for the Python library that is now called Stanza. So I will insist on it being called either :corenlp/crf or :stanford-corenlp/crf since I think that is more proper (and CoreNLP is also how I refer to it elsewhere).

If you're OK with how everything looks, I suggest you press the merge button.