Open issue, opened by simongray 2 years ago.
In case anybody else is trying the "sentiment" annotator and runs into an infinite recursion in `recur-datafy`, for instance with:
```clojure
(->> ((->pipeline {:annotators ["sentiment"]})
      "Paula gave me 10 dollars. Of those $10 I used only one dollar. That felt bad. But also great.")
     sentences
     (map (comp :sentiment recur-datafy)))
```
As a workaround, you can redefine `recur-datafy` like this (I left the debugging code in, in case @simongray wants to try it out):
```clojure
(in-ns 'dk.simongray.datalinguist)

(defmacro ignore-errors
  "Evaluate `body`, returning nil if it throws an Exception."
  [& body]
  `(try ~@body (catch Exception e# nil)))

;; Debugging aid: holds the last datafied value when the reset! below is enabled.
(def my (atom nil))

(defn recur-datafy
  "Return a recursively datafied representation of `x`.
  Call at the end of an annotation chain to get plain Clojure data structures."
  [x]
  (let [x* (datafy x)]
    ;; (prn "WOW---" x*)
    ;; (reset! my x*)
    (cond
      (seq? x*)
      (mapv recur-datafy x*)

      (set? x*)
      (set (map recur-datafy x*))

      (map? x*)
      ;; :tree/tree and :tree/binarized-tree recurse forever, so drop them.
      ;; The remaining keys on a sentence are :text, :tokens, :sentiment,
      ;; :sentence-index, :token-begin, :token-end, :character-offset-begin,
      ;; :character-offset-end and the :semantic-graph/* dependency graphs
      ;; (basic, collapsed, collapsed-cc-processed, enhanced, enhanced-plus-plus).
      (ignore-errors
        (into {} (for [[k v] (dissoc x* :tree/binarized-tree :tree/tree)]
                   [(recur-datafy k) (recur-datafy v)])))

      ;; Catches nearly all Java collections, including custom CoreNLP ones.
      (instance? Iterable x*)
      (mapv recur-datafy x*)

      :else x*)))
```
I dropped the `:tree/binarized-tree` and `:tree/tree` keys, which seem to cause the infinite recursion.
With the `prn` uncommented, I see:
```
"WOW---" :tree/tree
"WOW---" #object[edu.stanford.nlp.trees.LabeledScoredTreeNode 0x7b20fb2c "(ROOT (S (NP (NNP Paula)) (VP (VBD gave) (NP (PRP me)) (NP (CD 10) (NNS dollars))) (. .)))"]
"WOW---" #object[edu.stanford.nlp.trees.LabeledScoredTreeNode 0x7b20fb2c "(ROOT (S (NP (NNP Paula)) (VP (VBD gave) (NP (PRP me)) (NP (CD 10) (NNS dollars))) (. .)))"]
"WOW---" #object[edu.stanford.nlp.trees.LabeledScoredTreeNode 0x7b20fb2c "(ROOT (S (NP (NNP Paula)) (VP (VBD gave) (NP (PRP me)) (NP (CD 10) (NNS dollars))) (. .)))"]
...
```
This means that recurring on the `:tree/tree` value keeps producing the same result (the same object, `0x7b20fb2c`, every time). @simongray, you can reproduce the logging by removing the `dissoc` and running:
```clojure
(->> ((->pipeline {:annotators ["sentiment"]})
      "Paula gave me 10 dollars. Of those $10 I used only one dollar. That felt bad. But also great.")
     sentences
     first
     recur-datafy)
```
Maybe it would be enough to return the string representation of the `:tree/tree` and `:tree/binarized-tree` values? If so, adding another `instance?` case to `recur-datafy` could do the job.
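A minimal sketch of that suggestion, assuming the fix is simply to stop descending at tree nodes and keep their bracketed string form (the same text the log prints). The namespace `example.tree-leaves`, the extra `leaf?` argument and the `FakeTree` stand-in type are hypothetical, not part of datalinguist; with CoreNLP on the classpath the predicate would presumably be `#(instance? edu.stanford.nlp.trees.Tree %)`, the superclass of the `LabeledScoredTreeNode` seen in the log.

```clojure
(ns example.tree-leaves
  (:require [clojure.core.protocols :as p]
            [clojure.datafy :refer [datafy]]))

(defn recur-datafy
  "Hypothetical variant of recur-datafy: any value for which `leaf?`
  returns true is replaced by its string form instead of being datafied."
  [leaf? x]
  (if (leaf? x)
    (str x) ; e.g. "(ROOT (S ...))" for a parse tree
    (let [x* (datafy x)
          go #(recur-datafy leaf? %)]
      (cond
        (map? x*)               (into {} (map (fn [[k v]] [(go k) (go v)])) x*)
        (set? x*)               (set (map go x*))
        ;; Seqs, vectors and most Java collections are Iterable.
        (instance? Iterable x*) (mapv go x*)
        :else                   x*))))

;; Hypothetical stand-in for a CoreNLP tree node: its datafied form contains
;; itself under :tree/tree, so recursing into it without the leaf? check
;; would never terminate.
(deftype FakeTree [label]
  Object
  (toString [_] (str "(" label ")")))

(extend-protocol p/Datafiable
  FakeTree
  (datafy [t] {:label (.-label t) :tree/tree t}))

;; Usage:
;; (recur-datafy #(instance? FakeTree %)
;;               {:sentiment "Neutral" :tree/tree (->FakeTree "ROOT")})
;; ;=> {:sentiment "Neutral", :tree/tree "(ROOT)"}
```

The predicate is checked before `datafy` is called, because the datafied tree is itself a map that points back at the same node. Passing it in as an argument only keeps the sketch testable without CoreNLP; a real patch would more likely hard-code the `instance?` check at the top of `recur-datafy`.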
Thank you, @ag91. I must admit that I haven't been actively developing this wrapper for a while now, so these long-standing issues persist.
Are you using it for a project? Or just dabbling?
Oh, I was just dabbling with NLP and thought I'd try CoreNLP with Clojure. I like your library; it is making my exploration super easy, so thank you for sharing it!
It is fine to leave it if I am the only user: I just wanted to help other users, and you, if you ever want to investigate this further ;) (I can also open a PR if you have time and want to save yourself some work. I am also fine with my personal fix.)
It seems to be an infinite loop in the `datafy-tsm` implementation. Removing the `datafy` call from `(assoc m k (datafy v))` and leaving just `v` seems to solve it for the regular `datafy`. This is also how it should be: `datafy` shouldn't be recursive.

In the case of `recur-datafy`, I will need to look further into what's causing it. I guess some sort of memory is needed to avoid this issue.
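One possible shape for that memory, sketched under the assumption that the loop comes from an object reappearing among its own descendants (the log shows the same `LabeledScoredTreeNode 0x7b20fb2c` over and over): carry the chain of ancestors down the recursion and, when an object is already on it, emit its string form instead of recurring. The namespace, the `on-path` argument and the `Node` demo type are hypothetical; this is not datalinguist's implementation.

```clojure
(ns example.cycle-guard
  (:require [clojure.core.protocols :as p]
            [clojure.datafy :refer [datafy]]))

(defn recur-datafy
  "Hypothetical cycle-safe recur-datafy. `on-path` holds the objects on the
  current recursion path; an object that reappears among its own descendants
  is replaced by its string form instead of being datafied again."
  ([x] (recur-datafy x []))
  ([x on-path]
   (if (some #(identical? x %) on-path)
     (str x) ; cycle detected: stop here
     (let [x*   (datafy x)
           path (conj on-path x)
           go   #(recur-datafy % path)]
       (cond
         (map? x*)               (into {} (map (fn [[k v]] [(go k) (go v)])) x*)
         (set? x*)               (set (map go x*))
         (instance? Iterable x*) (mapv go x*)
         :else                   x*)))))

;; Hypothetical self-referencing node, mimicking the :tree/tree loop.
(deftype Node [label]
  Object
  (toString [_] (str "(" label ")")))

(extend-protocol p/Datafiable
  Node
  (datafy [n] {:label (.-label n) :tree/tree n}))

;; Usage:
;; (recur-datafy (->Node "ROOT"))
;; ;=> {:label "ROOT", :tree/tree "(ROOT)"}
```

Ancestors are compared with `identical?` rather than `=`, so values that are merely equal, or the same value shared by two sibling keys, are still datafied normally; only a genuine self-reference is cut. The linear scan over the path costs O(depth) per node, which should be acceptable for sentence-sized parse trees.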