Closed by tmaxmax 3 years ago
I'm definitely open to adding methods that return an `io.Reader`. Which methods in particular do you have in mind? I don't remember any particular reason we don't return a `Reader`.
I'm thinking of `Parse` and `Translate`, since these return full-sized documents. Should I open a PR then, and discuss the changes there?
Go for it. :smiley: Please avoid making breaking changes. I'd prefer to add new methods rather than change the signatures of the existing ones.
Is there a reason why the Tika client always reads the whole response body into memory using `ioutil.ReadAll` and then copies it again in `callString`? It seems unnecessary and very inefficient, especially when sending large documents to Tika for parsing.

I've forked the repository and made some changes, and the tests all pass. I'm not opening a PR yet because I first want to understand why this wasn't done before: it's not obvious to me why things work this way right now, and I want to avoid breaking anything.