tanshiqi closed this issue 5 years ago.
I tried it on Windows and it fails, but it works well on Linux. Can you tell me which operating system and version you are using?
Anyway, I assume you are using the Tika server, not the command-line JAR, right?
I use the Tika server from a Docker container: LogicalSpark/docker-tikaserver, version 1.2. It's running on an Ubuntu server.
Thanks, I will try with Docker.
I installed the Docker container on Ubuntu Server 18.04 without problems, and everything works well. I also sent requests with cURL from the command line, as specified here:
curl -X PUT --data-binary @foo.txt http://localhost:9998/language/stream
Can you try this command to rule out a problem unrelated to this library?
Anyway, the problem is reproducible on Windows; I'm working on it.
I'm sorry, but I can't reproduce the problem again. I tried on multiple devices, using Windows, Linux and Docker as servers, and none of them gives me this error.
Can you give me more information, please? What are you trying to do with the txt file? What are its size and encoding? Are you using a remote document or a local one?
Thanks so much. I can reproduce the problem when the file encoding is GB2312; with UTF-8 it works fine.
Example file: readme.txt
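To reproduce this without the original attachment, a GB2312 file can be generated locally. The script below is a minimal sketch, assuming a POSIX shell and an `iconv` that supports GB2312; the file names are hypothetical. Since UTF-8 input reportedly works, it also converts the file back to UTF-8 as a possible client-side workaround. That workaround is an assumption, not a confirmed fix.

```shell
#!/bin/sh
# Sketch only: build a GB2312-encoded sample like the one in this report,
# then re-encode it as UTF-8 before sending it to Tika (possible workaround).
# File names are hypothetical; assumes an iconv that supports GB2312.
set -eu

# Write a short Chinese sample as UTF-8 (this script is assumed saved as UTF-8).
printf '你好世界\n' > sample_utf8_orig.txt

# Re-encode it to GB2312 to mimic the failing input file.
iconv -f UTF-8 -t GB2312 sample_utf8_orig.txt > sample_gb2312.txt

# Possible workaround: convert the GB2312 file back to UTF-8 before uploading.
iconv -f GB2312 -t UTF-8 sample_gb2312.txt > sample_utf8.txt

# Sizes differ: 2 bytes per character in GB2312, 3 in UTF-8, plus the newline.
echo "GB2312 bytes: $(wc -c < sample_gb2312.txt | tr -d ' ')"
echo "UTF-8 bytes:  $(wc -c < sample_utf8.txt | tr -d ' ')"
```

The UTF-8 copy can then be uploaded the same way as the original file, e.g. `curl -T sample_utf8.txt http://localhost:9998/meta`.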
I ran some tests and I think it is an Apache Tika bug. With the file you uploaded, I always get the same error: javax.ws.rs.WebApplicationException: HTTP 415 Unsupported Media Type. I tried with the server on Windows, Linux and the Docker container you're using. You can try it yourself with this command:
curl -T readme.txt http://localhost:9998/meta
But if I use other endpoints (like language), the library (and the server) return the detected language.
The library only passes on the error thrown by Apache Tika, so I don't think I can do more than recommend that you open a bug on the Apache Tika issue tracker. If you think I'm wrong, I'm open to other ideas.
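The endpoint comparison described above can be scripted. The function below is a sketch under stated assumptions: it uploads one file with PUT (`curl -T`, as in the command above) to `/meta` and `/language/stream` and prints only the HTTP status code for each. The function name `check_endpoints` and the default base URL `http://localhost:9998` are assumptions based on this thread.

```shell
#!/bin/sh
# Sketch only: PUT the same file to two Tika server endpoints and print
# the HTTP status code returned by each one.
# check_endpoints is a hypothetical helper; the default URL is an assumption.

check_endpoints() {
  file="$1"
  base_url="${2:-http://localhost:9998}"
  for endpoint in meta language/stream; do
    # -T uploads the file with PUT, -o /dev/null discards the response body,
    # -w prints only the status code ("000" means no HTTP response arrived).
    code=$(curl -s -o /dev/null -w '%{http_code}' -T "$file" "$base_url/$endpoint" || true)
    printf '%s -> %s\n' "$endpoint" "$code"
  done
}
```

For example, `check_endpoints readme.txt` against a local server should print 415 for meta and a success code for language/stream if the behaviour reported here holds.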
Sorry for the delay. I can reproduce this problem and I'm working on it. I will release a new version with a bugfix as soon as possible.