rufuspollock-okfn / reconcile-csv

A simple OpenRefine reconciliation service that runs on top of a CSV file
BSD 2-Clause "Simplified" License

Malformed reply from SOCKS server #40

Open inigomurgui opened 3 years ago

inigomurgui commented 3 years ago

Hi there,

I'm starting to use Reconcile-CSV with OpenRefine, and I'm trying to reconcile a column of names against a simple test CSV file with the following data:

"id";"persona"
"6245";"Virgen María"
"6246";"Ama Birjina"
"6527";"Jesucristo"
"6528";"Jesukristo"
"23439";"Ruiz, Juan,Hitako Artzapeza"
"23440";"Absurde"

Then I start the service like this:

C:\Users\IMURGUIO\Downloads\OpenRefine_ReconcileCSV>java -Xmx2g -jar reconcile-csv-0.1.2.jar personas.csv persona id
Starting CSV Reconciliation service
Point refine to http://localhost:8000 as reconciliation service
2020-10-07 11:38:38.130:INFO:oejs.Server:jetty-7.x.y-SNAPSHOT
2020-10-07 11:38:38.282:INFO:oejs.AbstractConnector:Started SelectChannelConnector@0.0.0.0:8000

But when I add the service in OpenRefine, it eventually fails with this error:

11:49:22.687 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {}->http://localhost:8000: Malformed reply from SOCKS server (11225ms)
11:49:22.688 [..mpl.execchain.RetryExec] Retrying request to {}->http://localhost:8000 (1ms)
11:49:35.046 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {}->http://localhost:8000: Malformed reply from SOCKS server (12358ms)
11:49:35.048 [..mpl.execchain.RetryExec] Retrying request to {}->http://localhost:8000 (2ms)
11:49:45.807 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {}->http://localhost:8000: Malformed reply from SOCKS server (10759ms)
11:49:45.807 [..mpl.execchain.RetryExec] Retrying request to {}->http://localhost:8000 (0ms)
11:49:57.362 [ command] Failed to guess cell types for load {"q1":{"query":"Robles, Gil","limit":3},"q2":{"query":"Gald¾s, PÚrez","limit":3},"q3":{"query":"Queipo de Llano","limit":3},"q4":{"query":"EcheverrÝa Novoa, JosÚ","limit":3},"q5":{"query":"Prieto, Indalecio","limit":3},"q6":{"query":"Ortega Gonzßlez, Antonio","limit":3},"q7":{"query":"Astigarrabia, Juan","limit":3},"q8":{"query":"Aldasoro","limit":3},"q9":{"query":"Aguirre, JosÚ Antonio","limit":3},"q0":{"query":"March, Juan","limit":3}} (11555ms)
java.net.SocketException: Malformed reply from SOCKS server
    at java.net.SocksSocketImpl.readSocksReply(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:75)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
    at com.google.refine.commands.recon.GuessTypesOfColumnCommand.guessTypes(GuessTypesOfColumnCommand.java:199)
    at com.google.refine.commands.recon.GuessTypesOfColumnCommand.doPost(GuessTypesOfColumnCommand.java:123)
    at com.google.refine.RefineServlet.service(RefineServlet.java:187)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
    at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
    at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:938)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
11:49:57.367 [ command] Exception caught (5ms)
java.net.SocketException: Malformed reply from SOCKS server
    at java.net.SocksSocketImpl.readSocksReply(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:75)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
    at com.google.refine.commands.recon.GuessTypesOfColumnCommand.guessTypes(GuessTypesOfColumnCommand.java:199)
    at com.google.refine.commands.recon.GuessTypesOfColumnCommand.doPost(GuessTypesOfColumnCommand.java:123)
    at com.google.refine.RefineServlet.service(RefineServlet.java:187)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
    at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
    at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:938)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

What is wrong here?

Many thanks for your help!

inigomurgui commented 3 years ago

On other attempts I get the following errors instead:

In the Reconcile-CSV console:

C:\Users\IMURGUIO\Downloads\OpenRefine_ReconcileCSV>java -Xmx2g -jar reconcile-csv-0.1.2.jar personas.csv persona id
Starting CSV Reconciliation service
Point refine to http://localhost:8000 as reconciliation service
2020-10-07 12:08:00.872:INFO:oejs.Server:jetty-7.x.y-SNAPSHOT
2020-10-07 12:08:01.018:INFO:oejs.AbstractConnector:Started SelectChannelConnector@0.0.0.0:8000
2020-10-07 12:08:52.722:WARN:oejs.AbstractHttpConnection:/reconcile
java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask.report(Unknown Source)
    at java.util.concurrent.FutureTask.get(Unknown Source)
    at clojure.core$deref_future.invoke(core.clj:2108)
    at clojure.core$future_call$reify__6267.deref(core.clj:6308)
    at clojure.core$deref.invoke(core.clj:2128)
    at clojure.core$pmap$step__6280$fn__6282.invoke(core.clj:6358)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$zipmap.invoke(core.clj:2713)
    at reconcile_csv.core$reconcile_params.invoke(core.clj:130)
    at reconcile_csv.core$reconcile.invoke(core.clj:140)
    at reconcile_csv.core$fn__2664.invoke(core.clj:225)
    at compojure.core$make_route$fn__534.invoke(core.clj:94)
    at compojure.core$if_route$fn__522.invoke(core.clj:40)
    at compojure.core$if_method$fn__515.invoke(core.clj:25)
    at compojure.core$routing$fn__540.invoke(core.clj:107)
    at clojure.core$some.invoke(core.clj:2443)
    at compojure.core$routing.doInvoke(core.clj:107)
    at clojure.lang.RestFn.applyTo(RestFn.java:139)
    at clojure.core$apply.invoke(core.clj:619)
    at compojure.core$routes$fn__544.invoke(core.clj:112)
    at ring.middleware.keyword_params$wrap_keyword_params$fn__1341.invoke(keyword_params.clj:32)
    at ring.middleware.nested_params$wrap_nested_params$fn__1383.invoke(nested_params.clj:70)
    at ring.middleware.params$wrap_params$fn__205.invoke(params.clj:58)
    at ring.adapter.jetty$proxy_handler$fn__81.invoke(jetty.clj:18)
    at ring.adapter.jetty.proxy$org.eclipse.jetty.server.handler.AbstractHandler$0.handle(Unknown Source)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:363)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:483)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:931)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:992)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NullPointerException
    at fuzzy_string.core$bigrams.invoke(core.clj:8)
    at clojure.lang.AFn.applyToHelper(AFn.java:161)
    at clojure.lang.AFn.applyTo(AFn.java:151)
    at clojure.core$apply.invoke(core.clj:617)
    at clojure.core$memoize$fn__5049.doInvoke(core.clj:5735)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at fuzzy_string.core$dice.invoke(core.clj:20)
    at reconcile_csv.core$score$fuzzy_match__2621.invoke(core.clj:76)
    at clojure.core$map$fn__4207.invoke(core.clj:2487)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core.protocols$seq_reduce.invoke(protocols.clj:26)
    at clojure.core.protocols$fn__6026.invoke(protocols.clj:53)
    at clojure.core.protocols$fn__5979$G__5974__5992.invoke(protocols.clj:13)
    at clojure.core$reduce.invoke(core.clj:6175)
    at reconcile_csv.core$score.invoke(core.clj:78)
    at clojure.lang.AFn.applyToHelper(AFn.java:163)
    at clojure.lang.AFn.applyTo(AFn.java:151)
    at clojure.core$apply.invoke(core.clj:619)
    at clojure.core$partial$fn__4190.doInvoke(core.clj:2396)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at clojure.core$map$fn__4207.invoke(core.clj:2487)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$sort.invoke(core.clj:2752)
    at clojure.core$sort_by.invoke(core.clj:2769)
    at clojure.core$sort_by.invoke(core.clj:2767)
    at reconcile_csv.core$scores.invoke(core.clj:111)
    at reconcile_csv.core$reconcile_param.invoke(core.clj:123)
    at clojure.core$pmap$fn__6275$fn__6276.invoke(core.clj:6354)
    at clojure.core$binding_conveyor_fn$fn__4107.invoke(core.clj:1836)
    at clojure.lang.AFn.call(AFn.java:18)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
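The NullPointerException above is raised in the fuzzy-matching code (fuzzy_string.core$bigrams), which suggests that a null string reached the scorer. One possible cause, which this thread does not confirm, is that the sample file is semicolon-delimited: a parser that expects commas would see no column called persona. The Python sketch below only illustrates that effect with Python's own csv module; it assumes nothing about how Reconcile-CSV actually parses files, and the helper names are made up for illustration.

```python
import csv
import io

# Sample data copied from the first post; note the semicolon delimiter.
SAMPLE = '''"id";"persona"
"6245";"Virgen María"
"6246";"Ama Birjina"
"6527";"Jesucristo"
"6528";"Jesukristo"
"23439";"Ruiz, Juan,Hitako Artzapeza"
"23440";"Absurde"
'''


def column_names(text, delimiter):
    """Return the header fields a CSV parser sees for the given delimiter."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return reader.fieldnames


def lookup(text, delimiter, column):
    """Return the value found under `column` in each row, or None if absent."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [row.get(column) for row in reader]


if __name__ == "__main__":
    for delim in (",", ";"):
        print("delimiter", repr(delim), "-> columns:", column_names(SAMPLE, delim))
        print("   persona values:", lookup(SAMPLE, delim, "persona"))
```

With a comma delimiter the lookup finds no persona column and returns only None values, which is the kind of input that could trigger a null-pointer error in a string scorer. If that turns out to be the cause here, re-exporting the file with commas would be the thing to try.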

And in the OpenRefine console:

12:08:52.733 [ command] Failed to guess cell types for load {"q1":{"query":"Robles, Gil","limit":3},"q2":{"query":"Gald¾s, PÚrez","limit":3},"q3":{"query":"Queipo de Llano","limit":3},"q4":{"query":"EcheverrÝa Novoa, JosÚ","limit":3},"q5":{"query":"Prieto, Indalecio","limit":3},"q6":{"query":"Ortega Gonzßlez, Antonio","limit":3},"q7":{"query":"Astigarrabia, Juan","limit":3},"q8":{"query":"Aldasoro","limit":3},"q9":{"query":"Aguirre, JosÚ Antonio","limit":3},"q0":{"query":"March, Juan","limit":3}} (379ms)
java.io.IOException: Failed - code:500 message: Server Error
    at com.google.refine.commands.recon.GuessTypesOfColumnCommand.guessTypes(GuessTypesOfColumnCommand.java:204)
    at com.google.refine.commands.recon.GuessTypesOfColumnCommand.doPost(GuessTypesOfColumnCommand.java:123)
    at com.google.refine.RefineServlet.service(RefineServlet.java:187)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
    at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
    at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:938)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
12:08:52.735 [ command] Exception caught (2ms)
java.io.IOException: Failed - code:500 message: Server Error
    at com.google.refine.commands.recon.GuessTypesOfColumnCommand.guessTypes(GuessTypesOfColumnCommand.java:204)
    at com.google.refine.commands.recon.GuessTypesOfColumnCommand.doPost(GuessTypesOfColumnCommand.java:123)
    at com.google.refine.RefineServlet.service(RefineServlet.java:187)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
    at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
    at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:938)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

inigomurgui commented 3 years ago

I have tried it with both CSV files (the one served by Reconcile-CSV and the one loaded in OpenRefine) saved as UTF-8 without BOM and with Unix (LF) line endings, and I always get the "java.net.SocketException: Malformed reply from SOCKS server" exception...
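On the encoding question: the garbled names in the OpenRefine log (for example "Gald¾s, PÚrez") look like a console display artifact rather than damage in the files themselves. The Python sketch below assumes, without confirmation from this thread, that the JVM wrote Latin-1/Windows-1252 bytes and the Windows console rendered them with code page 850. The function name is made up for illustration.

```python
def undo_console_mojibake(text, console_encoding="cp850", original_encoding="latin-1"):
    """Reverse a code-page mix-up.

    Re-encode the characters as the console displayed them, then decode the
    resulting bytes with the encoding the program most likely wrote them in.
    Both encodings are assumptions about the reporter's setup.
    """
    return text.encode(console_encoding).decode(original_encoding)


if __name__ == "__main__":
    # Query strings copied verbatim from the OpenRefine log in this thread.
    for garbled in ("Gald¾s, PÚrez", "EcheverrÝa Novoa, JosÚ", "Ortega Gonzßlez, Antonio"):
        print(garbled, "->", undo_console_mojibake(garbled))
```

If the round trip yields ordinary Spanish names, the accents were most likely intact in the request and only the console output was mangled, so the file encoding is probably unrelated to the SOCKS exception.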

inigomurgui commented 3 years ago

I've solved it by disabling all my proxy configuration (both the Windows settings and the IE settings).
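For anyone hitting the same error, a quick way to tell a proxy problem from a service problem is to request the reconciliation endpoint while explicitly bypassing any configured proxy. This is a minimal Python sketch using only the standard library. It checks the service from Python, not from OpenRefine's JVM, so the proxy settings it reports may differ from the ones Java picks up. The function names are illustrative, and the URL is the one printed by Reconcile-CSV at startup.

```python
import urllib.request


def proxies_seen_by_python():
    """Return the proxy mapping Python detects from the environment or OS settings."""
    return urllib.request.getproxies()


def fetch_without_proxy(url, timeout=5.0):
    """Fetch `url` with proxy auto-detection disabled.

    Passing an empty dict to ProxyHandler tells urllib to use no proxy at all.
    Returns the HTTP status code and the raw response body.
    """
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=timeout) as response:
        return response.status, response.read()


if __name__ == "__main__":
    print("Detected proxies:", proxies_seen_by_python())
    try:
        status, body = fetch_without_proxy("http://localhost:8000")
        print("Direct request succeeded with status", status)
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        print("Direct request failed:", exc)
```

If the direct request succeeds while OpenRefine still reports "Malformed reply from SOCKS server", the service itself is reachable and a system-wide proxy setting is the likely culprit, which matches the fix described above.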