seanstory / filesystem-workplace-search-source

A Custom API Source implementation for Elastic Workplace Search that generates documents from a filesystem

The following artifacts could not be resolved #1

Closed sguerreropert closed 4 years ago

sguerreropert commented 4 years ago

Hi, I'm trying to build the project: mvn clean install

I got:

[WARNING] The POM for com.sstory.workplace.search:workplace-search-client:jar:7.8.0-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for com.sstory.workplace.search:workplace-search-sdk:jar:7.8.0-SNAPSHOT is missing, no dependency information available
[ERROR] Failed to execute goal on project filesystem-workplace-search-source-core: Could not resolve dependencies for project com.sstory.workplace.search.source.filesystem:filesystem-workplace-search-source-core:jar:0.1.0-SNAPSHOT: The following artifacts could not be resolved: com.sstory.workplace.search:workplace-search-client:jar:7.8.0-SNAPSHOT, com.sstory.workplace.search:workplace-search-sdk:jar:7.8.0-SNAPSHOT: Could not find artifact com.sstory.workplace.search:workplace-search-client:jar:7.8.0-SNAPSHOT -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :filesystem-workplace-search-source-core

I've tried to find the workplace-search-client jar online, but I couldn't.

seanstory commented 4 years ago

Hey, thanks for your interest! I'm maintaining this in my free time and haven't gotten the client onto Maven Central yet. The source is at https://github.com/seanstory/workplace-search-java and if you build it locally first, that should resolve your problem. I'll leave this issue open in the meantime, until I get the client into a Maven repo.

sguerreropert commented 4 years ago

Thank you for your work! The build works now, but I have a new problem. I'm getting this after running "bin/sync filesystem":


2020-09-03 10:42:40,070 INFO  [main] c.s.workplace.search.sdk.run.Sync - Loaded configuration from: /home/prueba_rally/AAA/filesystem-workplace-search-source-0.1.0-SNAPSHOT/bin/../config/source.yml
Sep 03, 2020 10:42:41 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

Sep 03, 2020 10:42:41 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
2020-09-03 10:42:41,736 INFO  [main] c.s.workplace.search.sdk.run.Sync - Initialized FilesystemSource
2020-09-03 10:42:41,738 INFO  [main] c.s.w.s.s.f.sources.FilesystemSource - Processing 4 files found under /home/prueba_rally/PRUEBA_FILE matching pattern: *
2020-09-03 10:42:41,755 DEBUG [main] c.s.w.s.s.f.sources.FilesystemSource - Processing file: /home/prueba_rally/PRUEBA_FILE/aaa.txt
2020-09-03 10:42:41,886 INFO  [main] c.s.workplace.search.sdk.run.Sync - Preparing to bulk-create documents to Content Source 5f50e3838273e4541f1a5fc2 with FilesystemSource
2020-09-03 10:42:41,886 DEBUG [main] c.s.w.s.s.f.sources.FilesystemSource - Processing file: /home/prueba_rally/PRUEBA_FILE/bbb
2020-09-03 10:42:41,960 DEBUG [main] c.s.w.s.s.f.sources.FilesystemSource - Processing file: /home/prueba_rally/PRUEBA_FILE/ccc
2020-09-03 10:42:41,968 DEBUG [main] c.s.w.s.s.f.sources.FilesystemSource - Processing file: /home/prueba_rally/PRUEBA_FILE/ggg
2020-09-03 10:42:41,989 DEBUG [main] c.s.workplace.search.client.Client - Attempting POST sources/5f50e3838273e4541f1a5fc2/documents/bulk_create.json
Exception in thread "main" javax.ws.rs.ProcessingException: java.net.SocketException: Unexpected end of file from server
    at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:261)
    at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:296)
    at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$0(JerseyInvocation.java:609)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:205)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390)
    at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:608)
    at com.sstory.workplace.search.client.Client.request(Client.kt:78)
    at com.sstory.workplace.search.client.Client.post(Client.kt:33)
    at com.sstory.workplace.search.client.ContentSourceDocuments$DefaultImpls.asyncCreateOrUpdateDocuments(ContentSourceDocuments.kt:15)
    at com.sstory.workplace.search.client.ContentSourceDocuments$DefaultImpls.indexDocuments(ContentSourceDocuments.kt:7)
    at com.sstory.workplace.search.client.Client.indexDocuments(Client.kt:17)
    at com.sstory.workplace.search.sdk.run.Sync.main(Sync.java:69)
Caused by: java.net.SocketException: Unexpected end of file from server
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:851)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:848)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at org.glassfish.jersey.client.internal.HttpUrlConnector._apply(HttpUrlConnector.java:367)
    at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:259)
    ... 13 more

seanstory commented 4 years ago

@sguerreropert can you try again, and watch the Enterprise Search logs to see if there is an error on that end? This error implies that the server hung up unexpectedly.

sguerreropert commented 4 years ago

I'm not seeing anything useful:


[2020-09-03T14:47:17.734+00:00][35698][2380][action_controller][INFO]: [23dfedae-84c8-48af-9bd3-3d5a23e1d8ab] Completed 200 OK in 120ms (Views: 1.2ms)
[2020-09-03T14:47:27.539+00:00][35698][2378][app-server][INFO]: [12fdd6c7-4ac9-48f5-a527-acefb9ceeed7] Started GET "/ws/sources/status" for 44.44.45.4 at 2020-09-03 14:47:27 +0000
[2020-09-03T14:47:27.614+00:00][35698][2378][action_controller][INFO]: [12fdd6c7-4ac9-48f5-a527-acefb9ceeed7] Processing by FritoPie::ContentSourcesController#status as JSON
[2020-09-03T14:47:27.615+00:00][35698][2378][action_controller][INFO]: [12fdd6c7-4ac9-48f5-a527-acefb9ceeed7]   Parameters: {"host"=>"192.168.1.140:3002", "protocol"=>"https", "context"=>:account}
[2020-09-03T14:47:27.983+00:00][35698][2378][action_controller][INFO]: [12fdd6c7-4ac9-48f5-a527-acefb9ceeed7] Completed 200 OK in 367ms (Views: 2.8ms)
[2020-09-03T14:47:37.603+00:00][35698][2380][app-server][INFO]: [488c9b50-491c-4865-aafa-fa31c2a902e5] Started GET "/ws/sources/status" for 44.44.45.4 at 2020-09-03 14:47:37 +0000
[2020-09-03T14:47:37.786+00:00][35698][2380][action_controller][INFO]: [488c9b50-491c-4865-aafa-fa31c2a902e5] Processing by FritoPie::ContentSourcesController#status as JSON
[2020-09-03T14:47:37.788+00:00][35698][2380][action_controller][INFO]: [488c9b50-491c-4865-aafa-fa31c2a902e5]   Parameters: {"host"=>"192.168.1.140:3002", "protocol"=>"https", "context"=>:account}
[2020-09-03T14:47:37.994+00:00][35698][2380][action_controller][INFO]: [488c9b50-491c-4865-aafa-fa31c2a902e5] Completed 200 OK in 192ms (Views: 3.4ms)
[2020-09-03T14:47:47.531+00:00][35698][2362][app-server][INFO]: [3d5cffae-c319-4b70-8cee-938dca55cd3c] Started GET "/ws/sources/status" for 44.44.45.4 at 2020-09-03 14:47:47 +0000
[2020-09-03T14:47:47.546+00:00][35698][2362][action_controller][INFO]: [3d5cffae-c319-4b70-8cee-938dca55cd3c] Processing by FritoPie::ContentSourcesController#status as JSON
[2020-09-03T14:47:47.546+00:00][35698][2362][action_controller][INFO]: [3d5cffae-c319-4b70-8cee-938dca55cd3c]   Parameters: {"host"=>"192.168.1.140:3002", "protocol"=>"https", "context"=>:account}
[2020-09-03T14:47:47.680+00:00][35698][2362][action_controller][INFO]: [3d5cffae-c319-4b70-8cee-938dca55cd3c] Completed 200 OK in 133ms (Views: 1.5ms)

seanstory commented 4 years ago

I would expect you to have a few logs like:

app-server.1 | [2020-09-03T21:55:41.271+00:00][62313][2500][app-server][INFO]: [8320218f-f777-467f-85a5-31be5da7dd90] Started POST "/api/ws/v1/sources/5f5164cc0fabf0834c8c6adb/documents/bulk_create.json" for 127.0.0.1 at 2020-09-03 21:55:41 +0000
app-server.1 | [2020-09-03T21:55:41.372+00:00][62313][2500][action_controller][INFO]: [8320218f-f777-467f-85a5-31be5da7dd90] Processing by Api::FritoPie::V1::DocumentsController#bulk_create as JSON
app-server.1 | [2020-09-03T21:55:41.372+00:00][62313][2500][action_controller][INFO]: [8320218f-f777-467f-85a5-31be5da7dd90]   Parameters: {"_json"=>"[FILTERED]", "content_source_key"=>"5f5164cc0fabf0834c8c6adb"}

From the same place that you're running bin/sync filesystem, can you try running:

curl -X POST -v http://localhost:3002/api/ws/v1/sources/[KEY]/documents/bulk_create \
-H "Authorization: Bearer [AUTH_TOKEN]" \
-H "Content-Type: application/json" \
-d '[
  {
    "id" : 1234,
    "title" : "The Meaning of Time",
    "body" : "Not much. It is a made up thing.",
    "url" : "https://example.com/meaning/of/time",
    "created_at": "2019-06-01T12:00:00+00:00",
    "type": "list"
  }
]'

Replace [KEY] with your Content Source Key and [AUTH_TOKEN] with your access token, and edit the URL as needed so that it points to your installation of Enterprise Search.
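
If curl is not available on that machine, roughly the same check can be made with nothing but the JDK. The following is a minimal sketch and not part of the client library: the class name BulkCreateProbe and its helper methods are made up for illustration, and the base URL, Content Source Key, and access token are passed in as arguments. It sends the same request shape as the curl command above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; it is not part of workplace-search-java.
final class BulkCreateProbe {
    private BulkCreateProbe() {}

    /** Builds the bulk_create URL from a base such as http://localhost:3002/api/ws/v1. */
    static String bulkCreateUrl(String baseUrl, String contentSourceKey) {
        // Drop a single trailing slash so the path is not doubled up.
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return base + "/sources/" + contentSourceKey + "/documents/bulk_create";
    }

    /** POSTs the JSON body with a Bearer token and returns the HTTP status code. */
    static int post(String url, String accessToken, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Authorization", "Bearer " + accessToken);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: BulkCreateProbe <base-url> <content-source-key> <access-token>");
            return;
        }
        // Same sample document as in the curl command.
        String json = "[{\"id\": 1234, \"title\": \"The Meaning of Time\", "
                + "\"body\": \"Not much. It is a made up thing.\", "
                + "\"url\": \"https://example.com/meaning/of/time\", "
                + "\"created_at\": \"2019-06-01T12:00:00+00:00\", \"type\": \"list\"}]";
        int status = post(bulkCreateUrl(args[0], args[1]), args[2], json);
        System.out.println("HTTP status: " + status);
    }
}
```

After compiling with javac, it could be run as: java BulkCreateProbe http://localhost:3002/api/ws/v1 [KEY] [AUTH_TOKEN]. A 2xx status would suggest the server accepts the request and the problem lies elsewhere; an exception thrown before any status is printed points at the connection itself.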

sguerreropert commented 4 years ago

Thank you, that works: I can see the content in Workplace Search. I realized that the issue was that my installation uses https, so I had to comment endpoint out and change http to https. Now I'm getting a certificate error. Is there a way to point to a certificate?

seanstory commented 4 years ago

The client didn't previously support SSL. I just added https://github.com/seanstory/workplace-search-java/commit/d7b2ea6733f7a1a0b8eb3ca2ffec0fdfa6e1da51, but that's probably not going to cut it if you have a self-signed cert. Can you share your stack trace? I can put some more work into making the client configurable for custom certs. PRs are also welcome :)

sguerreropert commented 4 years ago

Thank you very much for your effort. What does "PRs" mean? And when you say stack trace, you mean the exception, right?

bin/sync filesystem

2020-09-07 14:08:41,107 INFO  [main] c.s.workplace.search.sdk.run.Sync - Loaded configuration from: /home/prueba_rally/AAA/filesystem-workplace-search-source-0.1.0-SNAPSHOT/bin/../config/source.yml
Sep 07, 2020 2:08:42 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

Sep 07, 2020 2:08:42 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
2020-09-07 14:08:42,762 INFO  [main] c.s.workplace.search.sdk.run.Sync - Initialized FilesystemSource
2020-09-07 14:08:42,764 INFO  [main] c.s.w.s.s.f.sources.FilesystemSource - Processing 4 files found under /home/prueba_rally/PRUEBA_FILE matching pattern: *
2020-09-07 14:08:42,770 DEBUG [main] c.s.w.s.s.f.sources.FilesystemSource - Processing file: /home/prueba_rally/PRUEBA_FILE/aaa.txt
2020-09-07 14:08:42,910 INFO  [main] c.s.workplace.search.sdk.run.Sync - Preparing to bulk-create documents to Content Source 5f50e3838273e4541f1a5fc2 with FilesystemSource
2020-09-07 14:08:42,910 DEBUG [main] c.s.w.s.s.f.sources.FilesystemSource - Processing file: /home/prueba_rally/PRUEBA_FILE/bbb
2020-09-07 14:08:43,022 DEBUG [main] c.s.w.s.s.f.sources.FilesystemSource - Processing file: /home/prueba_rally/PRUEBA_FILE/ccc
2020-09-07 14:08:43,031 DEBUG [main] c.s.w.s.s.f.sources.FilesystemSource - Processing file: /home/prueba_rally/PRUEBA_FILE/ggg
2020-09-07 14:08:43,043 DEBUG [main] c.s.workplace.search.client.Client - Attempting POST sources/5f50e3838273e4541f1a5fc2/documents/bulk_create.json
Exception in thread "main" javax.ws.rs.ProcessingException: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No name matching elasticsearch found
    at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:261)
    at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:296)
    at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$0(JerseyInvocation.java:609)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:205)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390)
    at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:608)
    at com.sstory.workplace.search.client.Client.request(Client.kt:78)
    at com.sstory.workplace.search.client.Client.post(Client.kt:33)
    at com.sstory.workplace.search.client.ContentSourceDocuments$DefaultImpls.asyncCreateOrUpdateDocuments(ContentSourceDocuments.kt:15)
    at com.sstory.workplace.search.client.ContentSourceDocuments$DefaultImpls.indexDocuments(ContentSourceDocuments.kt:7)
    at com.sstory.workplace.search.client.Client.indexDocuments(Client.kt:17)
    at com.sstory.workplace.search.sdk.run.Sync.main(Sync.java:69)
Caused by: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No name matching elasticsearch found
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:198)
    at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1967)
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:331)
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:325)
    at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1688)
    at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:226)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1082)
    at sun.security.ssl.Handshaker.process_record(Handshaker.java:1010)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1079)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1388)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1416)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1400)
    at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1340)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1315)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:264)
    at org.glassfish.jersey.client.internal.HttpUrlConnector.lambda$_apply$0(HttpUrlConnector.java:359)
    at org.glassfish.jersey.message.internal.CommittingOutputStream.commitStream(CommittingOutputStream.java:171)
    at org.glassfish.jersey.message.internal.CommittingOutputStream.commitStream(CommittingOutputStream.java:165)
    at org.glassfish.jersey.message.internal.CommittingOutputStream.write(CommittingOutputStream.java:199)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$UnCloseableOutputStream.write(WriterInterceptorExecutor.java:276)
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator._flushBuffer(UTF8JsonGenerator.java:2137)
    at com.fasterxml.jackson.core.json.UTF8JsonGenerator.flush(UTF8JsonGenerator.java:1150)
    at com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:923)
    at org.glassfish.jersey.jackson.internal.jackson.jaxrs.base.ProviderBase.writeTo(ProviderBase.java:647)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.invokeWriteTo(WriterInterceptorExecutor.java:242)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.aroundWriteTo(WriterInterceptorExecutor.java:227)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139)
    at org.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1116)
    at org.glassfish.jersey.client.ClientRequest.doWriteEntity(ClientRequest.java:504)
    at org.glassfish.jersey.client.ClientRequest.writeEntity(ClientRequest.java:486)
    at org.glassfish.jersey.client.internal.HttpUrlConnector._apply(HttpUrlConnector.java:361)
    at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:259)
    ... 13 more
Caused by: java.security.cert.CertificateException: No name matching elasticsearch found
    at sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:237)
    at sun.security.util.HostnameChecker.match(HostnameChecker.java:97)
    at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:462)
    at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:442)
    at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:209)
    at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:132)
    at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1670)
    ... 42 more

seanstory commented 4 years ago

What does PRs mean?

"PR" stands for "Pull Request." It's the primary way for people to contribute to open source libraries that they do not own.

javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No name matching elasticsearch found

I'm maintaining this project "for fun" in my free time, so it'll probably be a while before I build in full support for creating and configuring your own custom TrustStore. For now, I've just pushed https://github.com/seanstory/workplace-search-java/commit/c66810ebf12ce01c0d754fe4fcd3ab419e300438, which will allow you to set security: insecure in your config/source.yml. This will behave similarly to using curl -k, and should resolve your hostname issue.

Don't forget that to make use of that change, you'll need to rebuild the client libraries from https://github.com/seanstory/workplace-search-java locally and then copy the resulting jars into your lib/ directory. Alternatively, you can rebuild https://github.com/seanstory/filesystem-workplace-search-source as well and start over from a new tarball.

I've tested this with a source.yml like:

access_token: dfde5a4f4d5c0b40e81822835a58e1b05225466fb9c53b45be174d59ab1f3638
content_source_key: 5f52e9c94993c88d1fa9e3b3
endpoint: https://localhost:3002/api/ws/v1
security: insecure
filesystem:
  - /Users/seanstory/Desktop/Engineer1/

And an enterprise-search.yml like:

ent_search.auth.source: standard
elasticsearch.username: elastic
elasticsearch.password: changeme
allow_es_settings_modification: true
secret_management.encryption_keys: [KEY]
ent_search.ssl.enabled: true
ent_search.ssl.certificate: /Users/seanstory/Desktop/Dev/ent-search/server.crt # self-signed cert built by following: https://devcenter.heroku.com/articles/ssl-certificate-self
ent_search.ssl.key: /Users/seanstory/Desktop/Dev/ent-search/server.key # the matching key

ent_search:
  listen_port: 3002
  external_url: https://localhost:3002

And was able to successfully sync documents.
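
For context on what an option such as security: insecure usually amounts to on the JVM, here is a minimal sketch. It is not the code from the commit linked above, and the class name InsecureTls is hypothetical. It builds an SSLContext that accepts any certificate chain and a HostnameVerifier that accepts any host name, which together are roughly what curl -k does. It is only appropriate for testing against a server you control.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical illustration of an "insecure" TLS mode; not the client's actual implementation.
final class InsecureTls {
    private InsecureTls() {}

    /** Trust manager that accepts every certificate chain without validation. */
    static X509TrustManager trustAllManager() {
        return new X509TrustManager() {
            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) {
                // Intentionally empty: nothing is rejected.
            }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
                // Intentionally empty: self-signed and unknown issuers are accepted.
            }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        };
    }

    /** SSLContext backed only by the trust-all manager above. */
    static SSLContext trustAllContext() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, new TrustManager[] { trustAllManager() }, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create TLS context", e);
        }
    }

    /** Hostname verifier that accepts any host name, including mismatched ones. */
    static HostnameVerifier anyHostname() {
        return (hostname, session) -> true;
    }
}
```

With a JAX-RS client such as the Jersey client seen in the stack traces, these two objects could be supplied through ClientBuilder.sslContext(...) and ClientBuilder.hostnameVerifier(...). The verifier matters here because the "No name matching elasticsearch found" failure is a host name mismatch rather than an untrusted issuer.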

sguerreropert commented 4 years ago

Thank you so much for your time. Everything is working perfectly now. It's really useful; Elastic should integrate this tool by default!