omc / searchyll

Work with `pages`? #25

Open BA88 opened 7 years ago

BA88 commented 7 years ago

I am trying to use searchyll to add Elasticsearch (ES) search to my GitHub Pages website. The site is made up of "pages", not "posts", so I wonder whether that is the root of the issue. (I also use collections in my _config.yml file.)

I've gotten as far as trying to add documents to my ES database via jekyll build, but I don't see any of my pages being added.

Below I have included details of what I've done, but an overview is:

  1. Updated my _config.yml:
     a. Added the searchyll gem.
     b. Added the elasticsearch settings.

  2. Updated my `_layouts/page.html` to include <article>...</article>

  3. Run elasticsearch locally (for now)

  4. Run jekyll build
     a. I can see the "indexing document" puts output.
     b. I added some additional puts to searchyll.rb just in case. All seems okay.

  5. In my elasticsearch, I do not see any new messages
     a. I expected a message as each document is indexed into the ES database, but nope.

  6. GET _search returns nothing
     a. Not surprising.

  7. To test my ES:
     a. I manually PUT a document.
     b. I saw a message in my elasticsearch output.
     c. I manually GET it back.

My environment:

Details:

#-------
$ cat _config.yml
[ snip ]
# stuff BA added
gems: [
  jekyll-paginate, jekyll-feed, rouge, searchyll
]

elasticsearch:
  url: http://localhost:9200
  index_name: CSG-Wiki
  default_type: "page"          # Optional. Default type is "post".

collections:
  general:
    title: General
    output: true
    permalink: /:collection/:path/:title.html

#-------
$ cat _layouts/page.html
---
layout: default
---

<div class="page">
  <h1 class="page-title">{{ page.title }}</h1>
    <!-- this will be sent to elasticsearch, along with full page metadata -->
    <article class="page-content">
      {{ content }}
    </article>
</div>

#-------
$ elasticsearch --verbose
[2017-09-18T08:04:30,005][INFO ][o.e.n.Node               ] [] initializing ...
[2017-09-18T08:04:30,078][INFO ][o.e.e.NodeEnvironment    ] [7_61xZT] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [126.1gb], net total_space [232.6gb], spins? [unknown], types [hfs]
[2017-09-18T08:04:30,078][INFO ][o.e.e.NodeEnvironment    ] [7_61xZT] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-09-18T08:04:30,090][INFO ][o.e.n.Node               ] node name [7_61xZT] derived from node ID [7_61xZTTSr6bdGqad_FYTQ]; set [node.name] to override
[2017-09-18T08:04:30,090][INFO ][o.e.n.Node               ] version[5.6.0], pid[42125], build[781a835/2017-09-07T03:09:58.087Z], OS[Mac OS X/10.12.6/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_131/25.131-b11]
[2017-09-18T08:04:30,090][INFO ][o.e.n.Node               ] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/usr/local/Cellar/elasticsearch/5.6.0/libexec]
[2017-09-18T08:04:30,706][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [aggs-matrix-stats]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [ingest-common]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [lang-expression]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [lang-groovy]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [lang-mustache]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [lang-painless]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [parent-join]
[2017-09-18T08:04:30,707][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [percolator]
[2017-09-18T08:04:30,708][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [reindex]
[2017-09-18T08:04:30,708][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [transport-netty3]
[2017-09-18T08:04:30,708][INFO ][o.e.p.PluginsService     ] [7_61xZT] loaded module [transport-netty4]
[2017-09-18T08:04:30,708][INFO ][o.e.p.PluginsService     ] [7_61xZT] no plugins loaded
[2017-09-18T08:04:31,815][INFO ][o.e.d.DiscoveryModule    ] [7_61xZT] using discovery type [zen]
[2017-09-18T08:04:32,191][INFO ][o.e.n.Node               ] initialized
[2017-09-18T08:04:32,192][INFO ][o.e.n.Node               ] [7_61xZT] starting ...
[2017-09-18T08:04:32,358][INFO ][o.e.t.TransportService   ] [7_61xZT] publish_address {127.0.0.1:9300}, bound_addresses {[fe80::1]:9300}, {[::1]:9300}, {127.0.0.1:9300}
[2017-09-18T08:04:35,401][INFO ][o.e.c.s.ClusterService   ] [7_61xZT] new_master {7_61xZT}{7_61xZTTSr6bdGqad_FYTQ}{5kt0gbCuQZ2ZDrjkx6cImg}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-09-18T08:04:35,419][INFO ][o.e.h.n.Netty4HttpServerTransport] [7_61xZT] publish_address {127.0.0.1:9200}, bound_addresses {[fe80::1]:9200}, {[::1]:9200}, {127.0.0.1:9200}
[2017-09-18T08:04:35,419][INFO ][o.e.n.Node               ] [7_61xZT] started
[2017-09-18T08:04:35,529][INFO ][o.e.g.GatewayService     ] [7_61xZT] recovered [1] indices into cluster_state
[2017-09-18T08:04:35,675][INFO ][o.e.c.r.a.AllocationService] [7_61xZT] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[library][1]] ...]).

#-------
$ jekyll build
WARN: Unresolved specs during Gem::Specification.reset:
      rb-fsevent (>= 0.9.4, ~> 0.9)
      rb-inotify (>= 0.9.7, ~> 0.9)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.
Configuration file: /Users/bfo7328/Documents/hca/project/wiki/_config.yml
            Source: /Users/bfo7328/Documents/hca/project/wiki
       Destination: /Users/bfo7328/Documents/hca/project/wiki/_site
Incremental build: disabled. Enable with --incremental
      Generating...
        indexing document /general/AE_job_desc.html
        indexing document /general/index.html
        indexing document /general/setup_elasticsearch.html
        indexing document /general/setup_phone_cisco_unity.html
        [ snip ]
        indexing document /unix/setup_linux_analytics_server.html
        indexing page /404.html
        indexing page /atom.xml
        indexing page /
        indexing page /feed.xml
       Old indices:
                    done in 5.697 seconds.
Auto-regeneration: disabled. Use --watch to enable.

#-------
$ elasticsearch --verbose
[ no new messages ]

#-------
$ curl -XGET localhost:9200/_search?pretty
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  }
}

#-------
$ curl -X PUT 'localhost:9200/library/books/1?pretty' -H 'Content-Type: application/json' -d'
{
  "title" : "A fly on the wall",
  "name"  : {
    "first": "Drosophila",
    "last" : "Melanogaster"
  },
  "publish_date" : "2015-06-21T23:39:40-0400",
  "price"        : 19.95
}
'

# output:
{
  "_index" : "library",
  "_type" : "books",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}

#-------
$ elasticsearch --verbose
[ new messages: ]
[2017-09-18T08:59:10,176][INFO ][o.e.c.m.MetaDataCreateIndexService] [7_61xZT] [library] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-09-18T08:59:10,252][INFO ][o.e.c.m.MetaDataMappingService] [7_61xZT] [library/qqN2Ig5uSQO7HBI94tp6fQ] create_mapping [books]

#-------
$ curl -XGET localhost:9200/_search?pretty
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "library",
        "_type" : "books",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "A fly on the wall",
          "name" : {
            "first" : "Drosophila",
            "last" : "Melanogaster"
          },
          "publish_date" : "2015-06-21T23:39:40-0400",
          "price" : 19.95
        }
      }
    ]
  }
}

robsears commented 7 years ago

It's possible that Searchyll isn't (yet) compatible with Elasticsearch 5.x. I would be interested to see a traffic capture between Jekyll and Elasticsearch. You could use socat for that:

  1. brew install socat
  2. Set url: http://localhost:9400 in _config.yml
  3. Run sudo socat -v TCP-LISTEN:9400,fork TCP:localhost:9200 &> data.log
  4. In another terminal window, re-run jekyll build
  5. Inspect the data.log file: cat data.log

In this case, socat will bind to port 9400 and it will take whatever traffic is sent to this port and pass it along to localhost:9200, then relay the response. As far as Jekyll knows, it's talking directly to Elasticsearch, but socat is a middleman logging everything that passes through it to data.log. There may be some interesting information in there that explains what's happening.
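
For readers without socat, the relay described above can be approximated in a few lines of Ruby. This is only a sketch of the idea, not how socat itself is implemented: `relay` and `run_proxy` are hypothetical helpers, and the ports (9400 in front of 9200) simply reuse the values from the steps above. It copies bytes in both directions between two connected sockets and records each chunk in a log, marking client-to-server traffic with `>` and server-to-client traffic with `<`.

```ruby
require 'socket'

# Hypothetical helper: shuttle bytes between `client` and `upstream`,
# appending every chunk to `log` before forwarding it.
# '>' marks client-to-server data, '<' marks server-to-client data.
def relay(client, upstream, log)
  sockets = [client, upstream]
  loop do
    readable, = IO.select(sockets)
    readable.each do |source|
      from_client = source.equal?(client)
      target = from_client ? upstream : client
      begin
        chunk = source.readpartial(16 * 1024)
      rescue EOFError, Errno::ECONNRESET
        return # one side hung up, so the exchange is over
      end
      log << "#{from_client ? '>' : '<'} #{chunk}"
      target.write(chunk)
    end
  end
ensure
  sockets.each(&:close)
end

# Hypothetical entry point: accept connections on listen_port and relay
# each one to target_port on localhost, logging traffic to log_path.
def run_proxy(listen_port, target_port, log_path)
  listener = TCPServer.new('127.0.0.1', listen_port)
  loop do
    Thread.new(listener.accept) do |client|
      File.open(log_path, 'a') do |log|
        relay(client, TCPSocket.new('127.0.0.1', target_port), log)
      end
    end
  end
end
```

Calling `run_proxy(9400, 9200, 'data.log')` would mirror the socat command in step 3; as with socat, Jekyll would then need `url: http://localhost:9400` in _config.yml.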

BA88 commented 7 years ago

I'll give this a try! Thanks for providing clear step-by-step instructions and an explanation.

Soon,

BA88 commented 7 years ago

I am attaching two (2) files here.

File data.txt is from running the five steps outlined above.

File data2.txt is from running a successful PUT. I ran these curl commands:

curl -XGET localhost:9400/_search?pretty
curl -XGET 'http://localhost:9400/_cat/health?v'
curl -X PUT 'localhost:9400/library/books/1?pretty' -H 'Content-Type: application/json' -d'
{
  "title" : "A fly on the wall",
  "name"  : {
    "first": "Drosophila",
    "last" : "Melanogaster"
  },
  "publish_date" : "2015-06-21T23:39:40-0400",
  "price"        : 19.95
}
'
curl -XGET localhost:9400/_search?pretty
curl -XDELETE localhost:9400/library
curl -XGET localhost:9400/_search?pretty

data.txt data2.txt

allizad commented 7 years ago

@BA88 can you paste those .txt files into a GitHub gist and share a link? Thanks!

BA88 commented 7 years ago

data.txt: https://gist.github.com/BA88/d7bd52688328c99c9c81f19089a547b4
data2.txt: https://gist.github.com/BA88/ece564bb9859c75c9761003dcee3303a

I've never created gists before. Please let me know if these links don't work. Or if you need something else.

Thanks!

BA88 commented 7 years ago

Hi all. Do you think I should downgrade from Elasticsearch 5.x, if that is the issue with searchyll? If so, is there a version of Elasticsearch that you recommend?

I've looked around the searchyll code a bit. I don't know enough to determine where the problem is.

I have a professional goal of adding search ability to our GitHub Pages wiki. If there is something I can / should do, I'll do it! I hope to help however I can.

Thank you!

robsears commented 7 years ago

Hey there,

We just pushed a change to authentication settings. Not sure if that's the root cause, but I do see auth exceptions in the logs. Want to pull the latest changes and test again?

BA88 commented 7 years ago

I incorporated the changes you made. I worked around a few errors until I got stuck again. I suspect I'm running different versions of some components than you are, which is why I keep hitting problems.

Problem 1 + Resolution: in lib/searchyll/indexer.rb: removed all [double quotes] from definition of update_aliases.body:

162,165c162,165
<         actions: [
<            { remove: { index: old_indices.join(','), alias: configuration.elasticsearch_index_base_name }},
<            { add:    { index: elasticsearch_index_name, alias: configuration.elasticsearch_index_base_name }}
<          ]
---
>         "actions": [
>           { "remove": { "index": old_indices.join(','), "alias": configuration.elasticsearch_index_base_name }},
>           { "add":    { "index": elasticsearch_index_name, "alias": configuration.elasticsearch_index_base_name }}
>         ]
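
A possible explanation for Problem 1, offered as an assumption rather than a confirmed diagnosis: the gem paths in the traces further down point at Ruby 2.0.0, and hash literals with quoted keys followed by a colon (`"actions": [...]`) are only accepted by Ruby 2.2 and later. The unquoted form parses on both and yields the same symbol keys. The sketch below uses made-up index and alias names in place of the values searchyll takes from its configuration.

```ruby
require 'json'

# Placeholder values (assumptions); searchyll derives the real ones
# from its configuration.
old_indices = %w[wiki-20170917 wiki-20170918]
new_index   = 'wiki-20170919'
alias_name  = 'wiki'

# Unquoted symbol keys: valid on old and new Ruby versions alike.
body = {
  actions: [
    { remove: { index: old_indices.join(','), alias: alias_name } },
    { add:    { index: new_index, alias: alias_name } }
  ]
}

# The JSON sent over the wire is the same whichever key syntax is used.
puts JSON.pretty_generate(body)
```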

Problem 2 + Resolution: undefined method present?: Fix -- added to lib/searchyll/indexer.rb: require 'active_support/all'

Error:

$ jekyll build --trace
WARN: Unresolved specs during Gem::Specification.reset:
      rb-fsevent (>= 0.9.4, ~> 0.9)
      rb-inotify (>= 0.9.7, ~> 0.9)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.
Configuration file: /Users/bfo7328/Documents/hca/project/wiki/_config.yml
   1. begin searchyll.rb
            Source: /Users/bfo7328/Documents/hca/project/wiki
       Destination: /Users/bfo7328/Documents/hca/project/wiki/_site
 Incremental build: disabled. Enable with --incremental
      Generating... 
     2. begin Jekyll::Hooks.register :site, :pre_render
/Library/Ruby/Gems/2.0.0/gems/searchyll-0.10.0/lib/searchyll/indexer.rb:104:in `http_request': undefined method `present?' for nil:NilClass (NoMethodError)
            from /Library/Ruby/Gems/2.0.0/gems/searchyll-0.10.0/lib/searchyll/indexer.rb:83:in `http_put'
            from /Library/Ruby/Gems/2.0.0/gems/searchyll-0.10.0/lib/searchyll/indexer.rb:52:in `prepare_index'
            from /Library/Ruby/Gems/2.0.0/gems/searchyll-0.10.0/lib/searchyll/indexer.rb:70:in `start'
            from /Library/Ruby/Gems/2.0.0/gems/searchyll-0.10.0/lib/searchyll.rb:17:in `block in <top (required)>'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/hooks.rb:98:in `call'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/hooks.rb:98:in `block in trigger'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/hooks.rb:97:in `each'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/hooks.rb:97:in `trigger'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/site.rb:188:in `render'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/site.rb:69:in `process'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/command.rb:26:in `process_site'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/commands/build.rb:63:in `build'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/commands/build.rb:34:in `process'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/lib/jekyll/commands/build.rb:16:in `block (2 levels) in init_with_program'
            from /Users/bfo7328/.gem/ruby/2.0.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `call'
            from /Users/bfo7328/.gem/ruby/2.0.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `block in execute'
            from /Users/bfo7328/.gem/ruby/2.0.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `each'
            from /Users/bfo7328/.gem/ruby/2.0.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `execute'
            from /Users/bfo7328/.gem/ruby/2.0.0/gems/mercenary-0.3.6/lib/mercenary/program.rb:42:in `go'
            from /Users/bfo7328/.gem/ruby/2.0.0/gems/mercenary-0.3.6/lib/mercenary.rb:19:in `program'
            from /Library/Ruby/Gems/2.0.0/gems/jekyll-3.4.5/exe/jekyll:13:in `<top (required)>'
            from /Users/bfo7328/bin/jekyll:23:in `load'
            from /Users/bfo7328/bin/jekyll:23:in `<main>'
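
The NoMethodError for present? in the trace above arises because present? is an ActiveSupport extension rather than core Ruby, which is why requiring ActiveSupport makes the error go away. If loading all of ActiveSupport feels heavy, a plain-Ruby check is one alternative. `value_present?` below is a hypothetical helper, not part of searchyll, and only approximates ActiveSupport's behaviour for the common cases (nil, false, blank strings, empty collections).

```ruby
# Hypothetical stand-in for ActiveSupport's `present?`, written as a
# plain method so it also works when the value is nil.
def value_present?(value)
  return false if value.nil? || value == false
  # Strings count as present only if they contain non-whitespace.
  return !value.strip.empty? if value.is_a?(String)
  # Arrays, hashes and similar collections must be non-empty.
  return !value.empty? if value.respond_to?(:empty?)
  true
end

puts value_present?(nil)        # => false
puts value_present?('   ')      # => false
puts value_present?('elastic')  # => true
```

A call such as `value_present?(setting)` could then replace `setting.present?` wherever the setting may be nil.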

Problem 3 -- no resolution: Error: indexer.rb:138: stack level too deep

NB: I added debug puts statements to lib/searchyll/indexer.rb to help me ...

$ jekyll build --trace
WARN: Unresolved specs during Gem::Specification.reset:
      rb-fsevent (>= 0.9.4, ~> 0.9)
      rb-inotify (>= 0.9.7, ~> 0.9)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.
Configuration file: /Users/bfo7328/Documents/hca/project/wiki/_config.yml
   1. begin searchyll.rb
            Source: /Users/bfo7328/Documents/hca/project/wiki
       Destination: /Users/bfo7328/Documents/hca/project/wiki/_site
 Incremental build: disabled. Enable with --incremental
      Generating... 
     2. begin Jekyll::Hooks.register :site, :pre_render
     5. begin Jekyll::Hooks.register :documents, :post_render
        indexing document /general/index.html
          5a. document.id /general/index
     5. begin Jekyll::Hooks.register :documents, :post_render
        indexing document /general/setup_elasticsearch.html
          5a. document.id /general/setup_elasticsearch
     4. begin Jekyll::Hooks.register :pages, :post_render
        indexing page /404.html
          4a. page.name 404.html
          4b. page.url  /404.html
     4. begin Jekyll::Hooks.register :pages, :post_render
        indexing page /atom.xml
          4a. page.name atom.xml
          4b. page.url  /atom.xml
     4. begin Jekyll::Hooks.register :pages, :post_render
        indexing page /
          4a. page.name index.html
          4b. page.url  /
     4. begin Jekyll::Hooks.register :pages, :post_render
        indexing page /feed.xml
          4a. page.name feed.xml
          4b. page.url  /feed.xml
     3. begin Jekyll::Hooks.register :site, :post_render
/Library/Ruby/Gems/2.0.0/gems/searchyll-0.10.0/lib/searchyll/indexer.rb:138: stack level too deep (SystemStackError)

While the above is running, my elasticsearch process does not "see" any of these pages.

What additional information can I provide?