manastech / middleman-search

LunrJS-based search for Middleman
MIT License

dynamic pages not being indexed? #15

Closed timcreatewell closed 8 years ago

timcreatewell commented 8 years ago

Hi there,

First up - thanks so much for putting this excellent extension together and making it available. I've been able to get it plugged into my Middleman (v4) website so far without much fuss at all!

One small thing I have noticed is that my dynamic pages (https://middlemanapp.com/advanced/dynamic_pages/) don't seem to be picked up for indexing? I'm using dynamic pages for my blog (e.g. /blog/a-post-slug.html) and it would be great if I was able to search across these pages also.

I have 'blog/' defined within the search.resources property (e.g. search.resources = ['blog/', 'index.html']) and within the config.rb I've tried moving the activate :search do |search| block below the block where the dynamic pages are defined, but this doesn't seem to help.

Is there something I'm missing that I need to do in my config.rb file to get this working?

Many thanks!

matiasgarciaisaia commented 8 years ago

Hi @timcreatewell - nice to hear the extension helps you :)

We're not currently using the extension on our website (though we should get it working again soon). Our previous version did use it (I owe you a link - this archived version doesn't actually work), and it did index our staff pages, which were built with proxy + a template.

We had code like this in our config.rb:

data.staff.each do |identifier, person|
  # `staff` is assumed to be defined earlier in config.rb (not shown here)
  proxy "/staff/#{identifier}/index.html", "/staff/template.html", locals: { identifier: identifier, person: person, staff: staff }, ignore: true, layout: 'staff', title: person[:name]
end

# ...

activate :search do |search|

  # Resources to index
  search.resources =  ['blog/', 'contactus/', 'everythingelse/', 'projects/', 'staff/', 'tools/', 'workwithus/']

  # ...
end

And this is the generated index - download it to search for the /staff/ URLs - they're there.

I can't tell you whether Middleman 4 changes anything (we're still using 3.3), but middleman-search worked at least once with dynamic pages.

Could it be something about the order in which they're declared in config.rb?

If this is an open-source project we could take a look at, let me know so I can try anything.

timcreatewell commented 8 years ago

Hi @matiasgarciaisaia , thanks for getting in touch.

I've just tried putting the search block below the proxy declarations in the config.rb file and it doesn't seem to work :( Hmm.. this issue may be a Middleman v4 thing then?

Unfortunately my project isn't open source, but below is the relevant part of my config.rb:

if Dir.exist?(config.data_dir + "/website")
  data.website.blogPost.each do |id, post|
    proxy "/blog/#{post.slug}.html", '/blog/post.html', locals: { post: post }, ignore: true
  end
end

# search
activate :search do |search|
  search.resources = ['blog/', 'index.html', 'about-us/']
  search.index_path = 'search/search.json' # defaults to `search.json`
  search.fields = {
    title:   {boost: 100, store: true, required: true},
    content: {boost: 50},
    description: {boost: 50, store: true},
    url:     {index: false, store: true},
    author:  {boost: 30}
  }
end

I'm happy to try and do some digging into this, would you perhaps have any ideas where a good place to start would be?

Thanks :)

timcreatewell commented 8 years ago

Hey @matiasgarciaisaia , ok.. I'm not entirely sure what's happening here but I've come back a day later and now searching is working across dynamic pages... I was sure it wasn't yesterday! I think moving the search block below is what did the trick..

Cheers!

matiasgarciaisaia commented 8 years ago

Nice to hear!

If you ever want to try and determine if moving the search block had any sort of effect, we'll be glad to know.

Awesome it worked, anyways! 👍

gerwitz commented 6 years ago

I had the same problem here. Using logging, I can see that the proxies are all created before the search index, but they were nonetheless not being included.

I tried @timcreatewell's technique of leaving it alone for a day several dozen times with no effect.

Just now, I tried using @matiasgarciaisaia's code above as a basis, and found that including a title: in the proxy was the key. As long as it is there (and thus the sitemap includes an options hash), my proxy pages are indexed.
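
For anyone landing here later, a minimal sketch of that fix applied to the blog proxies from earlier in the thread. It is a config.rb fragment, not a tested setup: `post.title` is an assumed field name (the thread only shows `post.slug`), so substitute whatever attribute holds the title in your data. Note that the field configuration posted above marks title as `required: true`, which may be related to why pages without a title were left out; the thread does not confirm this.

```ruby
# config.rb (sketch only)
# Declare the proxies before activating search, as discussed above.
data.website.blogPost.each do |id, post|
  proxy "/blog/#{post.slug}.html", "/blog/post.html",
        locals: { post: post },
        ignore: true,
        # Extra option reported in this thread to get proxy pages indexed.
        # `post.title` is a hypothetical field name, not taken from the thread.
        title: post.title
end

activate :search do |search|
  search.resources = ['blog/', 'index.html', 'about-us/']
end
```

After rebuilding, searching the generated index file for a /blog/ URL is a quick way to check whether the proxy pages made it in.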