sosedoff / xml-sitemap

Easy XML sitemap generation for Ruby/Rails/Merb/Sinatra applications
MIT License
57 stars 19 forks source link

Allow strings on :updated option #6

Closed danhealy closed 13 years ago

danhealy commented 13 years ago

Hi again,

I noticed that you are enforcing the use of a Time or Date object for the :updated (aka lastmod) field, so that you can do .utc.iso8601 on the resulting object.

I am using this gem with rails, with very large MySQL tables, trying to set lastmod from a time field. It turns out that ActiveRecord converting the string in the DB to a Time (and then back to a string inside this gem) is one of the slowest aspects of my code. Since my time is already a string, I want to be able to input a string directly into this option (perhaps after some manual ISO8601 reformatting).

Checking that string for iso8601 compliance via Time.parse would undo the optimization, so that is not really an option. Also, I am not sure if there are any other canonical options for iso8601 validation. I don't want to introduce a giant regex in this gem, for example.

How do you think we can approach this issue? If the answer is to simply allow strings without validation, I can commit those changes.

Thanks, --Dan

sosedoff commented 13 years ago

Hey,

I think the best solution in this situation would be adding a timestamp check option somewhere during map initialization.

Something like that:

map = XmlSitemap::Map.new('foo.com', :time_check => false)
map.add('hello', :updated => 'your time string')

or

map = XmlSitemap.map('foo.com')
map.add('hello', :check_time => false, :updated => 'hellooo')

This could actually break your sitemap, but we can make that an option. Ideas?

danhealy commented 13 years ago

I think having the option on the actual element makes more sense, but either way works for me.

If check_time is true, then how should we validate the string?

sosedoff commented 13 years ago

Should be pretty simple:

class Item
  DEFAULT_PRIORITY = 0.5

  attr_reader :target, :updated, :priority, :changefreq
  attr_reader :time_check

  def initialize(target, opts={})
    @target     = target.to_s.strip
    @updated    = opts[:updated]  || Time.now
    @priority   = opts[:priority] || DEFAULT_PRIORITY
    @changefreq = opts[:period]   || :weekly
    @time_check = opts[:time_check] == true

    if @time_check
      # allow only date or time object
      unless @updated.kind_of?(Time) || @updated.kind_of?(Date)
        raise ArgumentError, "Time or Date required for :updated!"
      end

      # use full time and date only!
      @updated = @updated.to_time if @updated.kind_of?(Date)
    end
  end

  def lastmod_value
    time_check ? @updated.utc.iso8601 : @updated.to_s
  end
end

Then in render method:

# Render XML
def render
  xml = Builder::XmlMarkup.new(:indent => 2)
  xml.instruct!(:xml, :version => '1.0', :encoding => 'UTF-8')
  xml.urlset(XmlSitemap::MAP_SCHEMA_OPTIONS) { |s|
    @items.each do |item|
      s.url do |u|
        u.loc        item.target
        u.lastmod    item.lastmod_value
        u.changefreq item.changefreq.to_s
        u.priority   item.priority.to_s
      end
    end
  }.to_s
end

Didnt test it, but you get the idea.

sosedoff commented 13 years ago

Also, if we're talking about string validation, we can do following:

Since you're feeding your own data into the gem, you should consider its valid and properly formatted :)