kjvarga / sitemap_generator

SitemapGenerator is a framework-agnostic XML Sitemap generator written in Ruby with automatic Rails integration. It supports Video, News, Image, Mobile, PageMap and Alternate Links sitemap extensions and includes Rake tasks for managing your sitemaps, as well as many other great features.
MIT License

S3 adapter: upload with "content-type" and "content-encoding" headers #338

Open HoneyryderChuck opened 4 years ago

HoneyryderChuck commented 4 years ago

When I upload a sitemap ("sitemap.xml.gz") with the S3 adapter provided by the gem and then pass its URL to a sitemap validator (like xml-sitemaps.com/validate-xml-sitemap.html), validation fails because the response has an unexpected media type: "application/gzip".

This happens because the raw S3 upload does not declare the media type, so it is inferred (either by S3 or by fog). I've patched the adapter's write method locally to get access to the fog API that lets me set these headers:

# lib/sitemap_generator/adapters/s3_adapter.rb#L30
def write(location, raw_data)
  SitemapGenerator::FileAdapter.new.write(location, raw_data)

  directory.files.create(
    :key    => location.path_in_public,
    :body   => File.open(location.path),
    :public => true,
    "Content-Type" => "application/xml",
    "Content-Encoding" => "gzip"
  )
end

After uploading with this patch, the sitemap link could be validated.

I'd like to propose either supporting headers as an adapter option, or applying these changes to the adapter directly, setting "content-encoding" depending on whether compression is on or off.
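As a rough sketch of the second option, the headers could be derived from the file name, so that "Content-Encoding" is only sent for compressed sitemaps. The helper name `sitemap_upload_headers` and the `extra_headers` parameter are hypothetical and not part of the gem; detecting compression from the ".gz" extension is an assumption as well.

```ruby
# Hypothetical helper, not part of sitemap_generator: builds the extra
# attributes to pass along with an S3 upload of a sitemap file.
def sitemap_upload_headers(path, extra_headers = {})
  headers = { "Content-Type" => "application/xml" }
  # Assumption: a ".gz" extension means the sitemap was gzip-compressed.
  headers["Content-Encoding"] = "gzip" if File.extname(path) == ".gz"
  # Caller-supplied headers win over the defaults.
  headers.merge(extra_headers)
end
```

The patched write method could then merge the result of sitemap_upload_headers(location.path) into the hash it passes to directory.files.create, instead of hard-coding both headers.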

marckohlbrugge commented 5 months ago

I'm running into this same issue with Cloudflare R2, which uses the S3 adapter. My sitemap files are served with the application/x-gzip content type, which seems to confuse Google.

Would you mind making a PR of your suggested change? I think more people would benefit from it.