Open MiroBabic opened 2 years ago
Have you tried accessing some of the URLs in your index file from the public Internet?
What's the URL where your robots.txt is hosted, or the URL of the index?
On Sun, Nov 6, 2022 at 2:04 PM MiroBabic @.***> wrote:
I get "Couldn't fetch" from GSC (Google Search Console) for the sitemap files, but the index is read fine. When I put my sitemap into any validator, it shows it is OK.
This is my sitemap.rb:
SitemapGenerator::Sitemap.default_host = "https://www.xxxxx.xxxxx"
SitemapGenerator::Sitemap.compress = false
SitemapGenerator::Sitemap.create_index = true

SitemapGenerator::Sitemap.create(:max_sitemap_links => 45000) do
  [:sk, :en].each do |locale|
    add root_path.to_s + locale.to_s
    add "/#{locale}" + list_cities_path
    add "/#{locale}" + invoices_path
    add "/#{locale}" + orders_path
    add "/#{locale}" + contracts_path
    add "/#{locale}" + contractors_path

    City.find_each do |city|
      add "/#{locale}/city/#{city.slug_url}", :lastmod => city.updated_at
    end
    Contractor.find_each do |contractor|
      add "/#{locale}/contractors/#{contractor.id}", :lastmod => contractor.updated_at
    end
    Invoice.find_each do |invoice|
      add "/#{locale}/invoices/#{invoice.id}", :lastmod => invoice.updated_at
    end
    Order.find_each do |order|
      add "/#{locale}/orders/#{order.id}", :lastmod => order.updated_at
    end
    Contract.find_each do |contract|
      add "/#{locale}/contracts/#{contract.id}", :lastmod => contract.updated_at
    end
  end

  # Put links creation logic here.
  #
  # The root path '/' and sitemap index file are added automatically for you.
  # Links are added to the Sitemap in the order they are specified.
  #
  # Usage: add(path, options={})
  #        (default options are used if you don't specify)
  #
  # Defaults: :priority => 0.5, :changefreq => 'weekly',
  #           :lastmod => Time.now, :host => default_host
  #
  # Examples:
  #
  # Add '/articles'
  #
  #   add articles_path, :priority => 0.7, :changefreq => 'daily'
  #
  # Add all articles:
  #
  #   Article.find_each do |article|
  #     add article_path(article), :lastmod => article.updated_at
  #   end
end
What can I do to make it readable for Google?
Yes, when I copy and paste the link for a specific sitemap from the sitemap index, it works OK. robots.txt is in the root folder of the site, and so are the sitemaps.
You can check everything here; all the links work, including robots.txt and the sitemaps: https://www.openstats.city/robots.txt
Yes I'm able to access all the links fine as well. So I'm not sure what the issue is. The sitemaps look fine.
Please be aware that all your Invoice, Contracts and Orders data is publicly accessible!! e.g. https://www.openstats.city/en/invoices. You should secure that ASAP to prevent PII exposure, or worse. At the very least, someone could use that data to phish those users and get them to click malicious links, knowing details of their interactions with your site.
@kjvarga thanks, but that's OK. It is public data (open data) and should be accessible to anybody who wants to check where the money flows. It all comes from an open data initiative to help with transparency about how public money is spent.
haha oops I was worried!
I think I saw this behavior as well. I re-ran ping_search_engines with the sitemap index and eventually the sitemaps got marked correctly.
In our scenario I think it was related to the fact that our generated sitemap was huge, and our CDN probably throttled some of the bot IPs.
Also, if you are redirecting to S3/Google via the app, there might have been app errors.
I see something strange in my Rails app log: it looks like Google is requesting the gzipped version of a sitemap even though the index links to the non-gzipped version.
I, [2022-11-15T19:48:10.902295 #1576179] INFO -- : [1671072e-6015-44ad-aa6c-914bc73f0917] Started GET "/sitemap12.xml.gz" for 66.249.75.241 at 2022-11-15 19:48:10 +0000
[1671072e-6015-44ad-aa6c-914bc73f0917] ActionController::RoutingError (No route matches [GET] "/sitemap12.xml.gz"):
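To confirm that a path seen in the app log really is absent from the sitemap index, the check can be scripted. The following is only a rough sketch and not something from this thread: the helper names `sitemap_index_locs` and `listed_in_index?` are hypothetical, the `example.com` URLs are placeholders, and the sample index is hand-written rather than real sitemap_generator output.

```ruby
require "rexml/document"
require "uri"

# Hand-written sample index (placeholder URLs, not the real site's file).
SAMPLE_INDEX = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
      <loc>https://www.example.com/sitemap11.xml</loc>
    </sitemap>
    <sitemap>
      <loc>https://www.example.com/sitemap12.xml</loc>
    </sitemap>
  </sitemapindex>
XML

# Hypothetical helper: collect every <loc> URL listed in a sitemap index.
def sitemap_index_locs(xml)
  doc = REXML::Document.new(xml)
  locs = []
  doc.root.each_element do |sitemap|
    next unless sitemap.name == "sitemap"
    sitemap.each_element do |child|
      locs << child.text.to_s.strip if child.name == "loc"
    end
  end
  locs
end

# Hypothetical helper: is the path requested in the log one the index links to?
def listed_in_index?(locs, requested_path)
  locs.any? { |loc| URI(loc).path == requested_path }
end

locs = sitemap_index_locs(SAMPLE_INDEX)
puts listed_in_index?(locs, "/sitemap12.xml")    # => true
puts listed_in_index?(locs, "/sitemap12.xml.gz") # => false
```

If the `.gz` path is not listed, as in this sample, the crawler is requesting a URL the index never advertised, which is consistent with the RoutingError above.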