hexojs / hexo-generator-sitemap

Sitemap generator for Hexo.
https://hexo.io
MIT License

The generated sitemap <lastmod> is wrong (inaccurate)! #92

Closed DeppWang closed 3 years ago

DeppWang commented 4 years ago

The sitemap.xml generated by the hexo-generator-sitemap plugin sets <lastmod> to the last modification time of the file in the source folder, but <lastmod> should correspond to the last modification time of the generated html file.

Example sitemap.xml:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
  <url>
    <loc>http://www.example.com/foo</loc>
    <lastmod>2018-06-04T18:00:15+00:00</lastmod> <!-- The last modification time of `foo.html`. -->
  </url>
</urlset>

I found a tip in [GSC Help]:

Google reads the <lastmod> value, but if you misrepresent this value, we will stop reading it.

If the sitemap <lastmod> is wrong, does that mean the sitemap has no effect?

About lastmod


Appended on 2020.04.16:

GSC can't read my sitemap, and I guess the wrong <lastmod> may be the cause.

As a result, my articles can't be indexed in time. It takes almost a week!

Can your sitemap be read in GSC? Are your posted articles searchable the next day?

If your sitemap can be read even though its <lastmod> values are inaccurate, then <lastmod> doesn't matter in the sitemap!

If your sitemap can't be read but your posted articles are searchable the next day, please reply to me. I need your help.

flashlab commented 4 years ago

You can modify sitemap.xml or create an external template to suit your needs, for example: post.updated.toISOString().substring(0, 10). Hope this helps.
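
As a rough illustration of what that expression yields, here is a minimal sketch in plain JavaScript. It assumes `post.updated` exposes a `toISOString()` method that returns a UTC ISO 8601 string; a native `Date` stands in for it here, and `toSitemapDate` is a hypothetical helper name, not part of the plugin.

```javascript
// Hypothetical helper: trims an ISO 8601 timestamp to its date part (YYYY-MM-DD).
// Assumption: `value` has a toISOString() method that returns a UTC string.
function toSitemapDate(value) {
  return value.toISOString().substring(0, 10);
}

// A native Date stands in for post.updated in this sketch.
const updated = new Date('2018-06-04T18:00:15Z');
console.log(toSitemapDate(updated)); // 2018-06-04
```

The date-only form is also a valid W3C Datetime value, so it can be used in <lastmod> as-is.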

SukkaW commented 4 years ago

@flashlab @DeppWang

If you guys are sure there is something wrong with lastmod, feel free to open a new pull request.

DeppWang commented 4 years ago

@flashlab Thanks. One question: can your sitemap be read in GSC?

flashlab commented 4 years ago

@DeppWang @SukkaW I tested it myself. Please check #94.

DeppWang commented 4 years ago

@SukkaW I found that your articles are indexed in time. One question: can your sitemap be read in GSC?

DeppWang commented 4 years ago

@flashlab Sorry, my eyes deceived me.

I'm keeping the following for the record:

If the sitemap can be read in GSC, <lastmod> is not an issue.

If <lastmod> is an issue, changing its format won't solve it; the plugin would need to use the html file's last modification time, or remove <lastmod>.
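
To make the idea of "the html file's last modification time" concrete, here is a minimal sketch using only Node.js built-in modules. It is not how the plugin works; `fileLastmod` is a hypothetical helper, and the temporary `foo.html` file exists only so the example can run on its own.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: returns a file's modification time as an ISO 8601 string.
function fileLastmod(filePath) {
  return fs.statSync(filePath).mtime.toISOString();
}

// Create a throwaway html file and pin its timestamps so the result is predictable.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'sitemap-'));
const htmlPath = path.join(dir, 'foo.html');
fs.writeFileSync(htmlPath, '<html></html>');
const stamp = new Date('2018-06-04T18:00:15Z');
fs.utimesSync(htmlPath, stamp, stamp);

console.log(fileLastmod(htmlPath));
```

Note that a fresh deployment rewrites the html files, so their modification times would normally all equal the build time.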

curbengh commented 4 years ago

2018-06-04T18:00:15+00:00 does conform to the W3C Datetime/ISO 8601 format as specified by the protocol.

Edit: I tested 2018-06-04T18:00:15+00:00 with Yandex's sitemap validator and it is valid.

Perhaps Google expects/prefers 2018-06-04T18:00:15.000Z, which should be the format generated by this plugin.
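
For what it's worth, the two notations describe the same instant. A minimal sketch in plain JavaScript, using only the built-in `Date` (no plugin code is involved):

```javascript
// Both strings are W3C Datetime values for the same moment in time.
const withOffset = '2018-06-04T18:00:15+00:00';
const withZulu = '2018-06-04T18:00:15.000Z';

// Date.parse returns milliseconds since the Unix epoch (NaN if unparseable).
const sameInstant = Date.parse(withOffset) === Date.parse(withZulu);
console.log(sameInstant); // true

// Normalising the offset form through Date yields the ".000Z" form.
const normalised = new Date(withOffset).toISOString();
console.log(normalised); // 2018-06-04T18:00:15.000Z
```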

Can I have your Node version and "package.json"? Output of $ node -v and $ npm ls --depth 0?

In the meantime, I'll check my GSC.


As a result, my articles can't be indexed in time. It takes almost a week!

May I know when you first added your site to GSC?

If I remember correctly, GSC (and other web crawlers) doesn't re-crawl a new site very frequently; only after the crawler notices that the site is updated consistently will it crawl more often.

DeppWang commented 4 years ago

@curbengh, thanks for your reply.

  1. 2018-06-04T18:00:15+00:00 was copied from the protocol. The <lastmod> format in my sitemap.xml is like 2018-06-04T18:00:15.000Z. I didn't notice the difference between the two (my mistake). What I meant to point out is the difference between the last modified time of the local md file and that of the online html file. Maybe the difference doesn't matter. My sitemap.xml: https://depp.wang/sitemap.xml. After testing: even when I remove <lastmod> from sitemap.xml, GSC still can't read my sitemap.xml.

  2. I use hexo-action to deploy my blog; this is the hexo env. I don't think it's related to the env.

    image-20200630215737914

  3. I first added my site to GSC in 2017.

I guess my site's weight may be too low, so GSC does not read my sitemap.xml. For now, Google's spider presumably indexes my articles through the homepage, so it is very slow.

Can your sitemap.xml be read in time?

curbengh commented 4 years ago

Can your sitemap.xml be read in time?

Roughly every 4 days. I added it last year.

My sitemap.xml: https://depp.wang/sitemap.xml

I notice all <lastmod> values are the same, which I think is the deployment time. Perhaps GSC finds it unusual and ignores it.
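
A quick way to check a sitemap for this symptom is sketched below in plain JavaScript. `extractLastmods` and `allLastmodsIdentical` are hypothetical helper names, the sample XML is made up for illustration, and the regular expression is only a rough stand-in for a real XML parser.

```javascript
// Hypothetical helper: collects every <lastmod> value from a sitemap string.
function extractLastmods(xml) {
  const pattern = /<lastmod>([^<]*)<\/lastmod>/g;
  return Array.from(xml.matchAll(pattern), (match) => match[1].trim());
}

// True when there are at least two entries and they all share one value.
function allLastmodsIdentical(xml) {
  const values = extractLastmods(xml);
  return values.length > 1 && new Set(values).size === 1;
}

// Made-up sample: two URLs stamped with the same deployment time.
const sample = `
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/foo</loc><lastmod>2020-06-30T13:57:37.000Z</lastmod></url>
  <url><loc>http://www.example.com/bar</loc><lastmod>2020-06-30T13:57:37.000Z</lastmod></url>
</urlset>`;

console.log(allLastmodsIdentical(sample)); // true
```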

I have two suggestions (you can try both):

  1. Use post.date only; this requires a custom template.
    • Add the following content to "source/.sitemap.xml":
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        {% for post in posts %}
        <url>
          <loc>{{ post.permalink | uriencode }}</loc>
          {% if post.date %}
          <lastmod>{{ post.date.toISOString().substring(0, 10) }}</lastmod>
          {% endif %}
        </url>
        {% endfor %}
      </urlset>
    • Configure this plugin to use the above template:
      # _config.yml
      sitemap:
        template: ./source/.sitemap.xml
  2. GSC also supports RSS. Install hexo-generator-feed and add https://depp.wang/atom.xml to GSC. Do not remove the sitemap from GSC.
    • When you first add this, GSC might show "fetch error"; wait for a day (don't remove it) and it will be read just fine.

DeppWang commented 4 years ago

  1. I had already done this step. Because I use a GitHub Action to deploy the blog, the action environment clones all files again each time, so the lastmod values are all the same. There is no good way to solve this issue.
  2. I have added atom.xml to GSC, following your suggestion. GSC now shows "Couldn't fetch"; maybe, as you said, GSC needs time.

Thanks for your suggestions.

curbengh commented 4 years ago

the action environment clones all files again each time, so the lastmod values are all the same.

My suggested template doesn't use post.updated; it only uses post.date (derived from a post's front matter) so "lastmod" should be unique.
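
The reason a front-matter date survives a fresh clone is that it lives in the file's content, not in filesystem metadata. A minimal sketch of that idea in plain JavaScript follows; `frontMatterDate` is a hypothetical helper for illustration only, and Hexo itself parses front matter with its own library, not with this regular expression.

```javascript
// Hypothetical helper: pulls the YYYY-MM-DD part of the `date:` field
// out of a post's front matter, or returns null if there is none.
function frontMatterDate(source) {
  const block = source.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!block) return null;
  const field = block[1].match(/^date:\s*(\d{4}-\d{2}-\d{2})/m);
  return field ? field[1] : null;
}

// Made-up post source; cloning the repository again does not change this text.
const postSource = [
  '---',
  'title: Hello World',
  'date: 2020-04-16 10:00:00',
  '---',
  'Post body.',
].join('\n');

console.log(frontMatterDate(postSource)); // 2020-04-16
```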

DeppWang commented 4 years ago

I hadn't noticed that detail. I have updated it now.

Thanks very much!

curbengh commented 3 years ago

https://github.com/hexojs/hexo-generator-sitemap/pull/94, released in 2.1.0.

DeppWang commented 3 years ago

@curbengh Hi, I'm back again 😂 with the same old problem: atom.xml still shows "Couldn't fetch".

GSC says it can use Atom 1.0 or RSS 2.0 as a sitemap. My atom.xml is Atom 1.0, and I checked that the format is right. Is it possible that using Cloudflare as my DNS is causing it? I have no idea. I tried to send feedback to Google support, but got no reply.

Which direction should I take to check for errors? Can you give me a suggestion? Thanks.

curbengh commented 3 years ago

I use Cloudflare as well, and my atom.xml issue resolved itself within a few days. Can you access http://yoursite.com/atom.xml.gz? An alternative is to switch to the RSS format and try again.

DeppWang commented 3 years ago

My https://depp.wang/atom.xml.gz can't be accessed, whereas your site's atom.xml.gz downloads the xml. I will try RSS and hope it helps. Thanks for your reply.