bart-turczynski closed this issue 4 days ago
:tada: This issue has been resolved in version 4.0.0 :tada:
The release is available on:
Your semantic-release bot :package::rocket:
Hi @bart-turczynski, sorry for the delay. Just wanted to let you know that I have rewritten the library; it performs much better than the earlier version and includes the fixes you mentioned. Thanks for bringing up all of those important issues. Hope it helps.
Thank you @muratgozel ! I love this library and I appreciate your fix, and more importantly, the fact that you coded the whole thing up to share with others. That's the spirit!
When a robots.txt file contains regular rules and sitemaps, everything works fine. However, issues arise with:

- `#` comments: if a comment appears somewhere in the body, we get `throw new Error('Each group or rule line must contain a colon.');`. If there's a comment at the very top, we get `throw new Error('Document must have at least one group starting with "user-agent" at the beginning.');`. (I work around this by preprocessing files, but perhaps it would be pertinent to take note of them in the results object.)
- `sitemap:` works fine, and sitemaps get added to `additional` information. However, if the parser encounters `host:`, or any other `element:` (e.g., `Clean-param: s /forum/index.php`), the results aren't reliable: whatever shows up last in the robots.txt file gets pushed to `additional` elements.
- `crawl-delay:` seems to be ignored.

I know all these aren't part of regular robots.txt files, but: (1) it's good to know the `crawl-delay` for specific bots (Google ignores it, but well-behaved bots will abide); (2) comments are allowed in general, and often help better understand what goes on in the file (and will be ignored if a specific robot doesn't recognize them); (3) Yandex accepts `clean-param` values, so it would be good to learn about them in the parser (much like `host` or any other element). Technically, bots will ignore elements they aren't familiar with, so I think it would be right to report on them, since their presence doesn't break anything and is valuable to people validating their files. What do you think about this?
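For reference, the preprocessing workaround I mentioned is roughly the following. This is just a minimal sketch, not this library's API: `stripComments` is a hypothetical helper name, and the naive regex would also eat a `#` inside a URL, which is rare in robots.txt but worth knowing about.

```javascript
// Sketch of the workaround: strip "#" comments (whole-line and
// trailing) before handing the text to the parser, so comment lines
// no longer trigger the "must contain a colon" / "must have at least
// one group" errors.
// Caveat: this is naive and would also strip a literal "#" that
// appears inside a URL value.
function stripComments(robotsTxt) {
  return robotsTxt
    .split('\n')
    .map(line => line.replace(/#.*$/, '').trimEnd()) // drop trailing comments
    .filter(line => line.trim().length > 0)          // drop now-empty lines
    .join('\n');
}

// Example input combining the cases discussed above
// (top-of-file comment, trailing comment, crawl-delay, sitemap):
const sample = [
  '# global rules',
  'User-agent: *',
  'Disallow: /private # keep bots out',
  'Crawl-delay: 10',
  'Sitemap: https://example.com/sitemap.xml'
].join('\n');

console.log(stripComments(sample));
```

After this pass the file parses cleanly, but of course the comments themselves are lost, which is why surfacing them in the results object would be nicer.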