Closed OwenMelbz closed 3 years ago
Can you ensure that the site has a base URL, and ensure that base URL has no sub-directory?
https://github.com/nystudio107/craft-seomatic/blob/v3/src/services/FrontendTemplates.php#L90
Hi,
The site does have a base URL, however it will have a sub directory of the language segment.
e.g. "website.com/en-gb/"
The form itself uses an alias
Right. So as per the spec, robots.txt files don't appear in sub-directories. They will only ever appear in the root domain, as you noted here:
https://github.com/nystudio107/craft-seomatic/issues/859#issuecomment-811984524
In your case, you do not have a site that doesn't have a sub-directory as part of the URL, so robots.txt does not appear.
Sigh... why close so abruptly?
The robots.txt, humans.txt etc. should live in the root directory. Currently they don't; that's the problem, they're 404ing after these changes.
Language segments are completely irrelevant for multi-site installs: regardless of what they are set to, the robots.txt should always live in the root.
Hence the issue: the robots.txt is 404ing when it should be in the root of the domain, regardless of what base URLs are set up for multilingual functionality.
This update was marked as a "minor" release, yet it has broken every Craft site that uses language segments for its pages rather than just a naked domain.
In your case, you do not have a site that doesn't have a sub-directory as part of the URL, so robots.txt does not appear.
At what point does this become a suitable solution? To just remove the root robots.txt from everybody that is using language segments? As you've pointed out - it needs to be in the root. So why remove it? It's fine to remove it from sub directories, but this has adversely removed it from the root as well which is not a desired outcome.
You have no Craft site that is responding to the bare root directory. Therefore SEOmatic will never see the requests coming in anyway. I'm assuming you have some kind of server-side redirect that redirects from domain.com to whatever your default is, e.g. domain.com/en/
What exactly happens to requests for domain.com (the bare domain) in your setups?
It's not a matter of "removing" it from the root domain, it's a matter of no root domain existing at all (at least as far as Craft is concerned).
I think what could be done in this situation is if the request comes in with no site assigned, I can register this template anyway... let me test.
The problem is, Craft defaults to your primary site if there is no current site set for the current request (which would be the case in your setup):
/**
 * Returns the current site.
 *
 * @return Site the current site
 * @throws SiteNotFoundException if no sites exist
 */
public function getCurrentSite(): Site
{
    if ($this->_currentSite !== null) {
        return $this->_currentSite;
    }

    // Default to the primary site
    return $this->_currentSite = $this->getPrimarySite();
}
Hi, sorry if we're coming across arsy, but as a customer who has paid thousands of pounds for your product, we expected less dismissal than we've received. It's been really frustrating, as this has broken these txt files on 4 of our clients' websites, so you must appreciate our frustration. This was a direct outcome of a change made in SEOmatic that was flagged as non-breaking, and we trusted that.
Craft will load the "primary" site when you access it without a language segment without any issues - this is the default behavior of it out of the box.
We'd expect SEOmatic to follow the same principles - the code runs getCurrentSite() which returns a completely valid result as far as Craft is concerned.
Regarding the .com to .com/en redirect: we simply have some middleware loaded into the request cycle that checks for a language segment. If it doesn't find one, it redirects to the same URL with the primary site's baseUrl injected at the start. This allows any pre-registered URLs to function without a language segment.
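For illustration, the redirect logic described above might be sketched like this. This is a hypothetical reconstruction, not the actual middleware; the function name and the list of language segments are assumptions.

```php
<?php
// Hypothetical sketch of the language-segment redirect described above.
// If the request path doesn't start with a known language segment, return
// the URL to redirect to (primary site's segment injected at the front);
// otherwise return null, meaning no redirect is needed.
function languageRedirectTarget(string $path, array $languageSegments, string $primarySegment): ?string
{
    $segments = explode('/', ltrim($path, '/'));
    $first = $segments[0] ?? '';

    if (in_array($first, $languageSegments, true)) {
        // Already has a language segment; let the request through
        return null;
    }

    // Inject the primary site's segment at the start of the path
    return '/' . $primarySegment . '/' . ltrim($path, '/');
}

// A bare request for /about would be redirected to /en-gb/about
echo languageRedirectTarget('/about', ['en-gb', 'fr-fr'], 'en-gb');
```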
We don't mind having to define an optional flag to opt-in to the new behaviour, as long as we've got something that will allow our clients sites to get back to the state they were in before.
Thanks
Craft will load the "primary" site when you access it without a language segment without any issues - this is the default behavior of it out of the box.
We'd expect SEOmatic to follow the same principles - the code runs getCurrentSite() which returns a completely valid result as far as Craft is concerned.
Yes, the problem is that your primary site that is returned will have a sub-directory as part of the path, which is what we're keying off of to not render the robots.txt files, to be in line with the spec and satisfy https://github.com/nystudio107/craft-seomatic/issues/859
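The check being described (does the site's base URL contain a sub-directory?) could look roughly like this. This is a simplified sketch, not SEOmatic's actual code from FrontendTemplates.php.

```php
<?php
// Simplified sketch of the "base URL has a sub-directory" check described
// above (not SEOmatic's actual implementation): a robots.txt is only valid
// at the root, so a site whose base URL has a path component is skipped.
function baseUrlHasSubDirectory(string $baseUrl): bool
{
    // parse_url() returns null when the URL has no path component
    $path = (string)(parse_url($baseUrl, PHP_URL_PATH) ?? '');

    // Anything left after trimming slashes is a sub-directory
    return trim($path, '/') !== '';
}

// 'https://website.com/en-gb/' has a sub-directory, so robots.txt
// would not be rendered for that site; 'https://website.com/' would be.
var_dump(baseUrlHasSubDirectory('https://website.com/en-gb/'));
```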
I'm reverse-engineering Craft's code at the moment to see exactly where and how it is setting the current site, to see if there is a way to special-case requests that are coming in that are not assigned to any site.
The problem is that, as noted above, Sites::getCurrentSite() defaults to your primary site if no site matches the request, which is problematic for what we're trying to do here, because we have no apparent way to tell if this is a legitimate request for that site or if it's just defaulting to it.
Would it be possible to simply add a config item like ignore_sub_directories and let the user choose? I'd go with defaulting to false, to preserve existing users' functionality, then allow users to opt in to the alternative behaviour.
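As a config fragment, the suggestion might look like the following. The setting name here is the one proposed in this comment; the setting actually shipped in the linked commit may be named differently.

```php
<?php
// config/seomatic.php -- illustrating the proposed opt-in flag.
// 'ignore_sub_directories' is the name suggested above, not necessarily
// what the plugin implemented.
return [
    // false (default): keep the pre-change behaviour, rendering robots.txt
    // even when the matched site's base URL contains a sub-directory.
    // true: skip rendering for sites with a sub-directory in the base URL.
    'ignore_sub_directories' => false,
];
```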
Addressed in: https://github.com/nystudio107/craft-seomatic/commit/0aa17f62210e17c916997aae11e00544ca9d95f0
You can try it now by setting your semver in your composer.json to look like this:
"nystudio107/craft-seomatic": "dev-develop as 3.3.38",
Then do a composer clear-cache && composer update
👌 looks like it's done the job, have tested on 2 different sites and they both are working as before thanks :) much appreciated - will await full release to double check!
Thanks
Great! Will likely cut a release today.
Released in 3.3.38: https://github.com/nystudio107/craft-seomatic/releases/tag/3.3.38
Describe the bug
The robots.txt no longer works on (at least) multi-site installs
UPDATE: This also seems to affect the ads.txt and humans.txt.
To reproduce
Steps to reproduce the behaviour:
Expected behaviour
The robots.txt powered via the CMS should display.
Screenshots
Here's the stack trace from when landing on the /robots.txt
The web-404.log after making a single request to the robots.txt
Versions
Application Info
PHP version: 7.4.14
OS version: Darwin 19.6.0
Database driver & version: MySQL 5.7.31
Image driver & version: GD 7.4.14
Craft edition & version: Craft Pro 3.6.11.2
Yii version: 2.0.41.1
Twig version: v2.14.4
Guzzle version: 6.5.5
Plugins
SEO 3.3.37
Modules
No modules are installed.