Closed asmecher closed 4 months ago
I've switched to a link in my ui branch of OMP. I'll update it for OJS too, but as you discussed in email, we still need to propagate via URL.
OJS uses links for the language switcher (can't remember when this was implemented). But I think the issue of propagating the language within the URL is more in your wheelhouse. Assigning back to you unless you'd like me to look further.
Sure, I'll take a look.
Hmm. I don't like constructs that only show up for search engines, e.g. via user agents. So we're left with adding the language to system URLs in the general case, which I'm hesitant to impose on single-language journals; adding this as an optional mode could provide flexibility for both types of users, but switching between them might be catastrophic as all URLs would change. We could potentially have the URL generation code add a URL parameter for language, which would allow interoperability between the two modes -- but this would need to behave well with e.g. POST forms and Javascript, which might not be expecting URL parameters to suddenly get included. Deferring pending more consideration.
Greetings @asmecher. How can I add the language to the URL on the article detail page? For example, my aim is to add an additional parameter only for the non-primary locale (Ukrainian in our case). The problem is that there is no other way for Google to index it...
I already have some experience in PHP and Java EE, so I hope that with your guidance I can manage this problem. Where should I start?
Hi @Vitaliy-1 -- the code for this is pretty much constrained to pages/article/ArticleHandler.inc.php. PATH_INFO URL components come in via the $args parameter to each function. Have a look there and see if it makes sense -- let me know if you get stuck somewhere specific.
Thanks for reply @asmecher ,
Hmm, $args is an array that, from my point of view, contains only the article id. $request is a Request object from which I can, for example, retrieve the URL or redirect the request, but not change it. PATH_INFO can be seen in the context of the $_SERVER array. I do not see a way to modify the URL here. I am missing something...
Can you show me an example of URL mapping?
I know that the view function (a method of this Handler class) is crucial for displaying the article landing page. It is responsible for the view part of the URL. How is it possible to change it from view/ to view/uk/? Or maybe it is better to work with the last part of the URL, the article id? Where does the latter actually come from? I thought from the articleid variable, but changing it has no effect...
So, I am thinking about something like:
$currentLocale = AppLocale::getLocale();
$defaultLocale = AppLocale::getPrimaryLocale();
if ($currentLocale != $defaultLocale) {
    // The first two characters are the language code, e.g. "uk" from "uk_UA"
    $addToUrl = substr($currentLocale, 0, 2);
    // add $addToUrl to the URL
}
Maybe I could just create a new page with this URL pattern and redirect to it. But in Java it is possible to map one servlet to several URL patterns. I am confused.
Hi again, @asmecher
It's not easy to read and understand other people's code without much programming experience. But I know that you don't have much time to help others write code.
After browsing the classes I found the PKPPageRouter class and its route method: https://github.com/pkp/pkp-lib/blob/master/classes/core/PKPPageRouter.inc.php#L146
I suppose it picks up the URL entered by the user and associates it with a specific OJS file.
There is a hook inside called LoadHandler which carries 3 variables. $page and $op seem to represent parameters from the URL, and $sourceFile represents the path to a Smarty template (I hope).
I have created a mockup of a plugin here to handle this hook: https://github.com/Vitaliy-1/localeRedirect/blob/master/LocaleRedirectPlugin.inc.php
Can you confirm that I am on the right path? Or would you not use this hook for the task described earlier?
Another approach that I found is to modify the initialize function inside the ArticleHandler class. As a quick example of what I am planning to work with:
function initialize($request, $args) {
    // An optional leading locale shifts the article and galley IDs by one position
    if (isset($args[0]) && $args[0] == "uk_UA") {
        $articleId = isset($args[1]) ? $args[1] : 0;
        $galleyId = isset($args[2]) ? $args[2] : 0;
        $request->getSession()->setSessionVar("currentLocale", "uk_UA");
    } else {
        $articleId = isset($args[0]) ? $args[0] : 0;
        $galleyId = isset($args[1]) ? $args[1] : 0;
    }
    // original code here
    return $request;
}
So the question remains: which approach is better in your opinion? Or none of them? And will Google actually see that page for the selected locale?
@Vitaliy-1, my worry is about ambiguity in URLs. If I'm reading correctly, this would result in "equivalent" URLs like...
.../article/view/uk_UA/smecher17/pdf
.../article/view/smecher17/pdf
However, that last one could be read two ways: a galley view with article ID "smecher17", galley ID "pdf", or an article view with locale "smecher17" and article ID "pdf". We can code around it here but there will be lots of knock-on complication, e.g. in parsing URLs for statistics calculations in the log files.
I think it's definitely necessary to...
What about using an optional URL parameter, e.g.: .../article/view/smecher17/pdf?locale=uk_UA? It's not as pretty as your proposal, but isn't ambiguous, and it should be clear to readers how it'll behave. To facilitate indexing, I would think the only additional thing that's needed is better linking to different-language versions, in the front end and probably also in meta content.
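A minimal sketch of how such an optional parameter might be resolved, assuming the parameter is named locale and is only honoured when it matches a supported locale. The function name resolveLocale and the hard-coded locale list are hypothetical and not part of OJS:

```php
<?php
// Hypothetical helper: neither the name nor the signature comes from OJS.
function resolveLocale(array $query, array $supportedLocales, $primaryLocale)
{
    // Only accept the parameter when it names a locale the journal supports.
    if (isset($query['locale']) && in_array($query['locale'], $supportedLocales, true)) {
        return $query['locale'];
    }
    // A missing or unknown value falls back to the primary locale,
    // so the parameter-free URL keeps its current meaning.
    return $primaryLocale;
}

// Parse the query string of the example URL from the comment above.
$queryString = (string) parse_url('/article/view/smecher17/pdf?locale=uk_UA', PHP_URL_QUERY);
parse_str($queryString, $query);
echo resolveLocale($query, ['en_US', 'uk_UA'], 'en_US'), "\n";
```

Because the path segments are untouched, the article ID and galley ID stay in the same positions whether or not the parameter is present.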
Greetings @asmecher
While writing the code I have encountered a problem with language toggle. As an example of changing locale:
$_SESSION["currentLocale"] = "en_US";
or
$request->getSession()->setSessionVar("currentLocale", "en_US");
The lines above change the displayed locale text only on the next request (although the session locale changes immediately). The only way around it that I found is a redirect:
$request->redirectUrl(...);
Is there a cleaner way?
Ahh, the problem can be managed by assigning the values inside the constructor of the SessionManager class. Apparently the locale can't be changed once it has already been read from the session, can it?
@Vitaliy-1, rather than working via session parameters, I'd suggest adding a facility to the AppLocale class that permits setting the locale, rather than just getting it. This would involve moving the $currentLocale variable there out into the class, and adding a new setLocale function.
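As a rough illustration of that suggestion, here is a sketch that holds the current locale in a class-level static and exposes a setter. SettableLocale is a hypothetical stand-in, not the real AppLocale class, and the fallback argument and format check are assumptions added for the example:

```php
<?php
// Hypothetical stand-in for the idea; this is NOT the real AppLocale class.
class SettableLocale
{
    /** @var string|null Locale chosen for the current request, if any. */
    private static $currentLocale = null;

    /** Return the explicitly set locale, or the given fallback. */
    public static function getLocale($fallback = 'en_US')
    {
        return self::$currentLocale !== null ? self::$currentLocale : $fallback;
    }

    /** Override the locale for the remainder of the request. */
    public static function setLocale($locale)
    {
        // Assumption: locales look like en_US or uk_UA.
        if (!preg_match('/^[a-z]{2,3}_[A-Z]{2}$/', $locale)) {
            throw new InvalidArgumentException("Invalid locale: $locale");
        }
        self::$currentLocale = $locale;
    }
}

SettableLocale::setLocale('uk_UA');
echo SettableLocale::getLocale(), "\n";
```

Keeping the value in a class-level static rather than a function-local one is what lets a setter and a getter share it within one request.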
Thanks for guidance @asmecher
There is another problem after applying the modifications you advised.
The locale for plugins doesn't change immediately after calling the setLocale method; the plugins need a session refresh. The core locale, however, updates accordingly.
My AppLocale class: https://github.com/Vitaliy-1/AppLocale/blob/master/AppLocale.inc.php
This is how I call the setLocale method from a plugin: https://github.com/Vitaliy-1/localeRedirect/blob/master/LocaleRedirectPlugin.inc.php#L41
Hi @asmecher
I have managed to make a separate URL for the non-primary locale. After looking over several options and reading Google's guidelines about multilingual sites, I picked the variant with a separate subdomain. It has no conflicts with the main code; OJS picks up requests to subdomains without any need to point to them in the Apache configuration files. Only registering the subdomain is needed. I have checked it on the production system and it works fine with both already started and new user sessions. One problem was making a switcher on the admin dashboard side, because the standard tools for routing the current location weren't working in usernav.tpl (as it is not actually a page), but it was managed with HTTP_REFERER and a bit of regex.
But I wasn't able to code an appropriate setter for the AppLocale class, so I made the modification in the SessionManager class instead: it sets the currentLocale var for the user session depending on the presence of a subdomain in the URL.
Do you actually need this sort of plugin for public use? If so, how can I implement a setter for changing languages?
Hi @asmecher So what about the idea to give separate subdomains for non-primary locales? We have successfully tested it for several months, and there weren't any disruptions in publication, indexing or XML exporting processes.
@Vitaliy-1, sorry I haven't been following this as closely as I'd like. Subdomains would certainly solve the problem for some, though it's probably not a general-purpose enough solution for everyone (thinking e.g. of the many users who don't have their own domains or lack expertise in setting up subdomains). Can you summarize what was required to set this up (e.g. patches etc)?
Just dropping this here although it does include some obvious things: https://support.google.com/webmasters/answer/182192, most important part in the end.
Some suggestions for a two-locale journal. The default is English and the secondary is Deutsch. Basically the default locale would also work if a locale existed in the URI, but would result in a redirect as suggested by Google in the above document. For clarity's sake I am not showing the index.php part, which many sites hide anyway.
Main site (or do we need these for the main site?)
Journal index:
Single article:
(edit: how come nobody has registered site.com?)
Hi @asmecher
Nothing really special. Most of the modifications were done inside the SessionManager class. I really wanted to add a method inside the AppLocale class, but ran into the problems described above.
My new method:
private function subdomainLocaleRedirect(PKPRequest $request)
{
    // The first host label selects the locale, e.g. "uk" in uk.example.com
    $domainLocalePointer = explode(".", $_SERVER['HTTP_HOST'])[0];
    $journal = $request->getJournal();
    $site = $request->getSite();
    // get supported locales and primary locale
    if ($journal != null) {
        $locales = $journal->getSupportedLocaleNames();
        $primaryLocale = $journal->getPrimaryLocale();
    } else {
        $locales = $site->getSupportedLocaleNames();
        $primaryLocale = $site->getPrimaryLocale();
    }
    // map the first 2 chars of each non-primary supported locale to the full locale name
    $supportedLocalesforDomain = array();
    foreach ($locales as $key => $supportedLocale) {
        if ($key != $primaryLocale) {
            $supportedLocalesforDomain[substr($key, 0, 2)] = $key;
        }
    }
    if (empty($supportedLocalesforDomain) || $this->userSession == null) return false;
    // Pick the target locale once: comparing against every entry in a loop could
    // reset the session to the primary locale when several locales are supported
    $targetLocale = isset($supportedLocalesforDomain[$domainLocalePointer])
        ? $supportedLocalesforDomain[$domainLocalePointer]
        : $primaryLocale;
    if ($this->userSession->getSessionVar("currentLocale") != $targetLocale) {
        $this->userSession->setSessionVar("currentLocale", $targetLocale);
    }
    return true;
}
The call goes at line 68, after $now = time(); as follows:
$this->subdomainLocaleRedirect($request);
Additional lines for the case where the user's cookies are not set. They need to be rewritten to retrieve the actually installed locales. I have put them after the creation of a new session, after the line $this->userSession->setSecondsLastUsed($now); as follows:
$domainLocalePointer = explode(".", $_SERVER['HTTP_HOST'])[0];
if ($domainLocalePointer == "uk") {
    $this->userSession->setSessionVar("currentLocale", "uk_UA");
} else {
    $this->userSession->setSessionVar("currentLocale", "en_US");
}
There is no need to add the subdomain at the server level. OJS will serve these requests appropriately.
As I am not a programmer, I think this code could be optimized :)
@Vitaliy-1 Was this issue ever pushed to OJS 2? Because I'm not getting all languages indexed in my installation either.
I think not. As I wrote here, it is a complex problem. Because crawlers index only one locale per URL, the URL needs to be different for every non-primary locale. I suppose the best thing here is to add a query string; the primary locale should always have the default URL and be the one associated with indexing (DOI, PMID, OAI etc.)
You mean as a parameter or as an actual part of path? I tried your solution with SessionManager but it gives me a "too many redirects" error. That could be due to the fact I already have some redirects based on the country of visitor, though...
I suppose a parameter would be easier, and I saw that this was implemented in at least one OJS 2 journal. If you already have redirects, it's quite possible you need to rewrite that part of the code. Keep in mind that the subdomain approach will probably require registering a subdomain.
Yeah, subdomain would actually be more difficult in my scenario, not because of the registering but because we also have an OMP installation and we plan on having a Wordpress installation as a base directory, all on the same domain.
In any case, I'll try doing a parameter and will report back here with how it goes.
@asmecher
What about using an optional URL parameter, e.g.: .../article/view/smecher17/pdf?locale=uk_UA? It's not as pretty as your proposal, but isn't ambiguous, and it should be clear to readers how it'll behave. To facilitate indexing, I would think the only additional thing that's needed is better linking to different-language versions, in the front end and probably also in meta content.
How should I go about doing this? In which file would it be best to make these changes? I tried doing it in the header.tpl of a theme, but I get all sorts of errors (probably because I'm using RESTful URLs?)
I'm using 2.4.8.3, for reference.
@jmvezic I suppose you need to intercept web request on a higher level in the class that is responsible for handling them. In OJS 3 I worked with SessionManager class. Although, maybe this can be done through a plugin with an appropriate hook (I'm not sure). I modified templates only for pointing the right links to locales' pages.
@Vitaliy-1 I suppose I could have a go at it, I haven't interfered with classes yet for fear of breaking something. You think Google would "catch" the parameter if it was added dynamically through SessionManager?
I'm seeing it like this. This all can be put in PHP. Here you need to work with AppLocale class (in OJS3) and URL string from the request.
Google certainly will cache any parameter that you add to URL.
Okay, so I've made a bit of a hack which could work in my case, I hope. I've added the following code to the article/header.tpl file of my theme:
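{php}
    $AppLocale = new AppLocale();
    $Locale = $AppLocale->getLocale();
    if (!isset($_GET["lang"])) {
        // No language parameter yet: redirect to the same URL with ?lang=<current locale>
        header('Location: ' . "$_SERVER[REQUEST_URI]?lang=$Locale");
        die();
    } else {
        if ($_GET["lang"] != $Locale) {
            // Parameter is out of date (the language was switched): rewrite it
            header('Location: ' . strtok($_SERVER["REQUEST_URI"], '?') . "?lang=$Locale");
            die();
        }
    }
{/php}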
What that does is redirect the article view page to the URL which ends with ?lang=en_US, for example http://www.site.com/journal/article/view/1402?lang=en_US. In case the user (or Googlebot for that matter) changes the language via the language picker, the parameter changes as well.
Here's hoping that Google will now see this as two separate URLs and index both.
The same thing could be done for the journal index page as well, I presume.
Obviously this is a pretty dirty solution, and I've yet to see if it's going to work. If it works, I'll update here in case anyone needs a quick solution until a prettier/more global one arrives.
So a little update: it seems that the above solution isn't working, or rather, Google doesn't follow the language switch redirect. It just throws a redirect error and says "excluded".
While reading issue #7272, it occurred to me that the present issue (basically, offering language-specific article URLs) could perhaps be implemented in OJS3 as a theme. The information needed seems to have already been made available in the "smarty" template API:
(string) $currentLocale is the locale (language) the site is currently being viewed in. You’ll find an array of supported locales at $supportedLocales.
https://docs.pkp.sfu.ca/pkp-theming-guide/en/template-variables#site-journal-and-locale
The above could be used in conjunction with the "currentUrl" variable to extract an input URL GET parameter for the language code (e.g., hl=en):
https://docs.pkp.sfu.ca/pkp-theming-guide/en/advanced-custom-data
PS: maybe the present issue #699 should be renamed to something more specific, like "Offer language-specific article URLs", as web indexing is a broader issue, with other potential solutions, such as showing multilingual metadata on the same page #7272.
leaving here some relevant guidelines for SEO:
Use different URLs for different language versions:
Google recommends using different URLs for each language version of a page rather than using cookies or browser settings to adjust the content language on the page.
If you use different URLs for different languages, use hreflang annotations to help Google search results link to the correct language version of a page.
Use the x-default tag for unmatched languages:
The reserved hreflang="x-default" value is used when no other language/region matches the user's browser setting. This value is optional, but recommended, as a way for you to control the page when no languages match. A good use is to target your site's homepage where there is a clickable map that enables the user to select their country.
Here is the HTML that would be in the <head> section of all the pages listed above. It would direct US, UK, generic English speakers, and German speakers to localized pages, and all others to a generic homepage. Google Search returns the appropriate result for the user, according to their browser setting:
<head>
<title>Widgets, Inc</title>
<link rel="alternate" hreflang="en-gb" href="http://en-gb.example.com/page.html" />
<link rel="alternate" hreflang="en-us" href="http://en-us.example.com/page.html" />
<link rel="alternate" hreflang="en" href="http://en.example.com/page.html" />
<link rel="alternate" hreflang="de" href="http://de.example.com/page.html" />
<link rel="alternate" hreflang="x-default" href="http://www.example.com/" />
</head>
Mistakes to Avoid when Auto-redirecting:
- Use separate redirector pages solely for redirecting. Use 1 redirector page for each set of internationalized pages. In the example above, http://www.example.com/product is the redirector page for the set of 3 pages http://www.example.com/en/product.html, /fr/product.html and /es/product.html. (...)
- Never automatically redirect a visitor (human or bot) that is trying to access a specific language version page that has content. In our example, that means never auto-redirecting when the page requested is one of http://www.example.com/en/product.html, /fr/product.html or /es/product.html
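The hreflang annotations described in the guidelines above could be generated along these lines. This is only a sketch: the hreflangLinks function, the ?locale= parameter and the example article URL are assumptions for illustration, not what OJS emits:

```php
<?php
// Hypothetical helper; the function name and the ?locale= parameter are assumptions.
function hreflangLinks($baseUrl, array $locales)
{
    $lines = [];
    foreach ($locales as $locale) {
        // hreflang values use hyphens (uk-UA), while OJS-style locales use underscores (uk_UA).
        $tag = str_replace('_', '-', $locale);
        $href = $baseUrl . '?locale=' . rawurlencode($locale);
        $lines[] = sprintf(
            '<link rel="alternate" hreflang="%s" href="%s" />',
            htmlspecialchars($tag, ENT_QUOTES),
            htmlspecialchars($href, ENT_QUOTES)
        );
    }
    // x-default points at the parameter-free URL for visitors whose language is not matched.
    $lines[] = sprintf(
        '<link rel="alternate" hreflang="x-default" href="%s" />',
        htmlspecialchars($baseUrl, ENT_QUOTES)
    );
    return implode("\n", $lines);
}

echo hreflangLinks('http://www.example.com/journal/article/view/1402', ['en_US', 'uk_UA']), "\n";
```

Each language version of a page would carry the same full set of links, including one pointing at itself.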
The currently active language is stored in a cookie on the user's device and not in the URL. There are no URLs available for crawlers to follow while indexing the content. As a result, a search engine crawler can only index the article metadata using one language. The same applies to other content in the journal's homepage, for example the About the Journal section. The aim is to enable crawlers to find and index multilingual content in the journal homepage. This includes both a) multilingual article and issue metadata and b) other text content on the journal homepage.
PRs: PKP: https://github.com/pkp/pkp-lib/pull/9628 OJS: https://github.com/pkp/ojs/pull/4146 OMP: https://github.com/pkp/omp/pull/1545 OPS: https://github.com/pkp/ops/pull/659 Crossref-ojs: https://github.com/pkp/crossref-ojs/pull/47 Crossref-ops: https://github.com/pkp/crossref-ops/pull/37 CitationStyleLanguage: https://github.com/pkp/citationStyleLanguage/pull/119 GoogleScholar: https://github.com/pkp/googleScholar/pull/19
Fix 1, s. comment https://github.com/pkp/pkp-lib/issues/699#issuecomment-2083802046 below: PRs:
Fix 2 (in monolingual contexts, if one adds locale in the URL, then the URL changes to something like this context//something): PRs:
@jonasraoni, I would have a question regarding the WebFeed plugin: currently, with this implementation, the URLs in atom, rss, and rss2 would have the UI language in the URLs, e.g. in atom:
<id>http://ojs-dev.bb/index.php/publicknowledge/fr_CA/gateway/plugin/WebFeedGatewayPlugin/atom</id>
...
<link rel="alternate" href="http://ojs-dev.bb/index.php/publicknowledge/fr_CA" />
<link rel="self" type="application/atom+xml" href="http://ojs-dev.bb/index.php/publicknowledge/fr_CA/gateway/plugin/WebFeedGatewayPlugin/atom" />
...
<link rel="alternate" href="http://ojs-dev.bb/index.php/publicknowledge/fr_CA/article/view/17" />
<summary type="html" xml:base="http://ojs-dev.bb/index.php/publicknowledge/fr_CA/article/view/17">
This seems to be OK for me -- all data is presented localized, and a user could so maybe choose in which language he/she would like to read/get the feeds. Only one thing that we maybe need to change in that case is language element in rss and rss2 -- it always shows the journal primary language. Do you think this could/should be then also changed to the UI language? -- I am not 100% sure what the language element needs to contain... Or, do you think we should for a reason rather keep the old, normal URLs, without the UI language in them? :thinking: Thanks a lot!
@bozana We can use the same format that we use on the <html> tag (source: https://www.rssboard.org/rss-language-codes), and the ATOM <feed> tag also supports the xml:lang="en" attribute.
I didn't check the PRs, but it's good to ensure that old links are being properly redirected.
The old URLs work, and these are and should be used when article metadata is exported somewhere, for example to Crossref and DOAJ, or shown in OAI-PMH, because we of course cannot know how the journal changes its settings. RSS feeds are probably, as Bozana is thinking, different in this regard.
Hi @jonasraoni, as Antti-Jussi said, the old links will work. However, the new WebFeed URLs will contain the UI language, as in the example above. According to the https://www.rssboard.org/rss-language-codes:
The language employed in an RSS feed can be indicated in the language element,...
the language element should then also contain the UI language in the format ISO 639-1.
EDIT: The issue that should address this: https://github.com/pkp/pkp-lib/issues/9910
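For illustration, converting an OJS-style locale into the forms discussed above might look like the following. Both function names are hypothetical, and the sketch assumes plain xx_YY locales (no script or variant suffixes):

```php
<?php
// Hypothetical helpers; the names are illustrative and not part of the WebFeed plugin.

/** fr_CA -> fr-CA, the hyphenated form used for xml:lang and the <html> lang attribute. */
function localeToLanguageTag($locale)
{
    return str_replace('_', '-', $locale);
}

/** fr_CA -> fr: keep only the language part before the first underscore. */
function localeToLanguageCode($locale)
{
    // Assumption: the language part is a two-letter ISO 639-1 code.
    return strtolower(explode('_', $locale)[0]);
}

echo localeToLanguageTag('fr_CA'), ' ', localeToLanguageCode('fr_CA'), "\n";
```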
Hi @jyhein, I took a look into the code once again and it looks good. Just that OMP and OPS are missing one change -- I left a comment in the PRs. Regarding ORCID: Because it is currently being moved into the core, could you only provide the links to your changes in this issue, so that @ewhanson can consider them there: https://github.com/pkp/pkp-lib/issues/9771. Else, you do not need to link to them in your PRs here. Then, you can rebase everything (also the plugin submodules), create PRs for plugin submodules/repositories (and link to the PRs here in this issue above), and consider all submodules (pkp, but also every plugin submodule) in the last commit. Then, when the tests pass we can merge... :-) Thanks a lot!
Hi @jyhein (and maybe @ajnyga), what about sitemap -- does it need to contain all languages? -- s. https://developers.google.com/search/docs/specialty/international/localized-versions.
My thinking was that the sitemap would guide to the primary language (via the link without the language code) and each page would have further information for search engines in the page header.
But of course adding the links to that sitemap would be doable. Just leads to a massive sitemap of course in some cases.
Yes, let's leave it as it is for now... Also, as @jyhein said, it seems that only one of the 3 ways listed on that Google page needs to be supported... Thanks a lot!
All merged, thanks a lot!
@bozana / @jyhein, I'm re-opening this because it breaks my installation (specifically https://github.com/pkp/pkp-lib/pull/9628). My local OJS is installed to http://localhost/git/ojs-main, and a typical URL into OJS is http://localhost/git/ojs-main/index.php/publicknowledge/article/view/mwandenga-signalling-theory.
With the PR applied, the path gets mixed into the path_info data. Going to http://localhost/git/ojs-main redirects me to http://localhost/git/ojs-main/index.php/git/ojs-main, and going to http://localhost/git/ojs-main/index.php/publicknowledge/article/view/mwandenga-signalling-theory redirects me to http://localhost/git/ojs-main/index.php/git/ojs-main/publicknowledge/article/view/mwandenga-signalling-theory.
The /git/ojs-main part after the index.php should not be there -- it's the installation directory and is already there before the index.php wrapper.
Can you test with the case where OJS is not installed in the server's root directory?
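To make the expected behaviour concrete, here is a small sketch of deriving the path info when the application lives in a subdirectory. It is not the pkp-lib code; stripPathPrefix and pathInfoFor are hypothetical names, and the script name is assumed to be known (e.g. from $_SERVER['SCRIPT_NAME']):

```php
<?php
// Hypothetical helpers illustrating the expected behaviour; NOT the pkp-lib implementation.

/** Return $path with $prefix removed, or null when $path does not start with it. */
function stripPathPrefix($path, $prefix)
{
    if ($prefix === '') {
        return null;
    }
    if ($path === $prefix) {
        return '';
    }
    // Require a "/" boundary so /git/ojs-main does not match /git/ojs-main2.
    if (strpos($path, $prefix . '/') === 0) {
        return substr($path, strlen($prefix));
    }
    return null;
}

/** Path info is whatever follows the script name, or the install directory if index.php is omitted. */
function pathInfoFor($requestPath, $scriptName)
{
    $baseDir = rtrim(dirname($scriptName), '/');
    foreach ([$scriptName, $baseDir] as $prefix) {
        $rest = stripPathPrefix($requestPath, $prefix);
        if ($rest !== null) {
            return $rest;
        }
    }
    return $requestPath;
}

echo pathInfoFor(
    '/git/ojs-main/index.php/publicknowledge/article/view/mwandenga-signalling-theory',
    '/git/ojs-main/index.php'
), "\n";
```

With the installation directory stripped first, the locale check only ever sees segments such as publicknowledge/article/view/..., never git/ojs-main.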
@asmecher, I have just merged the fix that @jyhein provided, so your installation should work correctly with the new code... :-)
That works -- thanks, @jyhein and @bozana!
Describe the problem you would like to solve Users can switch languages while reading a multilingual journal. However, the currently active language is stored in a cookie on the user's device and not in the URL. As a result, a search engine crawler cannot index information in languages other than the journal's primary language.
Describe the solution you'd like No consensus has been reached on a proposed solution.
Who is asking for this feature? Multilingual journals that want to be indexed by Google (not Google Scholar).
Additional information See http://forum.pkp.sfu.ca/t/keep-ui-archivable-by-heritrix-web-crawler/3207/6 for details.