mkurz closed this 5 months ago
It's working in
but not the newly added 2.9.x and 3.0.x docs:
I'm pretty sure the Algolia index needs to be configured to also crawl those two new "subfolders". I am already in contact with Lightbend, since they still manage the Algolia account. They will take a look. I am also asking them to transfer the Algolia index to us... We will see.
Any news?
Yeah, sorry for the long delay. I am working on it, but Algolia support takes some time (also, I was not pushing hard enough...). I wrote to them once more. I am also in contact with Lightbend to finally hand over the Algolia account, but things take a bit longer than desired (they couldn't find the credentials...). But we are getting there slowly. I really hope to resolve this in the next few days :crossed_fingers:
Just to let you know, I am working on this again. I need to set up a new Algolia account because we cannot retrieve the old one, and support cannot help us (or is not willing to...).
So, I finally figured everything out. I spent basically the last two days reading the Algolia docs, contacting even more people, and trying various things with different DocSearch and crawler versions (because Play currently uses a very old, legacy Algolia implementation that needed to be upgraded, etc.)... It was a bit complicated, but I can already replicate the search locally again. I am an Algolia expert now... I am too tired right now, but will post more about it tomorrow (and will also finally make the search work again, in a future-proof way).
Basically, crawling and searching are working again (you can already try it in https://www.playframework.com/documentation/3.0.x/Home; old docs also still work fine, like https://www.playframework.com/documentation/2.0.1/Home or, with the x, https://www.playframework.com/documentation/2.0.x/Home).
Just three issues are left, which I only found now in production (because I was always testing with 3.0.x locally):
Currently only patch releases up to and including version [digit].[digit].9 are crawled; patch releases from 10 upward are not crawled because the regex does not match. I guess [2-9].\\d.(\\d|x) has to be changed to [2-9].\\d+.(\\d+|x) (see https://www.playframework.com/documentation/2.8.11/Home: search does not work there; I checked the index, and it only has versions up to x.x.9). I will do that before starting the next crawl, together with fixing the issues below.
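A quick way to see why 2.8.11 is skipped is to test both patterns from the thread against a few version segments. This is a minimal Python sketch, assuming the crawler effectively requires the pattern to match the whole version segment of the URL (modelled here with re.fullmatch); the helper name is_crawled is hypothetical, and the doubled backslashes from the crawler config become single ones in a raw string.

```python
import re

# Patterns quoted in the thread, written as raw strings (no doubled backslashes).
# Note: the unescaped "." matches any character, not just a literal dot.
OLD = re.compile(r"[2-9].\d.(\d|x)")
NEW = re.compile(r"[2-9].\d+.(\d+|x)")


def is_crawled(pattern: re.Pattern, version: str) -> bool:
    """Hypothetical check: does the pattern accept the whole version segment?

    Assumption: the real crawler anchors the pattern inside a longer URL
    pattern, so a partial match like "2.8.1" inside "2.8.11" does not count.
    """
    return pattern.fullmatch(version) is not None


for v in ["2.8.9", "2.8.11", "3.0.x", "3.0.2"]:
    print(v, is_crawled(OLD, v), is_crawled(NEW, v))
# 2.8.9 True True
# 2.8.11 False True
# 3.0.x True True
# 3.0.2 True True
```

Only the \d+ variant accepts multi-digit patch numbers such as 11; it would equally accept a two-digit minor version.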
Funny thing is, all the versions get crawled except the latest releases (see 3.0.2) :facepalm:. That is because the entry point https://www.playframework.com/releases only contains previous releases, not the current ones... To solve this, I will add hidden links to the latest release(s) on the "all releases" page, see https://github.com/playframework/playframework.com/pull/584. (My first idea, now dropped, was to add a hidden link to each docs Home page in https://www.playframework.com/changelog, because it also includes the latest releases, and to make that page an additional entry point for crawling.)
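The hidden-link idea can be sketched as a function that renders one invisible anchor per release docs Home page. This is a hypothetical Python helper, not the code from PR 584; it assumes the crawler follows every a[href] it finds in the HTML regardless of CSS visibility.

```python
from html import escape
from typing import Iterable


def hidden_doc_links(versions: Iterable[str]) -> str:
    """Render one hidden link per release docs Home page (hypothetical helper).

    display:none keeps the links out of sight for visitors, while a crawler
    that extracts every a[href] from the HTML (assumption) still follows them.
    """
    return "\n".join(
        f'<a href="/documentation/{escape(v)}/Home" style="display:none">{escape(v)}</a>'
        for v in versions
    )


# Example: the latest release mentioned in the thread.
print(hidden_doc_links(["3.0.2"]))
```

Generating the links from the same release list that feeds the page means new releases become reachable for the crawler without touching the crawler config.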
Look at https://www.playframework.com/documentation/2.8.6/PlayEnhancer; just above the search box you will see:
"You are viewing the documentation for the 2.8.6 release in the 2.8.x series of releases. The latest stable release series is 3.0.x."
Now, the PlayEnhancer page was already removed in the latest 2.8.x versions, so the 2.8.x and 3.0.x links will 404. This is not really bad; however, the Algolia crawler will report many 404 errors for such pages. I am thinking about not rendering this message when the Algolia search bot visits the page, to avoid cluttering the Algolia logs with these errors, so that we only see "real" errors that might need to be handled. (By the way, this is different from removing elements from the DOM in the recordExtractor function: recordExtractor is executed after the page was crawled and all its links were already extracted, so using $(..).remove() there comes too late. DOM manipulation in the recordExtractor function is only useful to keep certain elements out of the records that make it into the index.) Update: see https://github.com/playframework/playframework.com/pull/585
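The ordering argument above can be modelled as a tiny crawl pipeline: links are extracted from the served HTML first, and only then does the record extractor run. In this hypothetical Python sketch, removing the banner in the extractor keeps it out of the record but not out of the link queue, whereas omitting it server-side for the crawler keeps the dead links from being followed. The User-Agent marker "algolia crawler", the page markup, and all function names are assumptions for illustration, not the real Algolia crawler or playframework.com code.

```python
import re
from typing import List, Optional, Tuple

# Assumption: the real crawler's User-Agent string may differ.
CRAWLER_UA_MARKER = "algolia crawler"

BANNER = ('<p class="version-banner">Latest: '
          '<a href="/documentation/2.8.x/PlayEnhancer">2.8.x</a> '
          '<a href="/documentation/3.0.x/PlayEnhancer">3.0.x</a></p>')
BODY = '<article><a href="/documentation/2.8.6/Home">Home</a></article>'


def render_page(user_agent: Optional[str]) -> str:
    """Server side: omit the version banner when the crawler requests the page."""
    is_crawler = bool(user_agent) and CRAWLER_UA_MARKER in user_agent.lower()
    return BODY if is_crawler else BANNER + BODY


def extract_links(html: str) -> List[str]:
    """Crawler step 1: collect every href to enqueue."""
    return re.findall(r'href="([^"]+)"', html)


def record_extractor(html: str) -> str:
    """Crawler step 2: analogue of $(..).remove() inside recordExtractor."""
    return re.sub(r'<p class="version-banner">.*?</p>', "", html)


def crawl(user_agent: Optional[str]) -> Tuple[List[str], str]:
    html = render_page(user_agent)
    links = extract_links(html)      # links are queued before extraction...
    record = record_extractor(html)  # ...so removing the banner here is too late
    return links, record


links, record = crawl("Mozilla/5.0")
print(links)   # the dead PlayEnhancer links are still enqueued
links, record = crawl("Algolia Crawler/1.0")
print(links)   # only the real page link remains
```

The design point is that only the server controls what HTML the crawler sees before link extraction; anything done in the record extractor shapes the index, not the crawl.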
Done, search is working again for all versions. Also, no crawl errors occur anymore. Bonus: new releases will be crawled automatically, because https://www.playframework.com/releases is now the entry point for crawling and has hidden links to all versions. The Algolia crawler visits once per week, on Tuesday. Crawling and indexing all pages of all versions now takes about 2 hours and 20 minutes.
Finally, we now have control through our own Algolia account. RIP old Algolia account that no one is able to restore anymore :coffin: (including Algolia staff...)
:cry: