mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration

Move Lucene web site from svn to git [LUCENE-8987] #984

Closed mikemccand closed 4 years ago

mikemccand commented 5 years ago

INFRA just enabled a new way of configuring website build from a git branch, see dev list email. It allows for automatic builds of both staging and production site, much like the old CMS. We can choose to auto publish the html content of an output/ folder, or to have a bot build the site using Pelican from a content/ folder.

The goal of this issue is to explore how this can be done for http://lucene.apache.org by creating a new git repo lucene-site, copying over the site from svn, seeing if it can be "Pelicanized" easily, and then testing staging. Benefits are that more people will be able to edit the web site and we can take PRs from the public (with GitHub preview of pages).

Non-goals:


Legacy Jira details

LUCENE-8987 by Jan Høydahl (@janhoy) on Sep 22 2019, resolved Mar 09 2020 Attachments: lucene-site-repo.png Linked issues:

Sub-tasks:

mikemccand commented 4 years ago

Steps

  1. Create new git repo 'lucene-site'
  2. Create folder structure and copy old site (excluding JavaDoc and online RefGuide) from svn into appropriate folder(s)
  3. Adapt to make local Pelican site build work for building the barebones site, and commit to master branch
  4. Add .asf.yaml file with a 'staging' profile for branch asf-staging, and a 'publish' profile for branch 'asf-site', and a 'pelican' directive to auto build from 'master' branch and put site into 'asf-staging' branch (/output folder).
  5. Verify that the staging build kicks off and that a site appears in lucene.staged.apache.org (note that this is different from lucene.staging.apache.org that old CMS uses)
  6. Find a solution for JavaDoc and RefGuide, which are huge amounts of statically generated HTML uploaded by RM during build.
     * These should just be put on a filesystem somewhere, outside of git
     * Do some .htaccess magic to make them appear in the right locations of the site
  7. Once the staging site is good, merge asf-staging into asf-site branch to publish. This will automatically disable CMS.
  8. Commit a README-NOT-IN-USE file to old svn repo and make it read-only

Note that also the RM guidelines need to be updated wrt

[Legacy Jira: Jan Høydahl (@janhoy) on Oct 16 2019]

mikemccand commented 4 years ago

lucene-site-repo.png

[Legacy Jira: Jan Høydahl (@janhoy) on Oct 16 2019]

mikemccand commented 4 years ago

@janhoy  Do you need any help with this issue? I'm interested in getting started contributing to Lucene, and the docs seem like a good place to get familiar with the community.

[Legacy Jira: Adam Walz on Oct 22 2019]

mikemccand commented 4 years ago

Sure, all help welcome! This issue is about migrating the site from svn and an in-house static site builder over to git and a new supported static site builder. Feel free to engage in the discussions on any of the sub tasks. It should be possible to do code contributions through PRs towards the lucene-site repo and then one of us committers will review and merge.

[Legacy Jira: Jan Høydahl (@janhoy) on Oct 22 2019]

mikemccand commented 4 years ago

Hi, I was talking with Infra last week at ApacheCon:

We should open a new INFRA issue for the one-time-write javadocs/refguide pages and its integration external to pelican.

[Legacy Jira: Uwe Schindler (@uschindler) on Oct 29 2019 [updated: Oct 30 2019]]

mikemccand commented 4 years ago

Thanks Uwe for checking. So I guess we'll start with the plain CMS and then worry about the javadoc and refguide sub tasks once Infra documents how that will work.

[Legacy Jira: Jan Høydahl (@janhoy) on Oct 29 2019]

mikemccand commented 4 years ago

SVN history imported into new branch, thanks @adamwalz. One sub task closed, five to go :) 

[Legacy Jira: Jan Høydahl (@janhoy) on Oct 30 2019]

mikemccand commented 4 years ago

Milestone reached - ASF buildbot builds and publishes the new website git repo to https://lucene.staged.apache.org !

[Legacy Jira: Jan Høydahl (@janhoy) on Nov 14 2019]

mikemccand commented 4 years ago

Awesome work! @janhoy I found there are some simple mistakes :D

1) Resources links in https://lucene.staged.apache.org/core/ are wrong (right side of the page):

* https://lucene.staged.apache.org/discussion.html => https://lucene.staged.apache.org/core/discussion.html
* https://lucene.staged.apache.org/developer.html => https://lucene.staged.apache.org/core/developer.html
* https://lucene.staged.apache.org/features.html => https://lucene.staged.apache.org/core/features.html (but https://lucene.staged.apache.org/core/features.html is not found)
* https://lucene.staged.apache.org/downloads.html => https://lucene.staged.apache.org/core/downloads.html

2) On the mailing list page, some content is outdated. As you know, our Slack channel is #lucene-dev now. It was changed a week ago and I changed the web page an hour ago. https://lucene.apache.org/core/discussion.html#slack https://lucene.apache.org/solr/community.html#slack Channel name: #lucene-solr -> #lucene-dev

[Legacy Jira: Namgyu Kim (@danmuzi) on Nov 14 2019]

mikemccand commented 4 years ago

Thanks @danmuzi, there are some known issues. I still need to go through each page with a fine-toothed comb to ensure parity with production. This process will be easier now that the site is on staging rather than building locally only. I'll go through these mistakes this weekend. 

 

I've been trying to port changes in from the svn site, but haven't ported anything in the last week which is why the slack channel is unchanged. I'll fix that.

[Legacy Jira: Adam Walz on Nov 14 2019]

mikemccand commented 4 years ago

With LUCENE-9015 done we now also have a production branch (source) and an asf-site branch (generated). See more in README.

[Legacy Jira: Jan Høydahl (@janhoy) on Nov 14 2019]

mikemccand commented 4 years ago

@adamwalz  See https://github.com/apache/lucene-site/pull/7 for some files I suggest removing, and https://github.com/apache/lucene-site/pull/8 for a convenience script for installing pelican, building and serving the site.

A general question: Should we have Apache License headers on all our MD files? I think so...

[Legacy Jira: Jan Høydahl (@janhoy) on Nov 15 2019]

mikemccand commented 4 years ago

Commented on the PRs.

As for Apache License headers, I'm thinking of adding yaml front matter to all markdown files. The yaml will allow for more elaborate header settings - for instance multiline markdown in variables. I was going to use this for the solr security page by having variables for CVE, severity, versions affected, description, and mitigation. That way in jinja we can target each variable separately and format as a table rather than only having access to the markdown content.

 

It will look something like this with the yaml front matter:

    title: XML Bomb in Apache Solr versions prior to 5.0
    CVE: CVE-2019-12401
    severity: Medium
    versions_affected: |
      1.3.0 to 1.4.1
      3.1.0 to 3.6.2
      4.0.0 to 4.10.4
    mitigation: |

Solr versions prior to 5.0.0 are vulnerable to an XML resource consumption attack (a.k.a. Lol Bomb) via its update handler. By leveraging XML DOCTYPE and ENTITY type elements, the attacker can create a pattern that will expand when the server parses the XML, causing OOMs.


 

Using front matter will also make it possible to include a license in each markdown file without affecting rendering.
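To make the front matter idea concrete, here is a minimal Python sketch of how such a header could be separated from the markdown body and read into variables. It is not how Pelican or any plugin does it: the `---` delimiter lines, the function names and the body text are assumptions for illustration, and the parser only understands flat `key: value` pairs plus `|` block scalars.

```python
def split_front_matter(text, delimiter="---"):
    """Split a markdown document into (front_matter, body).

    Assumes the front matter sits between two delimiter lines at the very
    top of the file; returns ("", text) when no front matter is found.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != delimiter:
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == delimiter:
            return "\n".join(lines[1:index]), "\n".join(lines[index + 1:])
    return "", text  # opening delimiter without a closing one


def parse_simple_yaml(front):
    """Parse flat 'key: value' pairs and '|' block scalars (a tiny YAML subset)."""
    data = {}
    current_key = None  # set while we are inside a '|' block
    for line in front.splitlines():
        if current_key is not None and (line.startswith(" ") or not line.strip()):
            data[current_key].append(line.strip())
            continue
        current_key = None
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value == "|":
            current_key = key
            data[key] = []
        else:
            data[key] = value
    # Join block-scalar lines back into multi-line strings.
    return {k: "\n".join(v).strip() if isinstance(v, list) else v
            for k, v in data.items()}


EXAMPLE = """---
title: XML Bomb in Apache Solr versions prior to 5.0
CVE: CVE-2019-12401
severity: Medium
versions_affected: |
  1.3.0 to 1.4.1
  3.1.0 to 3.6.2
  4.0.0 to 4.10.4
---
Body text rendered as markdown.
"""

front, body = split_front_matter(EXAMPLE)
meta = parse_simple_yaml(front)
```

With the variables in a dictionary like `meta`, a template can format each field separately (for example as table cells) instead of receiving one rendered blob.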

[Legacy Jira: Adam Walz on [Nov 15 2019](https://issues.apache.org/jira/browse/LUCENE-8987?focusedCommentId=16974728&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16974728)]

mikemccand commented 4 years ago

I created LUCENE-9057 for fixing various issues until the staging is on par with current prod. Use it to track the fixing of issues and possibly license headers.

Perhaps we should focus on getting an equal site out there and decommission CMS site before adding new features like security RSS etc?

[Legacy Jira: Jan Høydahl (@janhoy) on Nov 21 2019]

mikemccand commented 4 years ago

@adamwalz sorry for my silence. Thanks for your super work so far. We are still blocked on INFRA for LUCENE-9014 but hoping to solve it soon. What do you need from us to complete LUCENE-9057 and thus get the site in a state that can be pushed to production replacing the old site? I think we should keep it as simple as possible for the first release, closely replicating existing site and not adding more features for now.

[Legacy Jira: Jan Høydahl (@janhoy) on Dec 09 2019]

mikemccand commented 4 years ago

Hi, we can proceed. I will work on the refguide and javadocs over the weekend, if the setup also works with staging.

[Legacy Jira: Uwe Schindler (@uschindler) on Jan 03 2020]

mikemccand commented 4 years ago

Check https://lucene.staged.apache.org/index.html for the new staging site which is as of today in sync with current CMS production.

We are very close now to publishing the new site. Please review and report more issues, broken links or stuff we should change before making the switch.

Also nice if you can help with the remaining sub tasks here, updating docs, WIKI etc.

[Legacy Jira: Jan Høydahl (@janhoy) on Jan 31 2020]

mikemccand commented 4 years ago

Cool!  With the site in Markdown & Git, I'm far more likely to actually touch it.

[Legacy Jira: David Smiley (@dsmiley) on Feb 01 2020]

mikemccand commented 4 years ago

@danmuzi  I just pushed a fix for the core/features.html that you reported above - it was missing. I think we have fixed all your comments now. Really grateful for your review - let us know if you find other bugs in the new site before we push it to production.

[Legacy Jira: Jan Høydahl (@janhoy) on Feb 12 2020]

mikemccand commented 4 years ago

I pushed a change to the site but buildbot failed to build the site, see https://ci2.apache.org/#/builders/3/builds/366/steps/2/logs/stdio

Don't know why this suddenly happens now and not before. I flagged it on INFRA slack, hope they look into it. The really bad thing is that they publish the site even if the Pelican build failed - leaving a non-working staging website :( 

[Legacy Jira: Jan Høydahl (@janhoy) on Feb 12 2020]

mikemccand commented 4 years ago

Ok, I tried to disable the plugin md_inline_extension (https://github.com/apache/lucene-site/commit/26bf54c2e14c6d134cebe3faa74d965eff31683d) and now the site builds. I diffed output folder with and without extension and no difference, so I don't think we rely on it for anything. @adamwalz  do you know why it is there?

[Legacy Jira: Jan Høydahl (@janhoy) on Feb 12 2020]

mikemccand commented 4 years ago

Fixed broken links to core system requirements (missing core/ in URL path)

[Legacy Jira: Jan Høydahl (@janhoy) on Feb 13 2020]

mikemccand commented 4 years ago

The new site is now live:

To publish Javadocs and Refguide nothing has changed, only extpath.txt is gone. Just commit to subversion as you did before.

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 16 2020]

mikemccand commented 4 years ago

Small issue: https://lucene.apache.org/solr/resources.html#documentation The link to the Solr Javadocs is missing the version number!

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 16 2020]

mikemccand commented 4 years ago

It looks like this does not work in markdown files:

./content/pages/solr/resources.md:* [Latest Release](/solr/{{ LUCENE_LATEST_RELEASE | replace(".", "_") }}/index.html)

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 16 2020]

mikemccand commented 4 years ago

Seems like JINJA is not expanded within MD at all. Unfortunately the solution may be to move all content from solr/resources.md into templates/solr/resources.html and just keep the header of the MD file. Unless there is a way to enable Jinja expansion in MD with some plugin?

[Legacy Jira: Jan Høydahl (@janhoy) on Feb 16 2020]

mikemccand commented 4 years ago

I would also like to get JINJA executed for the .htaccess file (to automate the redirects).

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 16 2020]

mikemccand commented 4 years ago

How about that: https://github.com/getpelican/pelican-plugins/tree/master/jinja2content

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 16 2020]

mikemccand commented 4 years ago

Hm, it's activated already!

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 16 2020]

mikemccand commented 4 years ago

At least to fix this bug we can use the "/api" redirect. The htaccess already has a redirect in place!: https://lucene.apache.org/solr/api

Nevertheless I'd really like to have the version numbers in the ".htaccess" file expanded automatically!

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 16 2020]

mikemccand commented 4 years ago

I figured out why it does not work: For jinja2content there are some restrictions: "In this approach, your content is first rendered by the Jinja template engine. The result is then passed to the normal pelican reader as usual. There are two consequences for usage. First, this means the Pelican context and jinja variables usually visible to your article or page template are not available at rendering time. Second, it means that if any of your input content could be parsed as Jinja directives, they will be rendered as such. This is unlikely to happen accidentally, but it's good to be aware of."

So it looks like variables from pelicanconf are not available at this time! I am looking around.

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 16 2020]

mikemccand commented 4 years ago

@janhoy: I found a very hackish way to fix this problem. I am delaying the link generation to the templating phase. The trick was to just add a placeholder in the content HTML and then in the template do a string replace of the content. Not sure if this is the best idea, but if anybody can tell me a better way: speak up!
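The placeholder trick can be sketched in plain Python, independent of Jinja and Pelican. The function names are hypothetical; the placeholder `__DOCSLINK__`, the path pattern and the version `8.4.1` are taken from this thread. The point is only that the substitution happens after the markdown has been turned into HTML, when the release variable is available.

```python
PLACEHOLDER = "__DOCSLINK__"


def docs_link(latest_release):
    """Build the Javadocs path for a release, with dots turned into underscores."""
    return "/solr/%s/index.html" % latest_release.replace(".", "_")


def render_page(content_html, latest_release):
    """Substitute the placeholder at 'template time', after markdown rendering."""
    return content_html.replace(PLACEHOLDER, docs_link(latest_release))


# The markdown link [Latest Release](__DOCSLINK__) would arrive here as HTML.
rendered = render_page('<a href="__DOCSLINK__">Latest Release</a>', "8.4.1")
```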


 content/pages/solr/resources.md             | 2 +-
 themes/lucene/templates/solr/resources.html | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/content/pages/solr/resources.md b/content/pages/solr/resources.md
index f2e9b89e2..1685ade99 100644
--- a/content/pages/solr/resources.md
+++ b/content/pages/solr/resources.md
@@ -49,7 +49,7 @@ Beginning with Solr 4.4, a detailed reference guide is available online.
 Solr generates JavaDocs for all included code in each release.
 Copies of this documentation for every release can be found online:

-* [Latest Release](/solr/{{ LUCENE_LATEST_RELEASE | replace(".", "_") }}/index.html)
+* [Latest Release](__DOCSLINK__)

 <h3 class="offset" id="additional-documentation">Additional Documentation</h3>

diff --git a/themes/lucene/templates/solr/resources.html b/themes/lucene/templates/solr/resources.html
index aef2de26a..5e9f5fca3 100644
--- a/themes/lucene/templates/solr/resources.html
+++ b/themes/lucene/templates/solr/resources.html
@@ -9,3 +9,7 @@
 <dd><a data-scroll href="#presentations">Presentations</a></dd>
 <dd><a data-scroll href="#videos">Videos</a></dd>
 {% endblock %}
+
+{% block content_inner %}
+{{ page.content | replace('__DOCSLINK__', '/solr/%s/index.html' % LUCENE_LATEST_RELEASE | replace(".", "_")) }}
+{% endblock content_inner %}

[Legacy Jira: Uwe Schindler (@uschindler) on [Feb 16 2020](https://issues.apache.org/jira/browse/LUCENE-8987?focusedCommentId=17037878&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17037878)]

mikemccand commented 4 years ago

I reverted this and used the redirect as described above.

As a second step I was able to also template the htaccess. It is no longer necessary to hack version numbers into it. Redirects to the latest version work automatically. The trick is to put the htaccess file into the theme as a generic template. In the config, this is added as a separate template which is always processed (without any content file) and saved into a fixed filename: https://docs.getpelican.com/en/stable/settings.html#template-pages
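A minimal sketch of what such a configuration could look like in `pelicanconf.py`. `TEMPLATE_PAGES` is the Pelican setting documented at the link above and `LUCENE_LATEST_RELEASE` is the variable named in this thread; the template file name `htaccess.template` and the version value are assumptions for illustration, and the linked commit shows the real change.

```python
# pelicanconf.py (excerpt): a sketch, not the committed configuration.

# Release version, visible to every template that Pelican renders.
LUCENE_LATEST_RELEASE = "8.4.1"

# Map a template (hypothetical name) to a fixed output file. Pelican renders
# it through Jinja even though no content file belongs to it.
TEMPLATE_PAGES = {
    "htaccess.template": ".htaccess",
}
```

Inside that template the version can then be expanded with the same expression used elsewhere in the thread, `{{ LUCENE_LATEST_RELEASE | replace(".", "_") }}`.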

Works fine, will commit now.

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 16 2020]

mikemccand commented 4 years ago

Here is how it's done: https://github.com/apache/lucene-site/commit/4af12c3b4320f84d5cfa7a645b9fdf991292e7e8

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 16 2020]

mikemccand commented 4 years ago

I did a dead link crawl of our entire site (except all Javadoc and old ref guides). The result is in LUCENE-9229. Guess some of that stuff is caused by the move - we are missing some static gifs and what not, but I cannot tell if those were present in the old site.
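For readers who want to repeat such a check locally, here is a small standard-library sketch of a dead link scan over already generated pages. It is not the tool used for LUCENE-9229; the page contents, the `find_dead_links` name and the in-memory `pages` mapping are assumptions for illustration, and external links are skipped.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href/src attribute values of a single HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_dead_links(pages, site_root="https://lucene.apache.org/"):
    """Return sorted (page, link) pairs whose internal target is missing.

    `pages` maps a site-relative path to its HTML source.
    """
    known = set(pages)
    dead = []
    for path, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            # Resolve the link relative to the page it appears on.
            absolute = urljoin(urljoin(site_root, path), link)
            if not absolute.startswith(site_root):
                continue  # external link, not checked in this sketch
            target = urlparse(absolute).path.lstrip("/")
            if target not in known:
                dead.append((path, link))
    return sorted(dead)


pages = {
    "index.html": '<a href="core/index.html">Core</a> <img src="images/missing.gif">',
    "core/index.html": '<a href="../index.html">Home</a> <a href="features.html">F</a>',
}
dead_links = find_dead_links(pages)
```

In this toy input the missing gif and the missing `core/features.html` are the two entries reported.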

[Legacy Jira: Jan Høydahl (@janhoy) on Feb 17 2020]

mikemccand commented 4 years ago

Hi @janhoy :D

Thanks for reflecting my comment. I checked again and there are some mismatched links.

1. Download links in TLP News (https://lucene.staged.apache.org/news.html)

I think the links should change from https://lucene.apache.org/core/downloads.html and https://lucene.apache.org/solr/downloads.html => https://lucene.staged.apache.org/core/downloads.html and https://lucene.staged.apache.org/solr/downloads.html

2. Inconsistency for the Solr guide page

Currently, the guide page link in FEATURES and RESOURCES and DOWNLOAD are different.

1) FEATURES All 'SOLR REF GUIDE' icons point to https://lucene.apache.org/solr/guide/** ex) In https://lucene.staged.apache.org/solr/features.html#query -> https://lucene.apache.org/solr/guide/searching.html

2) RESOURCES 'Solr Quick Start' and 'HTML Version of the Reference Guide' point to https://lucene.staged.apache.org/solr/guide/**

3) DOWNLOAD 'Reference Guide chapter “Upgrade Notes”' and "System Requirements" at the end point to https://lucene.apache.org/solr/guide/**

3. Javadoc page for Lucene and Solr

There seems to be an intentional "lucene.staged.apache.org" now, but there are a few mismatched pages.

1) Lucene Javadocs in core Release Docs -> https://lucene.staged.apache.org/core/8_4_1/index.html -> 404 Not Found

2) Solr Javadocs in RESOURCES Latest Release -> https://lucene.staged.apache.org/solr/api/index.html -> 404 Not Found

3) Lucene Downloads (inconsistent with 1):

* Change log (8.4.1) -> https://lucene.apache.org/core/8_4_1/changes/Changes.html
* Change log (7.7.2) -> https://lucene.apache.org/core/7_7_2/changes/Changes.html

4) Solr Downloads (inconsistent with 2):

* Change log (8.4.1) -> https://lucene.apache.org/solr/8_4_1/changes/Changes.html
* Change log (7.7.2) -> https://lucene.apache.org/solr/7_7_2/changes/Changes.html

[Legacy Jira: Namgyu Kim (@danmuzi) on Feb 17 2020]

mikemccand commented 4 years ago

Thanks Namgyu. You are right that in some places the links are hard coded. I filed https://github.com/apache/lucene-site/pull/12 to address some of these. Wrt ref-guide and javadoc, those are not (yet?) available on the staging site, so when we switch to the "correct" URL for those, there will be a 404. That's what you are seeing in 1) and 2). @uschindler will we have these mapped to the staging area too?

[Legacy Jira: Jan Høydahl (@janhoy) on Feb 17 2020]

mikemccand commented 4 years ago

I don't think so. It's like before, this stuff won't make it to staging. The server is much too limited. And we only checked out some testing stuff to verify that it works. Infra does not want to spend 25GB on the staging server for those duplicate docs.

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 17 2020]

mikemccand commented 4 years ago

IMHO the hardcoded links in news should be fully relative. No need for domain names.

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 17 2020]

mikemccand commented 4 years ago

> IMHO the hardcoded links in news should be fully relative. No need for domain names.

In markdown we write <https://lucene.apache.org/foo/bar> which will automatically be rendered as a link. We can't do the same with relative links, and we cannot easily expand SITEURL either?

[Legacy Jira: Jan Høydahl (@janhoy) on Feb 18 2020]

mikemccand commented 4 years ago

> In markdown we write https://lucene.apache.org/foo/bar which will automatically be rendered as a link. We can't do the same with relative links, and we cannot easily expand {{ SITEURL }} either?

You can still create relative links in markdown with the standard link syntax (square brackets). I think the reason why it's used here is to make writing the release e-mail easier (copypaste).

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 18 2020]

mikemccand commented 4 years ago

Can we remove the "lib" folder from our Pelican build? It looks like this is a relic from the previous ASF CMS. It only contains two Perl .pm files, so it is definitely not useful with Pelican.

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 18 2020]

mikemccand commented 4 years ago

So you suggest something like this?

Download from:

     [https://lucene.apache.org/foo.html](/foo.html)

Yea, please go ahead and clean up lib/ :)

[Legacy Jira: Jan Høydahl (@janhoy) on Feb 18 2020]

mikemccand commented 4 years ago

> So you suggest something like this?

This is not really good, as the link shown to the end user is still hardcoded, so we can stay with the simple <> syntax.

There is one way to make absolute links display (also formatted as code) automatically, although the href is relative:

  $('a.autolink').each(function() {
    $(this).text( $(this).prop('href') ).addClass('code');
  });

This will reformat every link on page load with jQuery to just contain the href (which gets expanded by the browser). All those links just need a special class: <a href="relative/path.html" class="autolink"></a>. Not sure if this is a good idea, but I use it quite often for API documentation where you have relative links to API endpoints, but you want to show the full URL as link description.
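The expansion the browser performs on the `href` property can be illustrated in Python with `urllib.parse.urljoin`. This is only a sketch of the resolution rule, not part of the site; the function name is hypothetical and the two page URLs are the staging and production news pages mentioned in this thread.

```python
from urllib.parse import urljoin


def displayed_link_text(page_url, href):
    """Return the absolute URL a relative href resolves to on a given page."""
    return urljoin(page_url, href)


# The same relative link shows a different absolute URL on each host.
on_staging = displayed_link_text("https://lucene.staged.apache.org/news.html", "core/downloads.html")
on_production = displayed_link_text("https://lucene.apache.org/news.html", "core/downloads.html")
```

This is why a relative link with the `autolink` class would display the staged host on staging and the production host after publishing, without any hardcoded domain in the source.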

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 18 2020]

mikemccand commented 4 years ago

Ok, I would not really bother too much right now. The historic release announcement news articles actually reflect what was in fact emailed out, so I don't see the big win in making them relative. If you write a new draft news and want it to link to some new page section in staged site, you can explicitly make that link relative of course.

[Legacy Jira: Jan Høydahl (@janhoy) on Feb 18 2020]

mikemccand commented 4 years ago

I removed the lib folder in master.

Do you want to merge everything to production soon? Your changes from yesterday are not yet visible.

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 18 2020]

mikemccand commented 4 years ago

We can maybe change the base.html template to do some search/replace for news articles automatically. So we can search for http://lucene.apache.org and replace it with https://lucene.apache.org. This would enforce HTTPS links for everything that's local.

Something like:

{% block content_inner %}
{{ page.content | replace('http://lucene.apache.org/', 'https://lucene.apache.org/') }}
{% endblock content_inner %}

[Legacy Jira: Uwe Schindler (@uschindler) on Feb 18 2020]

mikemccand commented 4 years ago

Yes we can merge to production soon.

I attempted a fix to the CSS caching issue. It is just a simple Pelican variable that gets injected for every unversioned CSS and JS in our HTML templates. See https://github.com/apache/lucene-site/pull/13 - Adding this should make the new front page load well for everyone after publishing :)
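A rough sketch of the general cache-busting idea, assuming a setting defined in `pelicanconf.py` whose value is appended to unversioned CSS and JS URLs. The setting name `STATIC_RESOURCE_SUFFIX`, its value, the helper and the file paths are all hypothetical; see the pull request for what was actually done.

```python
# pelicanconf.py (excerpt): hypothetical setting name and value.
STATIC_RESOURCE_SUFFIX = "?v=20200218"


def versioned(url, suffix=STATIC_RESOURCE_SUFFIX):
    """Append the cache-busting suffix unless the URL already has a query string."""
    return url if "?" in url else url + suffix


css_url = versioned("/theme/css/base.css")
```

Changing the suffix on each publish makes browsers treat the stylesheet as a new resource, so a stale cached copy is not reused.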

[Legacy Jira: Jan Høydahl (@janhoy) on Feb 18 2020]

mikemccand commented 4 years ago

Thanks for checking my comment. @janhoy :)

I checked https://github.com/apache/lucene-site/pull/12 and it looks good.

[Legacy Jira: Namgyu Kim (@danmuzi) on Feb 18 2020]

mikemccand commented 4 years ago

Published the changes from staged:

[Legacy Jira: Jan Høydahl (@janhoy) on Feb 19 2020]