Migrate away from knowledge-repo

harterrt commented 6 years ago

This repository depends on knowledge-repo to render reports. Knowledge-repo includes some nice features like comments and some discoverability tools, but these features come at the cost of complexity.

In particular, the analyst loses control over the final presentation of their report. Docere works around this issue. I recommend migrating this repository to work with docere. We can then autodeploy the static site to s3, github pages, or something similar.

When we started using knowledge-repo our goal was to be able to review jupyter analyses without having to review the json ipynb files. We can achieve this by using nbconvert to produce html and md files when adding notebooks to the repository. If we decide to go down this route I'll be happy to add this functionality to docere's cli.
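For reference, a minimal sketch of what that conversion step might look like, shelling out to `jupyter nbconvert` once per output format. The `rendered` output directory and the function names are placeholders made up for illustration, not anything docere provides today.

```python
import subprocess
from pathlib import Path


def nbconvert_commands(notebook, out_dir="rendered"):
    """Build one `jupyter nbconvert` command per output format (html, markdown)."""
    notebook = Path(notebook)
    return [
        ["jupyter", "nbconvert", "--to", fmt, f"--output-dir={out_dir}", str(notebook)]
        for fmt in ("html", "markdown")
    ]


def render(notebook, out_dir="rendered"):
    """Run the conversions; raises CalledProcessError if nbconvert fails."""
    for cmd in nbconvert_commands(notebook, out_dir):
        subprocess.run(cmd, check=True)
```

Calling `render("analysis.ipynb")` would need Jupyter installed, and it only gives reviewable output if the notebook was saved with its outputs.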

In theory this conversion should be simple, but I just tried it and hit some roadblocks:

The orig_source jupyter notebooks no longer contain the output used to generate the knowledge.md file. This means we can't use nbconvert to render these notebooks.
Using pandoc to render the .md files gives us a raw HTML file without any formatting. I tried some simple formatting with skeleton.css, but it will need more work.
The frontmatter should be removed from the markdown files and moved to a report.json file
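
The frontmatter step could be sketched roughly as below. This assumes the frontmatter is a `---` fenced block of flat `key: value` pairs and simple `- item` lists, and it copies the keys into JSON unchanged; the exact fields docere expects in report.json are not spelled out in this thread, so any renaming is left out. A real YAML parser would be the safer choice for anything beyond that.

```python
import json


def split_frontmatter(text):
    """Return (metadata, body) for markdown that starts with a `---` fenced block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text

    meta, key = {}, None
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and key is not None:
            # List item belonging to the most recent key.
            if not isinstance(meta[key], list):
                meta[key] = []
            meta[key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key = key.strip()
            meta[key] = value.strip()

    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


def to_report_json(meta):
    """Serialize the metadata as-is; field names are not mapped to any schema."""
    return json.dumps(meta, indent=2, sort_keys=True)
```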

harterrt commented 6 years ago

CC: @Dexterp37 @mreid-moz @rafrombrc, who were on the previous issue thread

Dexterp37 commented 6 years ago

@chutten who might be interested as well!

jklukas commented 5 years ago

I'm working on this on the back burner. I'll plan to have a PR by end of week with at least the content converted. May take a bit longer for getting deployment working as I haven't looked into that yet.

AFAICT so far, the orig_source notebooks seem to have what we need and nbconvert is working happily on them.

harterrt commented 5 years ago

I'm hazy on this, but IIRC the orig_source notebooks don't include any of the output from the evaluation :(. Maybe just convert the markdown documents to HTML instead?

jklukas commented 5 years ago

I'm now seeing notebooks that indeed don't have the output included in the ipynb. Images are also a problem.

So I can see that rendering from md is probably a better bet as @harterrt mentioned.
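
A quick way to see which notebooks are affected is to look for code cells with no saved outputs. The sketch below assumes nbformat 4 (a top-level `cells` list) and is only a heuristic, since some cells legitimately produce no output; the function name is made up.

```python
import json


def cells_missing_output(notebook_json):
    """Return indices of non-empty code cells that have no saved outputs."""
    nb = json.loads(notebook_json)
    missing = []
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # `source` may be a single string or a list of lines.
        source = "".join(cell.get("source", []))
        if source.strip() and not cell.get("outputs"):
            missing.append(i)
    return missing
```

Running this over each file in orig_source would show how many notebooks nbconvert could still render faithfully.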

jklukas commented 5 years ago

I was hoping that kr would have some option for rendering statically, but it appears not.

My other thought is to scrape html from the existing http://reports.telemetry.mozilla.org and host the /static assets along with the content we push to S3 or wherever.

jklukas commented 5 years ago

I think the web scraping + static assets will work.

Here's a prototype:

https://jklukas.github.io/mozilla-reports/post/

That's html pulled via curl, processed with BeautifulSoup to remove navigation chrome, then indexed with docere.
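
For anyone curious about the chrome-stripping step, here is a rough dependency-free sketch. It swaps BeautifulSoup for the standard library's `html.parser`, and the set of tags to drop (`nav`, `footer`) is a guess, not the actual selectors used for the prototype. Comments and processing instructions are dropped by this version.

```python
from html.parser import HTMLParser


class ChromeStripper(HTMLParser):
    """Re-emit HTML while skipping everything inside the given tags."""

    def __init__(self, drop_tags=("nav", "footer")):
        super().__init__(convert_charrefs=False)
        self.drop_tags = set(drop_tags)
        self.depth = 0  # > 0 while inside a dropped element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.drop_tags:
            self.depth += 1
        elif self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self.depth == 0 and tag not in self.drop_tags:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.drop_tags:
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth == 0:
            self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_chrome(html, drop_tags=("nav", "footer")):
    """Return `html` with the dropped elements and their contents removed."""
    parser = ChromeStripper(drop_tags)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```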

@harterrt - Let me know if the above looks good to you. It preserves urls from the existing site (assuming we end up making the new thing available at mozilla-reports.tmo). There are some missing files due to processing errors, and I'd make sure to get those working before making a final PR here.

harterrt commented 5 years ago

Thanks, @jklukas!

It looks like we're picking up some extraneous bits of UI from knowledge-repo. For example, see the "heart" functionality, tags, and metadata section shown at the top of this report (screenshot). The images appear to be unscaled as well which means the images are often larger than my screen.

These aren't critical issues, especially if we can copy the raw markdown documents and static assets to the new repository as well. That way we can fix any render issues in future PRs.

It would be great if we could avoid these issues on the first pass though. It looks like the reports are rendering well in github's preview. Could we use something like grip or Pandoc to render the markdown files directly?
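
If Pandoc is the route, the call could be sketched like this. The stylesheet name follows the skeleton.css experiment mentioned earlier in the thread; the helper names and the choice to write the .html next to the .md file are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def pandoc_command(md_path, css="skeleton.css"):
    """Build a pandoc call that renders markdown to a standalone, styled HTML page."""
    md_path = Path(md_path)
    return [
        "pandoc",
        "--standalone",          # emit a full HTML document, not a fragment
        "--from", "markdown",
        "--to", "html",
        "--css", css,            # link the stylesheet from the generated page
        "--output", str(md_path.with_suffix(".html")),
        str(md_path),
    ]


def render_markdown(md_path, css="skeleton.css"):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(md_path, css), check=True)
```

Image scaling would still need a rule in the stylesheet itself, which this does not address.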

Like I said, these aren't critical issues so let's not block on them. Thanks for working on this!

jklukas commented 5 years ago

Looks like some of the CSS, etc. was not actually accessible like I thought it was. I just published an update, and this should look significantly closer to publishable:

https://jklukas.github.io/mozilla-reports/post/

It also removes some of the extraneous bits you noticed.

Some additional questions for @harterrt:

Is the intention here that we host the new static site at reports.telemetry.mozilla.org? Or do you want this to live alongside kr for some period of time? Trying to sketch out deploy strategy.
Do you know where current reports.telemetry.mozilla.org lives? I'd like to scrape the /static content from there if possible.
Are you aware that there are some reports that aren't rendering on the existing site? http://reports.telemetry.mozilla.org/post/e10s_analyses/beta/51/week6.kp and the other weekX posts are giving timeouts.

harterrt commented 5 years ago

Looks good. I'll take a closer look today. Responses to your questions:

The goal is to replace RTMO entirely. There will be a period where they both are live, but I don't expect it to be long.
Here's the repository that handles deployment of RTMO: https://github.com/mozilla-services/cloudops-deployment/tree/master/projects/data/puppet/modules/rtmo
I was not aware that those reports were not rendering. I'll have to take a closer look.

jklukas commented 5 years ago

We have a CircleCI pipeline running now to upload content to reports-dev and add a docere index (thanks @haroldwoo!) I'm working now to fix some dangling issues.

Work items still in progress at this point:

Get a new version of docere published with authors support
Verify all static content is working
Figure out CSP rules to allow MathJax and Google Fonts
Catalog the documents from RTMO that aren't currently working and that we weren't able to import into the docere-based site
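
On the CSP item, a starting point might look like the sketch below. The Google Fonts hosts (fonts.googleapis.com for stylesheets, fonts.gstatic.com for font files) are the standard ones; the MathJax host is a placeholder assumption, since the thread doesn't say where MathJax is loaded from, and `'unsafe-inline'` for styles is included on the assumption that MathJax injects inline styles. The actual rules would need checking against the browser console.

```python
def build_csp(mathjax_host="https://cdnjs.cloudflare.com"):
    """Compose a Content-Security-Policy header value from per-directive source lists."""
    directives = {
        "default-src": ["'self'"],
        "script-src": ["'self'", mathjax_host],
        "style-src": ["'self'", "'unsafe-inline'", "https://fonts.googleapis.com"],
        "font-src": ["'self'", "https://fonts.gstatic.com", mathjax_host],
    }
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )
```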

Before making this live on RTMO, we'll need to update the README to remove all the KR stuff and give instructions for how DS are expected to author reports. That's likely something for @harterrt to own?

harterrt commented 5 years ago

> We have a CircleCI pipeline running now to upload content to reports-dev and add a docere index (thanks @haroldwoo!) I'm working now to fix some dangling issues.

There's already CI populating reports-dev. We use mozilla-private-reports since it has access restrictions. Unfortunately, it looks like the new CI is overwriting the existing index produced from MPR. We should remove this CI quickly so the reports hosted at reports-dev are discoverable.

We should be able to close this issue out by migrating the static reports to MPR.

harterrt commented 5 years ago

It's starting to feel like we're swimming upstream with this migration. Let's take a step back to see if there's a better solution.

In particular, it looks like this public repo is used more often than I expected (see https://github.com/mozilla/mozilla-reports/pull/91#issuecomment-433404385). It would be good to keep these reports public, both for continuity and to have an avenue to share reports outside of Mozilla. I'll still have to figure out what to do with our confidential reports. Maybe deploy to private.tmo? Maybe just deploy to hala1?

It's still valuable to migrate away from knowledge repo and towards docere. Maybe we can just use the new CircleCI deploy to write to reports.tmo instead of reports-dev.

jklukas commented 5 years ago

It seems tenable to have:

mozilla-reports repo that deploys to reports.tmo
mozilla-private-reports repo that deploys to private-reports.tmo

And we'd have them set up basically identically and document that they should follow the same deploy patterns, etc.

jklukas commented 5 years ago

There is some discussion going on in bugs about names for new domains where we'll host this. Once that's settled and we have domains, we can continue. Note that there's at least one instance of an update to a notebook since then (https://github.com/mozilla/mozilla-reports/pull/92) so we'll need to make sure to do another round of generating static content from KR before making the final switch here.

mozilla / mozilla-reports

Migrate away from knowledge-repo #81