rsyslog / rsyslog-doc

documentation for the rsyslog project
Other
99 stars 249 forks source link

Request: Setup experimental branch that automatically builds/displays on main rsyslog.com site #516

Open deoren opened 6 years ago

deoren commented 6 years ago

Short version

Please setup an experimental branch and a corresponding build script on the main site to allow submitting PRs against that branch which build within a scheduled short window of time.

Details / Prior discussion

From #493:

@rgerhards: do you think it would make sense to have a "doc-experimental" or whatever branch that we would frequently render on the rsyslog site? So we would see how it looks there as well. But it's a bit clumsy...

@deoren: Provided that automated builds from that branch are enabled, it would make it easier/safer for contributors to see how the docs would render within the confines of WordPress integration. The way it is now you have to ask your team to upload/decompress tarballs on an on-demand basis and that manual effort is bound to lag behind changes to the branch. If you did go this route it could be useful to help test new themes and major changes.

As I test larger changes, having access to view the docs within the restrictions of the WordPress integration would really come in handy. I could submit PRs here against that branch and merge when ready to test the changes. If all goes well the same changes could be submitted against master for real use.

Future

In addition to testing how well changes to doc contents would integrate with the main site, I will continue to build/test locally and on my personal VPS. That said, at some point I hope to work with the team to setup a build box for use by regular contributors so that all active branches for those team members could be built and demoed for review by others. I have a few ideas in mind, but I want to let them cook and test further before demoing anything.

deoren commented 6 years ago

@alorbach Referencing this issue in an email response.

rgerhards commented 6 years ago

That said, at some point I hope to work with the team to setup a build box for use by regular contributors so that all active branches for those team members could be built and demoed for review by others.

I like that idea. We should discuss this further when @alorbach is back.

deoren commented 6 years ago

@deoren: That said, at some point I hope to work with the team to setup a build box for use by regular contributors so that all active branches for those team members could be built and demoed for review by others.

@rgerhards: I like that idea. We should discuss this further when @alorbach is back.

Thanks. I've already begun preliminary work in that direction.

To make automated builds not take forever to complete, we'll need to record completed build ids (based off of the last commit for that branch) to reduce branch builds to just the branches that have changed. I plan to use a SQLite db to record that info. For simplicity we could record the repo forks in a flat-file here and the cron script would read that list and proceed to build all branches for those forks.

I hope to have this going in the next week or so, all depending on other commitments.

rgerhards commented 6 years ago

I would prefer flat files, as it is easier to manage without need to digging into a different data store. Maybe it's also time for a container. I was about to setup one for the current cron script -- just so that I have a consistent environment for the cron run as well as local testing. Of course, it should then also be used under CI, so we get the same thing everywhere. What do you think?

deoren commented 6 years ago

@rgerhards: I would prefer flat files, as it is easier to manage without need to digging into a different data store.

I see the flat-file for contributor fork listings residing in the repo , but the SQLite db used for temporary use by the build script(s) and not directly by users.

If the db is blown away then the script would fall back to assuming it needed to build everything. If the db is available, the build script would use the current recorded state to determine what to build. This is all based on the assumption that it takes a good deal of time to build many branches and that by eliminating builds for unchanged branch content the frequency of those builds could be increased. This would allow for a quicker feedback loop for changes we're testing.

I'm currently building all branches for my fork every 15 minutes, but the process of generating HTML and epub formats is taking close to 10 minutes to complete. Because it's a VPS, other guests on the box (as well as other services in the current guest) could impact the performance of my VPS and cause that build time to increase. If the builds were conditional, the number of branch builds needed should greatly decrease and the frequency of builds could increase (e.g., every 5-10 minutes instead of every 15 minutes).

The goal with an increased build frequency would be to reduce the reliance on local builds of the docs to permit editing content entirely through the web UI (if someone chose to do that) or just a text editor and Git (no need to setup a local build env for testing).

@rgerhards: Maybe it's also time for a container. I was about to setup one for the current cron script -- just so that I have a consistent environment for the cron run as well as local testing.

Perhaps. It wouldn't hurt either way.

@rgerhards: Of course, it should then also be used under CI, so we get the same thing everywhere. What do you think?

CI is being provided now for doc commits on PRs, so having CI applied to all branch commits would extend coverage and possibly provide quicker feedback on syntax errors. I've used local builds thus far for that. Having the build box provide reports re build states makes sense.

Note:

The cron script in its current state is usable, but I've yet to supply all files needed by the cron script for immediate use:

One goal will be to shift the settings for the cron script to an external file and rework it so that it doesn't need an external wrapper script, but I'm not there yet. I've been making small quick changes thus far and haven't spent a lot of time refactoring it for general use yet.

The current cron script in master works from a local clone of the repo and uses detached head states in an attempt to prevent touching the branch content accidentally. The version I'm prototyping relies on a local mirror of the repo instead and creates a fresh clone for each run. This version of the script does not use detached head states. I'm not sure yet which is the best approach, but the primary goal is to keep from accumulating local branches (and building them) that no longer exist at the remote source.

On a related note, I have worked on the release_builds.sh script and the last I looked it was ready for your use from a scratch copy of the repo.

That script is meant to be run standalone as needed, but it makes several assumptions. I tried to make those clearer in the latest modifications to the script so that they're displayed upon execution of the script.

rgerhards commented 6 years ago

FYI: I have filed a PR for an initial container: https://github.com/rsyslog/rsyslog-doc/pull/523

I plan to extend it so that I can also use it for web site related things.

rgerhards commented 6 years ago

back to the automated build, and let's step a bit back to the requirements. What exactly would you like to see? I think along these lines:

  1. doc change is proposed
  2. PR is pushed

Do you now want to have a build exactly for this PR? That's a pretty complex beast...

rgerhards commented 6 years ago

To make automated builds not take forever to complete, we'll need to record completed build ids (based off of the last commit for that branch) to reduce branch builds to just the branches that have changed. I plan to use a SQLite db to record that info. For simplicity we could record the repo forks in a flat-file here and the cron script would read that list and proceed to build all branches for those forks.

I think I just now really understand that comment, but if you are really after a per-PR build, we need to integrate with the CI system. CI already handles all that complexity for us and provides us a fresh rsyslog-doc repo with just the PR in question. So no need at all to track anything.

The real problem, however, is to where to display the results and how long. Especially as this takes up quite a bit of space. I doubt it is useful to have it available only for one day. This is the complexity I meant...

rgerhards commented 6 years ago

mmhhhh... right now, it's 20MiB per build/format. Let's say we set aside 1TiB, then we could cache roughly 50,000 CI runs -- that's actually not too bad ;-) So I think that would work if we set a retention period of 7 days and have the cleanup run also do a premature cleanup if the size "goes too large".

This shuffles the problem to securely transfering the data from the (untrusted) CI system to the (trusted) rsyslog site. That's a hard question (really this time), as someone could use it to induce some (malicious) code to the main site. I guess that rules out any automated way to transfer it to the main site, as this is an inherant risk that comes with CI. Let me think how we could address that...

@deoren Our main objective is to have the new code running under the same style as the site -- right? Anything else?

In any case, I think it does not make sense to invest time in scripting to do some caching for PR runs. We need to do CI, and CI needs to rebuild from scratch in any case - else it's not a good CI. CI needs to go from scratch to final result and must exercise the whole process in doing so.

rgerhards commented 6 years ago

I had a chat with @friedl: maybe I am taking the wrong approach. As he pointed out, the doc is already just rendered under a relatively small part of the site framework. Maybe we should try to re-create a sphinx theme that matches the site theme and then we can go ahead and deploy on a separate virtual server (like doc.rsyslog.com) the build just "as-is". That would probably completely remove the need for any auto-generation as everyone could just use the local build result inside the browser - no complexity, no security issues...

... that boils down to finding out how hard it is to create a sphinx theme that matches what we need for the site.

deoren commented 6 years ago

@rgerhards

I've done a poor job of keeping the topics separate, for that I apologize. I'll first post an overview of my current thinking along with some responses related to the PR you just merged re the docker content. I'll then respond directly to some of your prior posts to this thread.

Overall goals

My (current) primary goals for the cron script

  1. Run on a schedule
  2. Build all branches for all forks in a list
  3. Repeat

Cron script design considerations

Prevent branch rebuilds when nothing has changed

When the script runs it should check the status for each branch to determine whether the latest commit has already successfully built. If so, that branch is skipped and the next one is evaluated. Right now, the script is still pretty basic and just rebuilds everything. This is mostly OK for a single fork with a small number of branches, but quickly turns into a lengthy build time for a small VPS that has to process a dozen forks, many of which were created as "quick" one-off prototypes.

Bandwidth and remote server load (fair use aspect)

To save on bandwidth and remote server load (the "fair use" bit that you mentioned on the linked PR), I would have a set of local mirrors for each fork that would be synced on script run and cloned for the build operations for that fork.

Prevent unintentional merges/changes to branches during build process

Cloning from local mirrors is somewhat of a defensive measure since we both encountered a bug in the cron script at one point that resulted in one branch being merged over the top of another. A fresh clone (hopefully) helps minimize the damage from such a mistake (particularly if the automated build process runs for many builds without being caught). The upside of using local mirrors is that when they're synced against their remote all local changes that have been "pushed" to them are lost as the local mirror is synced 1:1 to the status of the remote repo.

Anyway, regardless whether the script uses local mirrors or a local clone + many remotes attached to it, the goal is to reset the status of whatever clone is being used for the build so unintentional changes are not built.

CI-like, per-commit builds (not a goal)

I'm not personally interested in having a CI-like effect where each commit is built separately (as you noted Travis CI already does that), but rather the latest commit on the branch at the time the script is run. Ideally the script would run often enough so that the available built content matches recent changes to the docs. My goal is to have the built content available to demo prototype changes so others can review and comment on them before they become part of the master branch for eventual use by the community. Having my VPS perform that job while working out the changes to the module doc format was really useful, as was the frequency of the builds.

Experimental branch

The experimental branch is intended to test potential upcoming changes against the current site to see how well the docs merge with the site theme. The original concept was that anyone from the doc team could use it and the results would be built and pushed to a separate location on the web server. For example, if the current branches are available as:

then the experimental might be available as /doc/experimental/ or something similar. It doesn't have to be on the same box, it could be on a clone of your current VPS to help alleviate security concerns.

deoren commented 6 years ago

@rgerhards: The real problem, however, is to where to display the results and how long. Especially as this takes up quite a bit of space. I doubt it is useful to have it available only for one day. This is the complexity I meant...

In the case of the cron script (and really the branch builds in general, no matter how they're built), the intention is to have the build reflect the latest version on the branch. When a branch is pruned, the build for that branch would eventually expire. With the setup I'm currently using with my VPS the "controller" script handles purging stale content.

I'm using these steps right now to prune older content:

find "${OUTPUT_BUILD_DIR}" -type f -mmin +30 -exec rm {} \; >> ${BUILD_LOG_FILE} 2>&1
find "${OUTPUT_BUILD_DIR}" -type d -empty -delete >> ${BUILD_LOG_FILE} 2>&1

Since rebuilds run every 15 minutes, it doesn't take too long for stale content to disappear.

rgerhards commented 6 years ago

Thanks for the well-written post. I still think we can achieve the desired results in an easier ways. I would like to approach this by concentrating on some smaller aspects, so that we can have a quick exchange. Also it saves writing a lot - a lot that potentially might not be needed based on the answer of the quick topic. Hope this is OK with you.

Let's go to the goals:

Continuous Integration (Travis is handling this)

small correction: Travis is currently handling it, but that's really just the case because we did not need more. On the main project, the heavy hauling is done by buildbot: http://build.rsyslog.com:8010/waterfall?tag=rsyslog-rsyslog -- we can, and should, also use it for doc if it makes sense (and it does!).

Frequent branch builds for demo purposes among the team and community for feedback prior to completing the work (which may be lengthy) and merging to master Experimental branch builds to test against the production website WordPress integration

These two are essentially the same for me like the first. A PR is something that is experimental. So whenever you want to have some feedback on a feature, you ideally do a PR. Not every PR needs to get merged, and it is not a failure if it is not. So IMO a PR is the basic unit of discussion when moving towards a new feature, style etc. Same goes for experimental builds against the web site - it is just current imperfection that we notice it "after the fact". Much better would be during the process itself.

Let's first see if we can reach agreement here, all the rest depends on that.

deoren commented 6 years ago

@rgerhards: This shuffles the problem to securely transfering the data from the (untrusted) CI system to the (trusted) rsyslog site.

Two separate goals (in my mind):

So I wouldn't have a CI system handle the builds/deployment, but a box of your own. I would have two boxes actually (or one box with two Docker instances running, I dunno).

I personally plan to host the cron script on my end for a while until the results are solid enough to deploy the script on a system used by everyone on the team (direct membership, occasional contributors, etc). Once the design solidifies and we're ready to officially host regular builds we could run it officially, but I think the script (and process) needs additional bake time.

rgerhards commented 6 years ago

just one quickly in addition to the last message:

Build all branches on every fork

On the official repro, we have v?-stable and master. We should never have more (some for strange reasons, some leftovers I think may exist at the moment). Everything floating as idea is a PR - and will become master once merged.

deoren commented 6 years ago

@rgerhards I'm slow this morning, still working on my responses to your earlier responses. I'll catch up with you in a moment. ;)

rgerhards commented 6 years ago

maybe we should try and see if we can find some more interactive tool...

deoren commented 6 years ago

We could just limit ourselves to 140 characters ;)

rgerhards commented 6 years ago

DM'ed you on twitter, but maybe it is better to keep it here as a record of decision making -- all has pros and cons...

rgerhards commented 6 years ago

Just a bit of history from the main project: we started with an "experimental branch" and were very happy to go to PR-driven dev only: http://blog.gerhards.net/2016/04/improvements-in-ci-environment-and.html

deoren commented 6 years ago

@rgerhards: @deoren Our main objective is to have the new code running under the same style as the site -- right? Anything else?

Two objectives (at least for me):

The latter might not seem that important, but if we wish to see the results both before and after integration this is important to judge where the failure in integration might lie. That and if we wish to test other styles, it's easier as standalone output.

@rgerhards: In any case, I think it does not make sense to invest time in scripting to do some caching for PR runs. We need to do CI, and CI needs to rebuild from scratch in any case - else it's not a good CI. CI needs to go from scratch to final result and must exercise the whole process in doing so.

You are probably right. I took some time last night (it's 9 am for me right now) with just pen and paper and worked through some different scenarios. This was helpful for me as I was able to better mentally split out the different goals and brainstorm how I might approach them. I decided I would focus on caching as a later step that could be optionally used, or not, as need be. I'm instead going to focus on the SQLite DB use to record build state. At least for my own purposes that will help with faster rebuilds for frequent branch changes and as a side effect it will also reduce remote load as well.

deoren commented 6 years ago

@rgerhards DM'ed you on twitter, but maybe it is better to keep it here as a record of decision making -- all has pros and cons...

Definitely a good idea to keep decisions here as it makes it easy to backtrack and figure out specifics. I've done that many times with discussions that we've had re direction of changes.

rgerhards commented 6 years ago

ACK on staying here.

Two objectives (at least for me):

Test integration with the main site Test results as stand-alone output

Nothing prevents us from doing this in CI.

rgerhards commented 6 years ago

let me get you an example -- I need to induce a static analyzer error in rsyslog, then I can show you related functionality on the main project. Might take a little bit...

deoren commented 6 years ago

@rgerhards: ACK on staying here.

@deoren: Two objectives (at least for me):

Test integration with the main site Test results as stand-alone output

Nothing prevents us from doing this in CI.

Perhaps the real issue is my lack of experience with CI and the options that are available. I'm still used to old school methods where you setup a test box and demo your changes there before merging into a production environment. I'll probably continue work on my end for a bit, but it sounds like it's probably best to hold off on the experimental branch idea for now as you've got better tools/methodologies to rely on.

rgerhards commented 6 years ago

think of CI as this:

You just need to be even a bit more security aware as "the world at large" can push files to your CI system (no manual review).

rgerhards commented 6 years ago

I'm still used to old school methods where you setup a test box and demo your changes there before merging into a production environment.

That test box is the CI system box.

deoren commented 6 years ago

@rgerhards I think I understand the potential of CI and I actually can really appreciate how useful it is for checking results, but I'm not yet aware of how it can build content and then push that content to a staging area for review. For rsyslog software builds, you really want to see where the code may fail, but for content that is meant to be browsed and reviewed by a human, the success or failure of a build is still important, but just as much so is the result of seeing how the content reads and fits within the theme of the site.

If Travis (or other CI provided by outside resources) is able to build and then push the latest results to a location for common review, that is probably a much better approach than any cron script or build process that I could ever put together on my own, or anyone else short of having a true programmer skillset (which I do not have).

As you mentioned there is the security aspect; having someone push a change to the source/conf.py file would allow Python code to run with the rights of the account performing the build and potentially cause all sorts of harm. That said, the original concept I had (build all branches for all forks) would be for a trusted circle of contributors only and the risk would be minimized as a result.

If you based it purely on PR submissions, well that's a completely different design and while doable, would require some more thought.

rgerhards commented 6 years ago

OK, here we go. I injected an error into rsyslog code that triggers the static analyzer. On travis, we just have a small compiler notice. On Buildbot, we have more control over what happens. So have a look at this buildbot slave (this is the only one that does static analysis on buildbot):

http://build.rsyslog.com:8010/waterfall?show=rsyslog%20ubuntu16%20rsyslog

Notice the failure at the top. The position might change as new PRs come in, so let's get down to the specific build:

http://build.rsyslog.com:8010/builders/rsyslog%20ubuntu16%20rsyslog/builds/879

There, notice stdout, which has the same compiler warning Travis has. But it also has a "report URL" link. Follow it it, and you'll find a gateway page that I could not yet get rid of:

http://build.rsyslog.com:8010/builders/rsyslog%20ubuntu16%20rsyslog/builds/879/steps/shell_4/logs/report_url

From there, follow the provided URL, and then follow the link to the subdirectory. This will bring you here:

http://ubuntu16.rsyslog.com/18-01-26_15-10-29/2018-01-26-151117-14071-1/

That is actually the HTML generated by the static analyzer. We provide it because it is much more useful than the error message. IIRC is remains for 7 days on the system and will then be cleaned-out.

We can do exactly the same with HTML generated by doc PRs (again, requires some work, but not hard).

The description here reads long, but it really is a three-click-to-html approach. All fully automated inside CI.

deoren commented 6 years ago

Ah, I see what you mean. That is powerful and likely does what I was thinking of so much better. Using a scheduled build approach seems like it is from the dark ages in comparison.

That is definitely food for thought.

rgerhards commented 6 years ago

I can setup a "build-outside-of-main-site" demo on buildbot. It's actually not that complicated. But I would wait for @alorbach to do the basic tying of github&buildbot as he is much quicker with that than I am. Once this is done (early next week I guess), it's probably around an hour of work to get the HTML output as part of the CI run.

The security concerns are mostly in regard to publishing unknown HTML to a site where users expect valid HTML (www.rsyslog.com). With the CI system URLs, this should be much less a problem. I hope that we can get a sphinx theme that sufficiently matches the site's theme, and then we can publish the official doc outside of the main site (doc.rsyslog.com?) and, if so, than we can also trial it via CI system URLs.

rgerhards commented 6 years ago

and forgotten (grrr...): every CI system needs to deal with injected code inside PRs, so that's not a new risk here. He site HTML injection is. Just to spell it out clearly ;-)