jimporter / mike

Manage multiple versions of your MkDocs-powered documentation via Git
BSD 3-Clause "New" or "Revised" License

Offer hooks or plugin interface for non git deployment #131

Closed: couling closed this issue 1 year ago

couling commented 1 year ago

I've come here from a very similar angle to #31. Mike is recommended by Material for MkDocs. Having every released version of the documentation available became our primary reason for adopting mike.

Yet deploying directly to Git doesn't work so well in some contexts outside GitHub Pages.

Besides the point raised in #31, there's some real complexity in keeping the actual deployment in sync with the Git branch. AFAIK there's no easy way to apply a Git patch to an S3 bucket (static website). A full sync breaks the principle of leaving deployed versions untouched and has gotchas that mean it's not totally reliable.

Honestly, I'm wondering if Mike is really the right tool for us. But I keep coming back to the fact that it has a really neatly packaged mechanism for writing alias-forwarding HTML files and versions.json. It offers a slick command-line interface for managing deployed versions and aliases.

Nobody wants to reinvent the wheel.


I wonder if you would consider opening up some kind of plugin interface or hooks that could be used to deploy somewhere other than a git branch.

I realise Mike was imagined as a Git-only tool.


Functionally Mike fetches some files from a git commit, invokes mkdocs, generates some simple files, and commits back. (Maybe I missed something)

The kind of hook I'm imagining is one which would allow a plugin to swap in different behaviour for fetching and writing / pushing files.

A plugin developer could then implement, for example, an S3 client leveraging Mike to maintain files in an S3 bucket instead of a Git branch.

jimporter commented 1 year ago

> Besides the point raised in #31, there's some real complexity in keeping the actual deployment in sync with the Git branch. AFAIK there's no easy way to apply a Git patch to an S3 bucket (static website). A full sync breaks the principle of leaving deployed versions untouched and has gotchas that mean it's not totally reliable.

I'm not sure how your deploy process is set up, but I'd expect that you could just rsync the contents of your gh-pages branch to S3 and it would only update the modified files.

> Honestly, I'm wondering if Mike is really the right tool for us.

Ultimately, it's not.

> But I keep coming back to the fact that it has a really neatly packaged mechanism for writing alias-forwarding HTML files and versions.json.

If you have a fancier server at your disposal, you probably shouldn't use the HTML redirects in the first place. Something like an Apache redirect would be much nicer, but GitHub Pages doesn't support that. Hence the HTML redirect hack. (mike v2 will use symlinks by default for this, which is a fair bit nicer on GitHub, but could run into its own issues on other kinds of servers.) I'm not familiar with the details of hosting static sites on S3, but it appears that they have some fairly rich rules for redirects. An "S3-ified" mike should probably use those for aliases instead.
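For illustration, S3 static website hosting accepts routing rules along these lines (a sketch only; the `latest/` and `2.0/` prefixes are made-up examples of an alias pointing at a concrete version):

```xml
<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>latest/</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <ReplaceKeyPrefixWith>2.0/</ReplaceKeyPrefixWith>
    </Redirect>
  </RoutingRule>
</RoutingRules>
```

A rule like this would make the bucket itself redirect alias requests, with no per-page redirect files needed.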

As for the versions.json file, I consider that to be a (loose) specification that other projects can adopt if they like. From the MkDocs Material theme's point of view, it's not really supporting mike, but rather "the mike versioning index specification" (i.e. versions.json). With a bit of work, it could use some totally-different tool, so long as it gets a compatible versions.json file.
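For reference, the versions.json index is a JSON array, newest version first, roughly like this (version numbers here are illustrative):

```json
[
  {"version": "2.0", "title": "2.0", "aliases": ["latest"]},
  {"version": "1.0", "title": "1.0", "aliases": []}
]
```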

> I wonder if you would consider opening up some kind of plugin interface or hooks that could be used to deploy somewhere other than a git branch.

Sorry, no. I really don't think mike can be easily adapted in a way that would make it a sensible solution for non-Git deployments. With a different kind of server, you'd want to take advantage of its unique features/deployment process, so most of the code to do so would be unrelated to what mike has. Of course, maybe small bits from the mike source tree could be used (for example, the code for versions.json), but I think it would be better to just copy that small fraction of code and use it in a new project.

Fundamentally, mike is just mkdocs gh-deploy on steroids. Without the "deploy to Git" part, there's not much left.

> Functionally Mike fetches some files from a git commit, invokes mkdocs, generates some simple files, and commits back. (Maybe I missed something)

That's not quite true; mike just builds docs for whatever your current checkout is, and then commits those built docs to a Git branch. So it only writes to Git, never reads from it (ok, technically it reads from Git for things like mike list and mike serve, but those are mainly for debugging).

couling commented 1 year ago

Okay, I respect that; thanks for your time.

The reason for avoiding rsync and similar tools is that Mike knows which files have changed, since it writes them. Other tools have to scan and compare. That's slow, and it can fail to sync correctly because the comparison is usually based on file size and modification time.

Ultimately I was trying to dodge writing my own from scratch. But I guess it's not such a huge piece of code.

jimporter commented 1 year ago

If you do write something like that (and open source it), I'd be happy to provide a link to it from mike. mike was never intended to be the universal solution for this sort of thing, but a set of broadly-compatible tools that work in different situations would certainly be good for the "mike ecosystem".

jimporter commented 1 year ago

> The reason for avoiding rsync and similar tools is that Mike knows which files have changed, since it writes them. Other tools have to scan and compare. That's slow, and it can fail to sync correctly because the comparison is usually based on file size and modification time.

You could get the list of changed files from Git: `git diff-tree --no-commit-id --name-only -r HEAD` (assuming HEAD is the last commit on gh-pages; if not, tweak that command as needed). Then it's just a matter of feeding those file names into your deploy script.
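As a self-contained illustration of that command (a throwaway repository; file names are made up), it lists exactly the files touched by the latest commit:

```shell
# Toy demo: build a throwaway repo, make a "deploy" commit, then list
# exactly the files that commit touched.
set -e
repo=$(mktemp -d)
g() { git -C "$repo" -c user.email=ci@example.com -c user.name=ci "$@"; }
g init -q
g commit -q --allow-empty -m "initial deploy"
mkdir -p "$repo/1.0"
echo v1 > "$repo/1.0/index.html"
echo home > "$repo/index.html"
g add -A
g commit -qm "deploy 1.0"
# The files changed by the last commit on this branch:
changed=$(g diff-tree --no-commit-id --name-only -r HEAD)
echo "$changed"
rm -rf "$repo"
```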

couling commented 1 year ago

> You could get the list of changed files from Git: `git diff-tree --no-commit-id --name-only -r HEAD`

It's more like this: assuming we store the last deployed commit id somewhere and know it is b7be3afad600530196e3d90dc82733b7e5ad8693, then find the current gh-pages commit id with `git show-ref refs/heads/gh-pages` and run something like:

```shell
# Don't check out in place; that can mess up the CI tool's working tree
rm -rf staging_directory
git clone . staging_directory
git -C staging_directory checkout --detach afd883f4988546967a064cfb0e31fb60d6ba3a0f

git log --pretty=oneline b7be3afad600530196e3d90dc82733b7e5ad8693..afd883f4988546967a064cfb0e31fb60d6ba3a0f | while read -r commit other
do
  git diff-tree --no-commit-id --name-only -r "$commit"
done | sort -u | while read -r file
do
  if [ -f "staging_directory/$file" ]
  then
    : # Push "staging_directory/$file" to remote_url/$file
  else
    : # Delete remote_url/$file
  fi
done
# Record afd883f4988546967a064cfb0e31fb60d6ba3a0f somewhere so we know what we updated to.
```

For targets other than S3 this would also involve a sanity check for replacing a directory with a file or vice versa, and the obvious need to create directories on the remote.

I'm not sure I've caught every edge case.

jimporter commented 1 year ago

Yeah, from a certain point of view, you could look at this problem as follows: mike puts the authoritative state of your documentation site in gh-pages, and then you just need to figure out the best way to sync those changes with the server. Since Git has lots of plumbing to let you inspect changes, this shouldn't be too hard. It really depends on the specifics of your deploy pipeline.

However, this strategy would mean that you're not getting any of the extra niceties (e.g. fancy redirection rules) that S3 static sites could provide.
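As a minimal local sketch of that "Git is the authoritative state" approach (everything here is illustrative: a throwaway directory stands in for the S3 bucket, and `cp`/`rm` stand in for the bucket's put/delete operations):

```shell
set -e
work=$(mktemp -d); repo="$work/repo"; bucket="$work/bucket"
mkdir -p "$repo" "$bucket"
g() { git -C "$repo" -c user.email=ci@example.com -c user.name=ci "$@"; }
g init -q
echo v1 > "$repo/index.html"
echo old > "$repo/old.html"
g add -A; g commit -qm "deploy 1"
last=$(g rev-parse HEAD)                           # remember the last-deployed commit
cp -R "$repo/." "$bucket"; rm -rf "$bucket/.git"   # initial full sync
# --- a new deploy commit arrives ---
echo v2 > "$repo/index.html"                       # modified
echo extra > "$repo/extra.html"                    # added
rm "$repo/old.html"                                # deleted
g add -A; g commit -qm "deploy 2"
# Replay only the changes since the last deploy onto the "bucket":
g diff --name-status "$last" HEAD | while read -r status file
do
  case "$status" in
    D) rm -f "$bucket/$file" ;;                    # deletion
    *) mkdir -p "$bucket/$(dirname "$file")"
       g show "HEAD:$file" > "$bucket/$file" ;;    # add or modify
  esac
done
```

Only the three touched files are transferred or removed; everything already deployed stays byte-identical, which is the "leave deployed versions untouched" property from above.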