Repository Management #22

Closed pigworker closed 4 years ago

I'm now thinking about how to install Mary actively.

The plan is that people build sites on GitLab. The question is how to ensure that their changes to those sites get propagated.

For the moment, I'm expecting that there will be clones of all the sites living in my filespace (with a keep-out Apache configuration). The question is how to decide when to git pull. Options include:

1. cron job (which is the Marx approach; Marx knows nothing about it)
2. make Mary git pull the repo every time it is accessed (might be slow)
3. make Mary git pull the repo if this hasn't happened in the last n minutes
4. allow an option in the GET data to demand a git pull (especially useful if you're the author; perhaps permitted only if you have authorisation)

At the moment, I'm minded to implement 4 first (so anybody can demand a pull), then try supplementing it with 3.
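For the "not in the last n minutes" option, a minimal sketch of a throttled pull might look like the following. It assumes a timestamp file kept next to each clone; the names is_stale and pull_if_stale and the .last-pull suffix are hypothetical, not anything Mary currently does.

```shell
# Sketch only: pull a site clone when the previous pull is older than a threshold.
# is_stale, pull_if_stale and the ".last-pull" stamp file are hypothetical names.

# is_stale STAMP_FILE MAX_AGE_SECONDS NOW
# Succeeds when the stamp is missing, unreadable as a number, or at least
# MAX_AGE_SECONDS older than NOW (all times in seconds since the epoch).
is_stale() {
    stamp_file=$1
    max_age=$2
    now=$3
    [ -f "$stamp_file" ] || return 0
    last=$(cat "$stamp_file")
    case $last in
        ''|*[!0-9]*) return 0 ;;   # empty or corrupt stamp: treat as stale
    esac
    [ $((now - last)) -ge "$max_age" ]
}

# pull_if_stale REPO_DIR MAX_AGE_SECONDS
# Runs git pull in REPO_DIR only when the stamp says it is due, and records
# the time of the pull only if the pull succeeded.
pull_if_stale() {
    repo_dir=$1
    max_age=$2
    stamp_file="${repo_dir%/}.last-pull"
    now=$(date +%s)
    if is_stale "$stamp_file" "$max_age" "$now"; then
        (cd "$repo_dir" && git pull -v) && echo "$now" > "$stamp_file"
    fi
}
```

Something like pull_if_stale ../MarySites/TestMaryRepo 600 would then pull at most once every ten minutes, however often the page is accessed.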

What I don't know is whether GitLab continuous integration can do anything useful for us.

Option 4 together with gitlab webhooks on push seems like the perfect solution to me. If we want updates to be demanded by authorised users only, we could use the X-Gitlab-Token header field for hook authentication.
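A rough sketch of the token check, written as a shell CGI handler purely for illustration (the real endpoint here is PHP). It relies on the CGI convention that a request header such as X-Gitlab-Token is exposed as the environment variable HTTP_X_GITLAB_TOKEN; the function names and response bodies are hypothetical.

```shell
# Sketch only: authenticate a GitLab webhook call by its secret token.
# check_gitlab_token and respond_to_hook are hypothetical names.

# check_gitlab_token EXPECTED
# Succeeds only when EXPECTED is non-empty and equals the token GitLab sent.
# Under CGI, the X-Gitlab-Token header arrives as HTTP_X_GITLAB_TOKEN.
check_gitlab_token() {
    expected=$1
    [ -n "$expected" ] && [ "${HTTP_X_GITLAB_TOKEN:-}" = "$expected" ]
}

# respond_to_hook EXPECTED
# Prints a minimal CGI response; a real handler would trigger the pull
# at the marked point.
respond_to_hook() {
    if check_gitlab_token "$1"; then
        # ... trigger the git pull for the relevant site here ...
        printf 'Status: 200 OK\r\nContent-Type: text/plain\r\n\r\npull requested\n'
    else
        printf 'Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\nbad token\n'
    fi
}
```

Refusing an empty expected token means a misconfigured server rejects every request rather than accepting requests that carry no token.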

Looks good!

So far, I've added

mary -web <pandoc-executable> <siteroot> <username> <pagename>

and it does the right thing when I'm logged in as me in a shell. When it's serving a page inside the container, the relevant lump of php uses my local copies of the mary and pandoc executables, but something is still going wrong.

https://personal.cis.strath.ac.uk/conor.mcbride/shib/Mary/?page=TestMaryRepo/README.md

ought to make the right thing happen. It's definitely running mary: disturbing the page to one which doesn't exist gives the appropriate error. I just don't know why servePage is not cooperating. I think it's got everything it needs inside the container, unless I'm missing something.

Can we generate better diagnostics if servePage goes wrong?

My attempt to make

https://personal.cis.strath.ac.uk/conor.mcbride/shib/Mary/?page=TestMaryRepo/README.md&pull

do a git pull in the right place isn't going well for me either. So hard to see what's going wrong.

I've invited some people to

https://gitlab.cis.strath.ac.uk/cgb08121/testmaryrepo/

aka TestMaryRepo.

Ping me if you feel left out.

Don't get excited. It doesn't work yet.

Did you git clone via git@ or https://? If mary runs at a lower privilege than you then it won't be able to use your ssh key to pull using git@ but a pull on https should work fine.

That could be one of the issues. I think the command I issue to do the pull isn't helping, either.

Ha! That plan doesn't work, because the repo is and must be private. If I go via https, I have to give a username and password!

OK, the current status is that the right thing happens when invoking mary -web from the terminal, but not on the web. I'm clearly getting containers wrong.

https://personal.cis.strath.ac.uk/conor.mcbride/shib/Mary/?page=TestMaryRepo/README.md&pull

Gets to the Hello! diagnostic, which is output just before calling servePage. I don't know why we get the output from my bash call. I don't know whether git is even a thing inside the container. Ha ha! I just made it output the result of which git and we're not winning.

I reckon it's time to ping Ian G.

I'm now in communication with Ian G. Hopefully git will materialise.

Meanwhile, I need to fix the pipe problem. At the moment, we're using System.Process to generate the pipe, but somehow that's failing when run inside the container. Perhaps we should revert to the way I did it before, making index.php generate the pipe. There are then four components:

mary find <siteroot> <page> <username> (stdin gives serialised POST and GET; stdout gives yaml metadata POST, GET, followed by markdown of page)
pandoc -s -f markdown -t JSON (stdin yaml and markdown; stdout JSON of pandoc data)
mary mangle (stdin JSON of pandoc; stdout JSON of pandoc, after running shonkier, etc)
pandoc -s -f JSON -t html (stdin JSON of pandoc; stdout html, with suitable template)
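
Chained together outside Haskell, the four components could be sketched as a plain shell pipeline. This is only an illustration of the plan: render_page and the MARY and PANDOC variables are hypothetical names, the mary find and mary mangle subcommands are taken from the list above as proposed, and the format name is spelled json in lowercase as in pandoc's documentation.

```shell
# Sketch only: the four-stage pipe expressed in shell.
# MARY and PANDOC can be overridden to point at local copies of the executables.
MARY=${MARY:-./mary}
PANDOC=${PANDOC:-pandoc}

# render_page SITEROOT PAGE USERNAME
# stdin:  serialised POST and GET data
# stdout: HTML for the page
render_page() {
    "$MARY" find "$1" "$2" "$3" \
        | "$PANDOC" -s -f markdown -t json \
        | "$MARY" mangle \
        | "$PANDOC" -s -f json -t html
}
```

In plain sh the exit status of a pipeline is that of its last stage, so a failure in mary find would have to be detected some other way, for instance from empty output.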

I wonder how long we'll continue to put up with all the marshalling, given that both mary and pandoc are written in Haskell...

OK, we now have git in my container. I'm still getting it wrong.

I don't know what's (not) going on.

I moved the git pull out into a script

# $1 is the name of the site directory under MarySites
cd "../MarySites/$1" || exit 1
echo "before"
git pull -v
echo "after"
cd ../../Mary

I can tell that

./mary -web ...

is running the script, because I get the before and after. However, the git pull is not having any effect. We have nontrivial output from which git.

Ah! Maybe it can't do ssh properly because my key is inaccessible.

Perhaps you could change the pull to

GIT_SSH_COMMAND="ssh -vvv" git pull -v

to get some ssh feedback?

OK. But no change in observed behaviour and no diagnostic output.

Looks like most of the diagnostic output from ssh -vvv goes to stderr; maybe what you want is

GIT_SSH_COMMAND="ssh -vvv" git pull -v 2>&1

?
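
To illustrate why the first attempt showed nothing: a caller that captures only stdout (assumed here to be what the PHP wrapper does) never sees what a command writes to stderr. A tiny self-contained demonstration, with a hypothetical noisy function standing in for git pull:

```shell
# noisy is a hypothetical stand-in for a command such as `git pull -v`
# that writes normal output to stdout and diagnostics to stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Command substitution captures stdout only; stderr is discarded here.
stdout_only=$(noisy 2>/dev/null)

# With 2>&1, stderr is redirected into stdout, so both are captured.
both=$(noisy 2>&1)

printf 'stdout only: %s\n' "$stdout_only"
printf 'both streams:\n%s\n' "$both"
```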

Yes, and that confirms my theory.

This does not leave me much the wiser when it comes to solving the problem. I've made the entire MarySites directory inaccessible by apache configuration and put a copy of my key there, so I can do ssh -vvv -i ../id_rsa, but I can't understand the feedback as to why it is now failing.

Same url as above.

debug1: read_passphrase: can't open /dev/tty: No such device or address

Is your key passphrase protected?

According to some googling, that error message is probably read_passphrase failing to print the real error message to a terminal... Could it be that the containered git fails to compare the server host key with its known keys? (cf https://ubuntuforums.org/showthread.php?t=2248801)

I don't have a passphrase on my key.

Running the script as me from the terminal does suggest that it's looking up gitlab in ~/.ssh/known_hosts, which is inaccessible to the container. Not sure what to do about that.

What about copying your ~/.ssh/known_hosts to the container? (If it works we can trim it later.)

It's not at all obvious where to put it, or how containerised ssh is configured.

Seems like you can use -o UserKnownHostsFile=FILEPATH to point to a specific replacement for the usual ~/.ssh/known_hosts.

That's it! Thanks for your help. The git pull is now working.
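
For the record, the pieces from this thread (an explicit key via -i, an explicit known_hosts file via -o UserKnownHostsFile, and stderr folded into stdout) could be combined roughly as below. This is a sketch, not the actual script: the function names are hypothetical, the paths are assumed to be absolute and free of spaces, and BatchMode=yes is an extra assumption added so that ssh fails rather than trying to prompt when there is no terminal.

```shell
# Sketch only: build the ssh command that git should use inside the container.
# ssh_command_for and pull_site are hypothetical names.

# ssh_command_for KEY_FILE KNOWN_HOSTS_FILE
# Both paths should be absolute and must not contain spaces.
ssh_command_for() {
    printf 'ssh -i %s -o UserKnownHostsFile=%s -o BatchMode=yes' "$1" "$2"
}

# pull_site SITE_DIR KEY_FILE KNOWN_HOSTS_FILE
# Runs git pull in SITE_DIR with the ssh settings above, sending diagnostics
# to stdout so that a caller capturing only stdout still sees them.
pull_site() {
    site_dir=$1
    key_file=$2
    known_hosts=$3
    (
        cd "$site_dir" || exit 1
        GIT_SSH_COMMAND=$(ssh_command_for "$key_file" "$known_hosts") git pull -v 2>&1
    )
}
```

The subshell keeps the cd from leaking into the caller, so there is no need to cd back afterwards.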

Now I just have to get the sodding thing to serve a page. I think my pipe plan should work, but I don't want to do it until after we merge the optparse change. Is that good to go?

> Is that good to go?

Yes it is! I did it as a PR as a courtesy, but happy to merge it in now.

Edit: I was happy and merged it in.

We have lift off.

https://personal.cis.strath.ac.uk/conor.mcbride/shib/Mary/?page=TestMaryRepo/README.md&pull

just successfully pulled and rendered the README file of the test repo, which had just been updated with code to compute an inline span.

This is now working as intended. I had to do several things to get that to work, apart from not making silly mistakes involving too many or too few uses of ./:

pandoc was looking for data files outside of the container; I had to duplicate those files inside the container
with empty POST data, I was generating empty yaml objects in the metadata, which pandoc did not appreciate

Things can start to materialise now!

Before we close this, do we want to try the gitlab CI webhook thing?

> Before we close this, do we want to try the gitlab CI webhook thing?

I just tried the most direct thing, but now I suspect that the shibboleth authentication is going to be our next fun thing to fight with...

Indeed...

The easy way out would be to move the pull functionality into a separate php file outside the shib directory. It could do its own authentication if need be, but I don't see any danger in letting anyone trigger a pull, really.

Yes, that is exactly the right thing to do. And it is done! I think I'll close this now!

msp-strath / Mary
