msp-strath / Mary

Mary is the successor of Marx, a content delivery and assessment engine based on markdown and git

Repository Management #22

Closed pigworker closed 4 years ago

pigworker commented 4 years ago

I'm now thinking about how to install Mary actively.

The plan is that people build sites on GitLab. The question is how to ensure that their changes to those sites get propagated.

For the moment, I'm expecting that there will be clones of all the sites living in my filespace (with a keep-out Apache configuration). The question is how to decide when to git pull. Options include

  1. cron job (which is the Marx approach; Marx knows nothing about it)
  2. make Mary git pull the repo every time it is accessed (might be slow)
  3. make Mary git pull the repo if this hasn't happened in the last n minutes
  4. allow an option in the GET data to demand a git pull (especially useful if you're the author; perhaps permitted only if you have authorisation)

At the moment, I'm minded to implement 4 first (so anybody can demand a pull), then try supplementing it with 3.

What I don't know is whether GitLab continuous integration can do anything useful for us.
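Option 3 above could be sketched roughly as follows. This is only an illustration, not what Mary does: the function names and the stamp file kept next to the clone are hypothetical, and `find -mmin` is assumed to be available (it is in GNU and BSD find, but it is not POSIX).

```shell
#!/bin/sh
# Sketch of option 3: pull only if the last pull was more than N minutes ago.
# needs_pull, pull_if_due and the ".last-pull" stamp file are hypothetical names.

# needs_pull STAMP_FILE MINUTES: succeeds if a pull is due.
needs_pull() {
    stamp=$1
    minutes=$2
    # No stamp yet means we have never pulled.
    [ -e "$stamp" ] || return 0
    # find prints the path only if it was modified less than $minutes ago,
    # so empty output means the stamp is stale.
    [ -z "$(find "$stamp" -mmin "-$minutes")" ]
}

# pull_if_due SITE_DIR MINUTES: pull the clone in SITE_DIR when it is due.
pull_if_due() {
    site_dir=$1
    minutes=$2
    stamp="$site_dir.last-pull"    # kept beside the clone, not inside it
    if needs_pull "$stamp" "$minutes"; then
        # Record the attempt first so that a burst of requests
        # does not trigger a burst of pulls.
        touch "$stamp"
        (cd "$site_dir" && git pull -v 2>&1)
    fi
}
```

Something like `pull_if_due ../MarySites/TestMaryRepo 5` would then run before serving a page. Touching the stamp before pulling trades a small risk of skipping a failed pull for protection against many simultaneous pulls.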

fredrikNordvallForsberg commented 4 years ago

Option 4 together with gitlab webhooks on push seems like the perfect solution to me. If we want updates to be demanded by authorised users only, we could use the X-Gitlab-Token header field for hook authentication.
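As a rough sketch of that token check, assuming the hook is received by a CGI-style script: CGI passes the `X-Gitlab-Token` request header to the script as the environment variable `HTTP_X_GITLAB_TOKEN`. The secret file argument and the `pull-site.sh` script are hypothetical, and the actual site uses PHP rather than a shell CGI script.

```shell
#!/bin/sh
# Sketch of a webhook gate. The secret file and ./pull-site.sh are
# hypothetical; only the X-Gitlab-Token header comes from the discussion.

# token_ok RECEIVED EXPECTED: succeeds only for a non-empty, exact match.
token_ok() {
    [ -n "$2" ] && [ "$1" = "$2" ]
}

# handle_hook SECRET_FILE SITE: print a CGI response and pull if authorised.
handle_hook() {
    expected=$(cat "$1" 2>/dev/null)
    if token_ok "${HTTP_X_GITLAB_TOKEN:-}" "$expected"; then
        printf 'Status: 200 OK\r\nContent-Type: text/plain\r\n\r\n'
        ./pull-site.sh "$2" 2>&1    # hypothetical pull script
    else
        printf 'Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\n'
    fi
}
```

Requiring a non-empty expected token means a missing secret file rejects every request instead of accepting requests that send no token.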

pigworker commented 4 years ago

Looks good!

pigworker commented 4 years ago

So far, I've added

mary -web <pandoc-executable> <siteroot> <username> <pagename>

and it does the right thing when I'm logged in as me in a shell. When it's serving a page inside the container, the relevant lump of php uses my local copies of the mary and pandoc executables, but something is still going wrong.

https://personal.cis.strath.ac.uk/conor.mcbride/shib/Mary/?page=TestMaryRepo/README.md

ought to make the right thing happen. It's definitely running mary: disturbing the page to one which doesn't exist gives the appropriate error. I just don't know why servePage is not cooperating. I think it's got everything it needs inside the container, unless I'm missing something.

Can we generate better diagnostics if servePage goes wrong?

pigworker commented 4 years ago

My attempt to make

https://personal.cis.strath.ac.uk/conor.mcbride/shib/Mary/?page=TestMaryRepo/README.md&pull

do a git pull in the right place isn't going well for me either. So hard to see what's going wrong.

pigworker commented 4 years ago

I've invited some people to

https://gitlab.cis.strath.ac.uk/cgb08121/testmaryrepo/

aka TestMaryRepo.

Ping me if you feel left out.

Don't get excited. It doesn't work yet.

gallais commented 4 years ago

Did you git clone via git@ or https://? If mary runs at a lower privilege than you then it won't be able to use your ssh key to pull using git@ but a pull on https should work fine.

pigworker commented 4 years ago

That could be one of the issues. I think the command I issue to do the pull isn't helping, either.

pigworker commented 4 years ago

Ha! That plan doesn't work, because the repo is and must be private. If I go via https, I have to give a username and password!

pigworker commented 4 years ago

OK, the current status is that the right thing happens when invoking mary -web from the terminal, but not on the web. I'm clearly getting containers wrong.

https://personal.cis.strath.ac.uk/conor.mcbride/shib/Mary/?page=TestMaryRepo/README.md&pull

Gets to the Hello! diagnostic, which is output just before calling servePage. I don't know why we get the output from my bash call. I don't know whether git is even a thing inside the container. Ha ha! I just made it output the result of which git and we're not winning.

I reckon it's time to ping Ian G.

pigworker commented 4 years ago

I'm now in communication with Ian G. Hopefully git will materialise.

Meanwhile, I need to fix the pipe problem. At the moment, we're using System.Process to generate the pipe, but somehow that's failing when run inside the container. Perhaps we should revert to the way I did it before, making index.php generate the pipe. There are then four components:

  1. mary find <siteroot> <page> <username> (stdin gives serialised POST and GET; stdout gives yaml metadata POST, GET, followed by markdown of page)
  2. pandoc -s -f markdown -t JSON (stdin yaml and markdown; stdout JSON of pandoc data)
  3. mary mangle (stdin JSON of pandoc; stdout JSON of pandoc, after running shonkier, etc)
  4. pandoc -s -f JSON -t html (stdin JSON of pandoc; stdout html, with suitable template)

I wonder how long we'll continue to put up with all the marshalling, given that both mary and pandoc are written in Haskell...
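The four components above could be chained in one shell pipeline, roughly as below. This is a sketch under assumptions: `mary find` and `mary mangle` are the subcommands proposed in the list and may not exist in this form, `render_page` is a hypothetical name, the pandoc format names are written in lowercase, and the "suitable template" is left out.

```shell
#!/bin/sh
# Sketch of the four-stage pipeline; render_page is a hypothetical name.

# render_page SITEROOT PAGE USERNAME
# stdin: serialised POST and GET data; stdout: HTML.
render_page() {
    mary find "$1" "$2" "$3" |            # yaml metadata followed by markdown
        pandoc -s -f markdown -t json |   # markdown to pandoc JSON
        mary mangle |                     # run shonkier etc. over the JSON
        pandoc -s -f json -t html         # pandoc JSON to HTML
}
```

In plain sh the exit status of a pipeline is that of its last stage, so a failure in an earlier stage can go unnoticed unless each stage reports its own errors on stderr.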

pigworker commented 4 years ago

OK, we now have git in my container. I'm still getting it wrong.

pigworker commented 4 years ago

I don't know what's (not) going on.

I moved the git pull out into a script

cd "../MarySites/$1" || exit 1
echo "before"
git pull -v
echo "after"
cd ../../Mary

I can tell that

./mary -web ...

is running the script, because I get the before and after. However, the git pull is not having any effect. We have nontrivial output from which git.

Ah! Maybe it can't do ssh properly because my key is inaccessible.

fredrikNordvallForsberg commented 4 years ago

Perhaps you could change the pull to

GIT_SSH_COMMAND="ssh -vvv" git pull -v

to get some ssh feedback?

pigworker commented 4 years ago

OK. But no change on observed behaviour and no diagnostic output.

fredrikNordvallForsberg commented 4 years ago

Looks like most of the diagnostic output from ssh -vvv goes to stderr, maybe what you want is

GIT_SSH_COMMAND="ssh -vvv" git pull -v 2>&1

?

pigworker commented 4 years ago

Yes, and that confirms my theory.

pigworker commented 4 years ago

This does not leave me much the wiser when it comes to solving the problem. I've made the entire MarySites directory inaccessible by apache configuration and put a copy of my key there, so I can do ssh -vvv -i ../id_rsa, but I can't understand the feedback as to why it is now failing.

Same url as above.

gallais commented 4 years ago

debug1: read_passphrase: can't open /dev/tty: No such device or address

Is your key passphrase protected?

fredrikNordvallForsberg commented 4 years ago

According to some googling, that error message is probably read_passphrase failing to print the real error message to a terminal... Could it be that the containered git fails to compare the server host key with its known keys? (cf https://ubuntuforums.org/showthread.php?t=2248801)

pigworker commented 4 years ago

I don't have a passphrase on my key.

Running the script as me from the terminal does suggest that it's looking up gitlab in ~/.ssh/known_hosts, which is inaccessible to the container. Not sure what to do about that.

fredrikNordvallForsberg commented 4 years ago

What about copying your ~/.ssh/known_hosts to the container? (If it works we can trim it later.)

pigworker commented 4 years ago

It's not at all obvious where to put it, or how containerised ssh is configured.

gallais commented 4 years ago

Seems like you can use -o UserKnownHostsFile=FILEPATH to point to a specific replacement for the usual ~/.ssh/known_hosts.
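Combined with the explicit key file and the stderr redirect from earlier in the thread, the pull might look something like this sketch. The function names are hypothetical, and so is the `../known_hosts` path in the usage line below; only `../id_rsa` appears in the discussion.

```shell
#!/bin/sh
# Sketch: run git pull with an explicit key and known_hosts file.
# ssh_command and pull_site are hypothetical names.

# ssh_command KEY_FILE KNOWN_HOSTS_FILE: print the ssh invocation for git.
ssh_command() {
    printf 'ssh -i %s -o UserKnownHostsFile=%s' "$1" "$2"
}

# pull_site KEY_FILE KNOWN_HOSTS_FILE: pull in the current directory,
# sending ssh diagnostics to stdout so they show up in the page output.
pull_site() {
    GIT_SSH_COMMAND=$(ssh_command "$1" "$2") git pull -v 2>&1
}
```

From inside a site clone this would be called as `pull_site ../id_rsa ../known_hosts`. Git passes `GIT_SSH_COMMAND` to the shell, so paths containing spaces would need extra quoting.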

pigworker commented 4 years ago

That's it! Thanks for your help. The git pull is now working.

Now I just have to get the sodding thing to serve a page. I think my pipe plan should work, but I don't want to do it until after we merge the optparse change. Is that good to go?

fredrikNordvallForsberg commented 4 years ago

> Is that good to go?

Yes it is! I did it as a PR as a courtesy, but happy to merge it in now.

Edit: I was happy and merged it in.

pigworker commented 4 years ago

We have lift off.

https://personal.cis.strath.ac.uk/conor.mcbride/shib/Mary/?page=TestMaryRepo/README.md&pull

just successfully pulled and rendered the README file of the test repo, which had just been updated with code to compute an inline span.

This is now working as intended. I had to do several things to get that to work, apart from not making silly mistakes involving too many or too few uses of ./:

  1. pandoc was looking for data files outside of the container; I had to duplicate those files inside the container

  2. with empty POST data, I was generating empty yaml objects in the metadata, which pandoc did not appreciate

Things can start to materialise now!

pigworker commented 4 years ago

Before we close this, do we want to try the gitlab CI webhook thing?

fredrikNordvallForsberg commented 4 years ago

> Before we close this, do we want to try the gitlab CI webhook thing?

I just tried the most direct thing, but now I suspect that the shibboleth authentication is going to be our next fun thing to fight with...

pigworker commented 4 years ago

Indeed...

fredrikNordvallForsberg commented 4 years ago

The easy way out would be to move the pull functionality into a separate php file outside the shib directory. It could do its own authentication if need be, but I don't see any danger in letting anyone trigger a pull, really.

pigworker commented 4 years ago

Yes, that is exactly the right thing to do. And it is done! I think I'll close this now!