yjmantilla / sovabids

A python package for the automatic conversion of EEG datasets to the BIDS standard, with a focus on making the most out of metadata.
https://sovabids.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Which hosted CI engine to use? #10

Closed civier closed 3 years ago

civier commented 3 years ago

Hi,

I've just watched the ARDC session on CI/CD, and the recommendation of one of the speakers was:

Runners-up were:

Any thoughts? @stebo85, why did you recommend GitHub Actions?

Best, Oren

P.S. Slides from the talks are here: https://sites.google.com/ardc.edu.au/techtalk2020/talks (talk 6)

stebo85 commented 3 years ago

Dear @civier,

I don't agree with the speaker's judgement. To me, GitHub Actions is very easy and pleasant to configure and run, and since we host everything on GitHub it makes a lot of sense to use it. However, if there is a strong opinion in favour of a different CI/CD system, I don't mind trying it; I would leave this decision to the person doing the actual work ;)

Cheers Steffen

yjmantilla commented 3 years ago

🤔 I would stick with GitHub Actions since everything is already on GitHub, and migrate only if we run into problems.

stebo85 commented 3 years ago

Agreed :) The current conversion of the LEMON dataset would already be a great test case to implement. Have you tried?

This is an example of the workflow YAML file you need: https://github.com/QSMxT/QSMxT/blob/master/.github/workflows/test_qsm_pipeline.yml

and this then calls the test script: https://github.com/QSMxT/QSMxT/blob/master/tests/run_test_qsm.sh
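For reference, a minimal GitHub Actions workflow for a Python package could look roughly like this (the file name, Python version, and install command are assumptions, not taken from either repo):

```yaml
# Hypothetical .github/workflows/test.yml
name: test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.8'
      - run: pip install -e .           # install the package itself
      - run: pip install pytest pytest-cov
      - run: pytest --cov=sovabids      # run the test suite with coverage
```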

yjmantilla commented 3 years ago

Now that I have finally organized what I got from the brainhack (the sovabids-bidscoin example plus a bunch of issues inside bidscoin), I can continue with the original test and CI ideas.

When I have something good enough I'll let you know.

yjmantilla commented 3 years ago

Ok, so I think I've got the GitHub workflow running along with pytest and code coverage.

I see that the idea is to do a BIDS conversion as a test, using the LEMON dataset as a possible test dataset. The problem with this dataset is that it is quite heavy (each EEG recording is 300 MB+), so for the automated test we should probably look for something lighter.

So my plan is to:

I also understand the idea is to write tests for the CLI tool, which are done through the bash scripts (although I think this could also be done with pytest 🤔).
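For illustration, a CLI can be exercised from pytest via subprocess (the command below is just a stand-in for the real console script, e.g. a hypothetical `sovapply <rules> <data>` entry point):

```python
import subprocess
import sys

def test_cli_exits_cleanly():
    # Stand-in command; swap in the actual CLI entry point once it exists.
    cmd = [sys.executable, "-c", "print('converted 3 files')"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    assert result.returncode == 0          # the command did not crash
    assert "converted" in result.stdout    # and printed the expected output
```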

@stebo85 Given the bash script example (/run_test_qsm.sh), I have the following question:

How are the assertions done in it?

Although I use bash scripts from time to time, I don't really excel at them; my intuition is that the `&& echo "[DEBUG]. Test OK." || exit 1` notation tests whether the command succeeds, and exits with status 1 if it does not.

Nevertheless, this only tests that it "doesn't crash". Does it at some point also compare the real output to an expected output? If so, where is this functionality in the script?
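A tiny standalone example of that pattern, just to illustrate my intuition (the paths are made up):

```shell
#!/usr/bin/env bash
# `A && B || C`: B runs when A exits 0; C runs when A (or B) fails.
outfile=$(mktemp)                      # stand-in for a tool's output file
[ -f "$outfile" ] && echo "[DEBUG] Test OK." || exit 1
rm -f "$outfile"
[ -f "$outfile" ] && echo "unexpected" || echo "file is gone"
```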

Anyhow, I will configure the entry points for the CLI. I did not know how to do that, but I got a rough idea of how it works from the bidscoin repo.

stebo85 commented 3 years ago

Dear @yjmantilla,

you can do the assertions and tests in Python. The bash example I provided is just an example of how you can test whether a tool produced certain outputs.

The line

`[ -f /tmp/02_qsm_output/qsm_final/_run_run-1/sub-170705134431std1312211075243167001_ses-1_acq-qsmPH00_run-1_phase_scaled_qsm-filled_000_average.nii ] && echo "[DEBUG]. Test OK." || exit 1`

checks whether a file that should have been produced by the tool is actually there; the same can also be done in Python using pytest.
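A minimal pytest sketch of the same idea (the output path and the simulated tool run are placeholders):

```python
import os
import tempfile

def test_expected_output_exists():
    # In a real test, `outdir` would be the pipeline's output directory and
    # the file would be produced by running the tool beforehand.
    outdir = tempfile.mkdtemp()
    expected = os.path.join(outdir, "qsm_final.nii")
    open(expected, "w").close()   # simulate the tool writing its result
    assert os.path.isfile(expected), f"missing expected output: {expected}"
```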

Cheers Steffen

yjmantilla commented 3 years ago

Thanks @stebo85 I will use pytest then.

I also decided to generate a dummy dataset for the CI rather than downloading a real one (the main argument being that EEG data is quite heavy to download every time we change something).

So I made this function:

https://github.com/yjmantilla/sovabids/blob/aca5938f031b0fa493550898314691ad9b5d2018/sovabids/utils.py#L205-L261

It creates a low-footprint dataset in vhdr format following a given tree structure. The default values give a dataset of around 45 MB. Another advantage is that we don't depend on external links for testing.

Another possible advantage is that I can customize more of its contents and assert on them during testing.
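The gist of the tree-structure idea, heavily simplified (this generic helper is illustrative, not the actual sovabids function):

```python
import os

def make_dummy_tree(root, tree):
    """Create a small file skeleton from a nested dict.

    Dict values are subdirectories; bytes values are file contents.
    """
    os.makedirs(root, exist_ok=True)
    for name, sub in tree.items():
        path = os.path.join(root, name)
        if isinstance(sub, dict):
            make_dummy_tree(path, sub)   # recurse into a subdirectory
        else:
            with open(path, "wb") as f:  # write a small leaf file
                f.write(sub)
```

For example, `make_dummy_tree("dummy", {"sub-01": {"resting.vhdr": b"", "resting.eeg": b"\x00" * 1024}})` would lay out one subject with a tiny recording.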

Do you agree with this approach for the CI? Or should I try to find a real dataset instead?

stebo85 commented 3 years ago

Dear @yjmantilla - that's perfect :) A dummy dataset will allow you to test quickly. Make sure to add features to the dummy dataset whenever a user reports a bug you hadn't covered yet; this way your application will become more reliable over time :)

yjmantilla commented 3 years ago

@stebo85 @aswinnarayanan

I was able to implement a CI test using both the Python bids_validator package and the web version of the bids-validator (which appears to be the most thorough validator as far as I know).

Here are the related files:

Making the conversion and running the Python test (which only checks the file paths):

https://github.com/yjmantilla/sovabids/blob/main/tests/test_bids.py

Executing the web bids-validator given the previous conversion:

https://github.com/yjmantilla/sovabids/blob/main/tests/test_web_validator.sh
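As a rough illustration of what a path-level check can look like (a simplified pattern, not necessarily what test_bids.py actually does):

```python
import re

# Illustrative pattern for BIDS EEG filenames; a simplification of the
# BIDS spec covering only the sub/ses/task/run entities.
BIDS_EEG = re.compile(
    r"^sub-[a-zA-Z0-9]+"
    r"(_ses-[a-zA-Z0-9]+)?"
    r"_task-[a-zA-Z0-9]+"
    r"(_run-\d+)?"
    r"_eeg\.(vhdr|vmrk|eeg)$"
)

def is_bids_eeg_filename(name):
    """Return True if `name` looks like a valid BIDS EEG filename."""
    return bool(BIDS_EEG.match(name))
```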

Here is how it looks on the Github Actions panel:

[screenshot of the GitHub Actions panel]

stebo85 commented 3 years ago

That's great :) Looks perfect to me!

yjmantilla commented 3 years ago

Closing this; it seems to be resolved.