vemonet / setup-spark

:octocat:✨ Setup Apache Spark in GitHub Action workflows
https://github.com/marketplace/actions/setup-apache-spark
MIT License
20 stars 12 forks source link

Action does not fail if file cannot be found #11

Closed ionicsolutions closed 1 year ago

ionicsolutions commented 2 years ago

Describe the bug When you specify a combination of versions that does not exist, the Action does not fail:

image

The URL https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.3.tgz leads to a 404, the correct URL is https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz.

I made this mistake because https://spark.apache.org/downloads.html advertises 3.2.1 as "with Hadoop 3.3 and later".

In any case, the action should fail if Spark cannot be installed instead of silently progressing.

Which version of the action are you using?

v1

Environment GitHub-hosted

Spark Versions Please list all of the effected versions of Spark (3.1.2, etc.)

To Reproduce

      - uses: vemonet/setup-spark@v1
        with:
          spark-version: 3.2.1
          hadoop-version: 3.3

Run/Repo Url https://github.com/anovos/anovos/runs/5326819596?check_suite_focus=true

Screenshots

image

Additional context

vemonet commented 2 years ago

Hi @ionicsolutions , thanks for the clear report. Indeed the action should fail!

The reason why it does not fail: when I built the action I was a bit lazy, and went for the fastest and most reliable solutions I knew: wget ! (it's by default in all linux runners afaik, so why not using it?)

The faulty code is here: https://github.com/vemonet/setup-spark/blob/main/src/setup-spark.ts#L25

The problem is that if the version/file does not exist it does not trigger an error in the JS

2 solutions:

If anyone has an suggestion? I am not sure if using wget from the JS actions is recommended, but from my point of view it just work for downloading files, and it's the fastest out there (not sure if fetch will work out of the box with everything and be as fast). And it seems like speed this is what people like about this action: https://github.com/vemonet/setup-spark/issues/8 (not sure if fetch will be as fast)

I don't have much time to look into it right now, but it should not be a really hard change to make if someone want to contribute! Otherwise I'll look into it some day when I will have time for it (or if more people are coming to complain!)

ionicsolutions commented 2 years ago

I believe that https://stackoverflow.com/a/43077917/7469434 would be the solution here, I can give it a try.

vemonet commented 2 years ago

Hi @ionicsolutions

Thanks for the suggestion! I went for something similar and added an extra file check to make sure the spark binaries are properly downloaded

I also used the opportunity to upgrade some dependencies, and node12 to node16

I updated the v1 release, let me know if it does not work as expected!

ionicsolutions commented 2 years ago

Hi @vemonet, thanks for taking care of this.

I updated the v1 release, let me know if it does not work as expected!

What exactly do you mean by "updating the v1 release"? I believe you have to (and should) create a new release that incorporates the fix. Happy to report back on any issues that might surface!

vemonet commented 2 years ago

What exactly do you mean by "updating the v1 release"? I believe you have to (and should) create a new release that incorporates the fix.

Relevant remark @ionicsolutions , I agree with you, and that's usually what I do for every project I work on, but for GitHub actions I am a bit confused because GitHub does not seems to recommend using incremental semantic-like versioning

Here I didn't make breaking change to the API, how the action is used, or how it works (it is working the same as before, but better, less chances of failing silently, which was also noted as a problem in other issues). So usually I would have created a release like 1.0.1 or 1.1.0

But:

In my opinion it seems to be a design choice, GitHub Actions are not complex softwares, they are just wrappers to run complex softwares in a specific environment. So it is better to go with a simple versioning system (occam's razor principle, only make a system as complex as you need it to be!)

If I am fixing some vulnerability-introducing bugs in the action, then it's better if most users of the action can get the update without additional work from there side. In this case my best option is be to update the v1 release, so everyone using the action will use a sane version the next time it runs in their workflow.

Otherwise the setup-spark users would need to be notified about the fact that I released an update that is fixing some vulnerabilities, but I don't think there is such mechanisms for GitHub Actions, and it is impossible for humans to manually check all the GitHub Actions they use regularly to see if there is an update they need to make to some of them. Also to be honest as a developer I would rather not need to care about the versioning of some github actions I use for testing, if some companies want to use them in critical workloads then they should do their due diligence for a secure supply chain: fork the action and review every new change before running in production

So from my point of view it seems safer, more developer friendly, less stressful for everyone (the action maintainer and the action users) to just update the existing version v1 as long as GitHub Action does not support proper dynamic versioning that enables minor patches to be automatically used

But that's just a personal observation that I made from using GitHub Actions, I am not sure if there is any docs, or has been any discussion about those concerns. I would be interested if you have any interesting pointers or better solutions for this!

ionicsolutions commented 2 years ago

Hi @vemonet, I understand your reasoning but personally find it dangerous to silently make changes to scripts or tools others depend on.

For example, it seems that the update you made breaks the Action on MacOS, not sure what exactly is going wrong: https://github.com/anovos/anovos/runs/6356850929?check_suite_focus=true If I didn't know that there was a change made to the Action, it would be difficult for me to debug. (It might also be that Apache's servers are flaky, will investigate and file an issue.)

GitHub itself uses semantic versioning for Actions, e.g., https://github.com/actions/cache and with Dependabot it's also not too difficult to keep track of Actions updates. But I fully get your points!

vemonet commented 2 years ago

Thanks for your pointers! Indeed actions/cache uses semantic versioned releases, but you still can uses: actions/cache@v3, which is exactly what I was looking for. I'll check their setup, and will uses proper versioning next time I need to do an upgrade :)

For the MacOS bug, I did not tried macos runner, I will add a test for this

I just tried a fresh MacOS installation (on a MacBook Pro) and it seems like wget is not installed on MacOS by default (maybe it was silently failing before? And the added try/catch allow to now detect this problem easily)

So maybe switching for curl -fsSL instead of wget will fix it

vemonet commented 2 years ago

@ionicsolutions Testing with MacOS and curl: https://github.com/vemonet/setup-spark/actions/runs/2306295979

MacOS seems to work in general, just one random fail for spark 3.0.3 (even if previous and later versions all work fine)

Then testing MacOS with wget instead of curl: https://github.com/vemonet/setup-spark/actions/runs/2306331177

The action consistently takes 1 min to complete usually, but 2 jobs have been stuck on Run setup-spark for 15min now: 3.0.3 (the same that failed in the curl test) and also 3.1.2 now

Note that it is (randomly) failing on those commands: wget && tar && rm && ln -s (which are all fully working for every other versions of Spark)

So it's probably related to global instability of the MacOS runner (that's why I did not tried to test for MacOS in the first place... I don't see it as an OS built for doing computing, it's more of an nice interface supporting Unix specs, and it costs a lot)

ionicsolutions commented 2 years ago

Thanks a lot for looking into this! My client mostly tests on Mac OS because many of their devs are using it, so we're not overly concerned with fixing it, as nobody will run actual workloads under Mac OS.

For now, I'll just mark the tests as experimental and continue to observe whether there is a pattern in the failures.