rayluo / github-pages-overwriter

A Github Action that overwrites your Github Pages branch with the content of current workdir, thus deploy/publish without polluting your repo history.
MIT License

The logic is quite bizarre. #7

Closed im7mortal closed 1 week ago

im7mortal commented 1 month ago

The action is pretty dangerous. It's not immediately clear that it will erase all data.

ChatGPT provided me with a snippet for your action some time back. I briefly looked inside and had the opinion that it would just overwrite the last commit data. Instead, it removed absolutely all commits from my gh-pages branch.

It was my home project, so it wasn't critical. I was in a state where I had made huge changes, and the build directory contained mostly nonsense files. I no longer remembered how to build the old version (which was exactly why I wanted to automate the new version). So, for some days, my gh-pages site wasn't available until I figured out how to build the new version of my site.

I think by default it should overwrite the last commit. It should have an obvious option like HARD_RESET_GH_PAGES for the current logic.

In my opinion, -f should not be used in automation at all. It's too dangerous.

rayluo commented 1 month ago

I can understand your perspective.

Yet this project is named "Github Page Overwriter", not "Github Page's last commit Overwriter". The workflow here is meant to have your site building logic defined elsewhere (i.e. outside of the github page), and then this action deploys your site into github page (by overwriting it). That was described in the "how does it work" section in the README.

Rather than closing this issue as "won't fix", I can convert it into a Q&A so that it stays visible to others for full transparency. Does that sound reasonable to you?

im7mortal commented 1 month ago

Thank you for the quick response!

> Yet this project is named "Github Page Overwriter", not "Github Page's last commit Overwriter".

Exactly! 😄 Everybody understands "overwrite" differently. It's not clear from the name "... Overwriter" that it will erase the entire branch (if GitHub Pages is configured to use main, it can erase the entire repo!). And yes, there is still a bug present that can remove the branch completely.

I have a vision that:

  1. The first time 'github-pages-overwriter' runs, it commits on top of the target branch. The commit has a string tag in it.

  2. Every subsequent time 'github-pages-overwriter' runs, it overwrites only the commit with the tag.

    If users want to reset the history of the branch, they do it manually. It's their responsibility.

Again:

  1. A user has the following structure. They have a C++ project in main and some simple HTML in gh-pages.

                            G1 --... --G67   gh-pages
                             /
           M0---M2 --...--M45--... --M184  main

  2. The user decides to add automation and consolidate their code in one branch. They have only main locally, and they make their changes. They use 'github-pages-overwriter' because it overwrites (whatever they think that means).

Suddenly they have the following:

                                   G1  gh-pages
                                /
              M0---M2 --... --M190  main

All history is lost.

rayluo commented 1 month ago

I am more than open to enhancing the documentation to emphasize that the Github Page Overwriter (GPO) will overwrite the entire gh-pages branch.

> I have a vision, that ...

The vision you have, @im7mortal , might be doable, but perhaps not worth the effort. And that goes back to the position of GPO. GPO is NOT a website builder which yields a set of web pages, which in turn would be committed on top of an existing gh-pages branch. No, that's not how GPO works.

GPO is meant to be a deployment tool. It takes your website builder's output as GPO's input, and then GPO deploys them into a production system which "happens to" be a gh-pages branch. In DevOps context, you would have your project's history in git, and you would want your deployments to be disposable; if one deployment fails or becomes outdated, you simply create a newer deployment. In GPO, the gh-pages branch IS that deployment environment that will be overwritten all the time.

One more thing. Thanks for your earlier engagement in that #2 issue. Could #2 actually be a misunderstanding of GPO? With all the explanation above, will you still run into #2?

im7mortal commented 1 month ago

Hi Ray!

> ...GPO is NOT a website builder ...

Yes, I understand GPO's purpose. I never said that GPO is supposed to build the site. GPO is supposed to overwrite[^5] the content of the branch that is used with GitHub Pages.

> ... It takes your website builder's output as GPO's input, and then GPO deploys them into a production system which "happens to" be a gh-pages branch. In DevOps context, you would have your project's history in git, and you would want your deployments to be disposable ...

In the DevOps context of production deployment, we want to be able to perform a rollback. Rollback is the industry standard for resolving problems in production deployments. I agree that GPO is not designed to keep a rollback option. The worse part is that GPO excludes the possibility of rollback from the very first time it is run[^4].

My opinion is that GPO mustn't implicitly deprive people of the ability to roll back. That is what I call the bizarre logic.

Resolve the implicit

1. Documentation through configuration

The only possible way to document the current implementation is to add a required HARD_RESET_GH_PAGES flag.

  1. When a user runs GPO for the first time, it prints a message in the Actions log explaining that it will erase the target branch in a git reset --hard manner, and then exits with status 1[^1].
  2. If the user agrees, they set the flag HARD_RESET_GH_PAGES=true.

This should be released as v2.
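The opt-in guard proposed in the two steps above could look roughly like the following Python sketch. Only the flag name HARD_RESET_GH_PAGES comes from the proposal; the function names `check_hard_reset_consent` and `main`, and the wording of the message, are hypothetical, and nothing here reflects how the action is actually implemented.

```python
import os
import sys

# Hypothetical wording; the proposal only says the message should explain
# that the target branch will be erased.
WARNING = (
    "This action will erase the existing history of the target branch. "
    "Set HARD_RESET_GH_PAGES=true to confirm that this is what you want."
)


def check_hard_reset_consent(env):
    """Return True only when the user has explicitly opted in."""
    return env.get("HARD_RESET_GH_PAGES", "").strip().lower() == "true"


def main(env=None):
    """Return the exit status the action would use: 0 to proceed, 1 to stop."""
    env = os.environ if env is None else env
    if not check_hard_reset_consent(env):
        # No consent yet: explain the consequences and fail the run.
        print(WARNING, file=sys.stderr)
        return 1
    return 0
```

With this shape, a first run without the flag fails loudly instead of rewriting the branch, and the destructive behavior only starts once the user has read the message and set the flag.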

2. Other vision (also v2); Principle of Least Privilege

GPO resets only the resources it owns. In the current implementation, GPO owns the target branch[^2].

In my vision, we need to satisfy the Principle of Least Privilege. We reduce the scope to the commit level. GPO can perform operations only on commits it owns.

  1. On the first run, GPO checks the last commit on the target branch for GPO's unique hash.
  2. If there is NO marked commit, GPO creates a commit[^3] with the mark of ownership.
  3. If there is a marked commit, GPO owns this commit. It resets the commit and overwrites its content. The new commit carries the mark of ownership.
  4. If the user wants to reset their branch, they do it explicitly themselves.
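The four steps above can be modeled without touching git at all. The Python sketch below treats a branch as a plain list of commits, oldest first; the marker string `OWNERSHIP_MARK`, the `Commit` class, and the `deploy` function are all hypothetical names chosen for illustration, and this is not the draft implementation mentioned in the thread.

```python
from dataclasses import dataclass

# Hypothetical marker that identifies a commit as owned by the deploy tool.
OWNERSHIP_MARK = "[deployed-by-gpo]"


@dataclass(frozen=True)
class Commit:
    message: str
    content: str


def deploy(history, new_content):
    """Return the branch history after one deployment.

    `history` is a list of commits, oldest first. The input list is not
    modified; a new list is returned.
    """
    deploy_commit = Commit(f"Deploy site {OWNERSHIP_MARK}", new_content)
    if history and OWNERSHIP_MARK in history[-1].message:
        # The tip is owned by the tool: replace it and leave older commits alone.
        return history[:-1] + [deploy_commit]
    # The tip is not owned (or the branch is empty): add a commit on top.
    return history + [deploy_commit]
```

Under this model, repeated deployments never grow the branch by more than one commit, and commits the tool did not create are never removed. A user who wants a single-commit branch would still have to reset it by hand once.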

I have a draft implementation of this approach.

[^5]: As mentioned before, everyone interprets "overwrite" in their own way.
[^1]: It will exit with 1 the FIRST time. I think that's okay. The FIRST run should be a debug run anyway.
[^2]: In the worst-case scenario, the entire repository, if the target branch is main.
[^3]: The commit has the content of the target directory. It overwrites it.
[^4]: That was my case with my home project, when the old stable versions had already been erased and the new build output was not functional.

rayluo commented 1 month ago

Thanks again for your effort and input in seeking a potential solution, @im7mortal!

I think at this point, we both understand each other well. In particular,

> In the DevOps context of production deployment, we want to be able to perform a rollback. Rollback is the industry standard for resolving problems in production deployments. I agree that GPO is not designed to keep a rollback option.

... I also agree that rollback is the most common mitigation method in DevOps. It is just that GPO does not expect any valuable data inside GPO's "workspace" - which is the gh-pages branch by default. To use an analogy, consider GPO's target branch as a "hello_world.exe" and consider your main branch as a "hello_world.cpp". When you find a bug in your latest "hello_world.exe", you don't normally expect your file system to revert "hello_world.exe" to its earlier version; you simply roll back your source code "hello_world.cpp" and then recompile another exe file.

With that, I think there can be a much simpler adjustment here. GPO v2 can keep its same behavior as v1, except that the default target branch will be changed from the familiar gh-pages to an exotic gpo-workspace branch, which is unlikely to pre-exist.

Problem solved?

im7mortal commented 1 month ago

Hi Ray!

> When you find a bug in your latest "hello_world.exe", you don't normally expect your file system to revert "hello_world.exe" to its earlier version; you simply roll back your source code "hello_world.cpp" and then recompile another exe file.

My suggestions don't conflict with this scenario at all.

The problem is the first try, when the user is still not aware that GPO will reset their history.

> GPO does not expect any valuable data inside GPO's "workspace" - which is the gh-pages branch by default.

gh-pages doesn't have any default usage pattern[^6]. It was introduced much, much earlier than GitHub Actions (10 years earlier)[^7], and even before web builders!!![^8] It's a branch with a static site for the main project. A user can have their project in main and keep HTML files, pictures, and videos in a separate branch. They can have documentation there. There is no convention or official GitHub guidance about how to use the gh-pages branch. One can use a GitHub repository only for the GitHub Pages feature[^5].

Again, at some point a user can decide to switch to automation. They can decide to develop a build system which will overwrite[^3] the gh-pages content. They will probably give GPO a dev try[^4] and ... GPO will immediately reset all their history.

Implementations

> With that, I think there can be a much simpler adjustment here. GPO v2 can keep its same behavior as v1, except that the default target branch will be changed from the familiar gh-pages to an exotic gpo-workspace branch, which is unlikely to pre-exist.

It can be confusing. A user can correct the target branch setting and erase the gh-pages branch on the next run.

[^3]: As mentioned before, everyone interprets "overwrite" in their own way.
[^4]: Ideally, the user should run it on a branch other than gh-pages. But one might assume that a simple rollback is possible if GPO messes up.
[^5]: I do :smile:
[^6]: One could have a C++ project and understand nothing about front end. They could create the simplest HTML page with a description of the project and host it on gh-pages.
[^7]: GitHub Pages was released on June 19, 2008. GitHub Actions, on the other hand, was announced at the GitHub Universe conference on October 17, 2018, and became generally available on November 13, 2019.
[^8]: Disposable web builds started around 2012 with Grunt, as far as I can find online, but definitely not before Node.js, which was released in 2009.

rayluo commented 1 month ago

> GPO does not expect any valuable data inside GPO's "workspace" - which is the gh-pages branch by default.

> gh-pages doesn't have any default usage pattern

Perhaps I can rephrase.

  1. Github Pages (not gh-pages) does not have any default usage pattern, but conventionally people tend to name their branch gh-pages and - as you convinced me - there could be some pre-existing content in that gh-pages branch.
  2. GPO was designed to have its own branch as a workspace. Currently that workspace defaults to gh-pages thus likely wipes out the old content in that gh-pages branch.
  3. My proposal is to change GPO's default workspace branch to a name (such as gpo-workspace) which is so rare that it is unlikely to pre-exist in anyone's repo.

im7mortal commented 1 month ago

I see your point more clearly now.

I'm concerned about two scenarios:

1. A branch N hosts the GitHub Pages code. It has a long history.

2A. A user notices that the default value is gpo-workspace. They change the target branch to N[^8] and lose their history.

2B. A user runs GPO and sees that changes were written to gpo-workspace. They change the target branch to N[^8] and lose their history.
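Scenarios 2A and 2B can be illustrated with a tiny model. In the Python sketch below, a repository is just a mapping from branch name to commit count, and a full overwrite of the target discards every commit on that branch. The name `commits_discarded` and the sample numbers are hypothetical; only the default name gpo-workspace comes from the proposal.

```python
# Default target name proposed in the thread for a possible v2.
DEFAULT_TARGET = "gpo-workspace"


def commits_discarded(branches, target=None):
    """Count the commits a full overwrite of the target branch would discard.

    `branches` maps a branch name to its number of existing commits.
    A branch that does not exist yet has nothing to lose.
    """
    target = target or DEFAULT_TARGET
    return branches.get(target, 0)


# Hypothetical repository: branch N hosts the GitHub Pages site.
repo = {"main": 184, "N": 67}
```

With the default target, `commits_discarded(repo)` is 0 because gpo-workspace does not pre-exist. Once the user points the target at N, `commits_discarded(repo, "N")` is 67, which is the history loss both scenarios describe.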

Implicitness

What is the actual concern with the Least Privilege approach? GPO will not keep disposable files. If there are 1000 runs, GPO will keep only one commit with the latest version. The only difference is GPO will not touch resources it does not own. If a user wants to reset their branch before overriding[^3] its content, they have to reset the branch manually one time. They can even create an orphan branch and mark the initial commit as owned by GPO, and GPO will override this commit (and so the entire branch).

[^3]: As mentioned before, everyone interprets "overwrite" in their own way.
[^8]: Because they want to override their GitHub Pages branch.

rayluo commented 1 month ago

> I see your point more clearly now.

> I'm concerned about two scenarios: A branch N ... has a long history. ... They change the [GPO] target branch to N and lose their history.

Thanks for your understanding. Now perhaps you can also better understand my earlier analogy, which compares (a) GPO converting any given branch of your repo into a gpo-workspace branch by default with (b) the compiler gcc compiling hello_world.c into a.out by default. gcc's default behavior would not overwrite the source code hello_world.c but - the last time I used it - it does not stop a novice from running gcc hello.c -o hello.c, which loses the source code.

Trying to guard against that kind of use error (by changing GPO to somehow use a per-commit ownership model?) seems overly complicated. Heck, even git itself has a design philosophy that allows rewriting history even though it risks a user accidentally nuking their content.

It is better to have a simpler (thus more reliable) tool and use it right, than to have a more complicated tool and hope it works right.[^1] The alternative proposal seems too complicated to fit in my smaller brain :-) [^2]

[^1]: "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. ..." -- C.A.R. Hoare
[^2]: "If the implementation is hard to explain, it's a bad idea. If the implementation is easy to explain, it may be a good idea." -- Zen of Python