noi-techpark / it.bz.opendatahub.databrowser

Explore and navigate through Open Data you need to build your next service.
https://databrowser.opendatahub.com
GNU Affero General Public License v3.0
9 stars 7 forks source link

Make reuse compliant #386

Closed henri-egger closed 1 year ago

henri-egger commented 1 year ago

We are currently working on making all projects reuse compliant. To achieve that all code and config files that can have comments recieve a header stating copyright and licensing information, all other files like assets or json files which cannot recieve comment headers are mentioned in the .reuse/dep5 file. We also decided that we should add the reuse compliance check to the CI. In addition, it would be possible to add a pre-commit hook to check for reuse compliance before comitting.

gappc commented 1 year ago

@henri-egger thank you for the changes, they seem to cause no problems, I merged them :+1:

Could you please also:

henri-egger commented 1 year ago

@gappc I already added the compliance check to the CI, regarding the pre-commit hook, to integrate it into .husky/pre-commit the user needs to have the reuse tool installed. I can add a check for that, my question is just what should happen when he doesn't have it installed. It could either prevent the user from comitting and give him installation instructions or just tell the user that reuse isn't installed and go on without checking for compliance, which would you prefer?

sseppi commented 1 year ago

@henri-egger @gappc In my point of view we should go for the first way you proposed: "prevent the user from committing and give him installation instructions". Otherwise I fear that all developers who doesn't have reuse installed will ignore the reuse compliance.

I would then apply the same logic to all our repositories.

@dulvui @clezag @RudiThoeni What the Software Architects think?

clezag commented 1 year ago

@sseppi developers cannot really ignore it, because the CI/CD fails if not reuse compliant (and tells you so)

We talked about this, and think the best solution would be:

gappc commented 1 year ago

@sseppi @clezag @dulvui @RudiThoeni some questions:

Rationale: we can not prevent people from committing code without reuse compliance, there are ways to circumvent commit hooks and I have seen people take that road because it was more convenient than having the hooks run. If people are allowed to commit / push directly to development, then we will have commits and files that are not compliant - that's 100% for sure (I suppose direct commits / pushes to main are already prevented).

We can prevent this only if code is developed in feature branches and then merged to the development branch in GitHub - no more direct commits / pushes are then allowed to development. Having feature branches and the merge process, the CI can test if the code is compliant before the merge is done.

Now, if ANY commit MUST be reuse compliant, the merge MUST be done in squash manner, that means that all commits in a feature branch are taken together into one single commit. Only that way we can ensure that every (squashed) commit that gets merged to the development branch is reuse compliant.

That's the GitHub part.

Next, as supportive step, we should provide pre-commit hooks, such that developers get aware of reuse compliance problems on commit time.

For this I strongly opt to find ways that work like the lint integration that is part of the project (see our current pre-commit hook). The whole linting installation and pre-commit stuff works out of the box with npm install, no need for a developer to install some software manually and that's very important (in my opinion), because if you tell people that they need to install other software, then you have to support that requirement for the next years to come on Windows / MacOS / Linux / ... (good luck with that) and the requirement to install additional software will turn people away from the project because - let's be honest - who want's to install software just because it is required in some project. For me that would be a tipping point where I would think twice if I want to contribute.

@sseppi a bit offtopic: should we move the whole discussion into its own issue / EPIC? There seem to be some bigger steps / changes necessary.

gappc commented 1 year ago

@sseppi @clezag @dulvui @RudiThoeni I just saw the installation instructions of reuse: https://github.com/fsfe/reuse-tool#install. If I'm right, to use the reuse-tool you either need one of the following:

That's a major obstacle for a lot of developers (ask Win users). Do you know of any alternative tool to achieve reuse compliance?

If there are no other options, we need to clearly document the possibilities and give hint's on how the reuse-tool can be integrated with the pre-commit hook if a developer chooses to use Python / Docker. See also PR #387

gappc commented 1 year ago

@sseppi @clezag @dulvui @RudiThoeni I tried to run reuse via the Docker image locally. If executed as shown in the reuse docs it outputs that the project is not compliant: "Unfortunately, your project is not compliant with version 3.0 of the REUSE Specification :-("

Steps to reproduce:

# Clone Data Browser repo
git clone https://github.com/noi-techpark/it.bz.opendatahub.databrowser

# Change into the databrowser directory
cd it.bz.opendatahub.databrowser/databrowser

# Install dependencies
npm install

# Run reuse via Docker. IMPORTANT: reuse must be run from project root to find the GIT repo!!!! Therefor we mount $(pwd)/.. in Docker, given that we are currently in the databrowser directory of the project root
docker run --rm --volume $(pwd)/..:/data fsfe/reuse lint

The problem seems to be related to this vulnerability: https://github.blog/2022-04-12-git-security-vulnerability-announced/. To prevent the problem from happening, one must define the GIT project directory as safe on a global level, issuing git config --global --add safe.directory /MY_PROJECT_DIR in the Docker container before running the reuse lint task. Note that you MUST mount the project root in order for anything to work:

# Start the reuse Docker in interactive mode using sh as entry point. IMPORTANT: be sure to mount the project root, otherwise it will not work!!!!!!
# The command below expects to be run from the project root (not the databrowser directory)
docker run -it --rm --volume $(pwd):/data --entrypoint /bin/sh fsfe/reuse

# Mark the project as safe inside the docker container before linting
git config --global --add safe.directory /data

# Then lint
reuse lint

This is the single-line version of the procedure:

docker run --rm --volume $(pwd):/data --entrypoint /bin/sh fsfe/reuse -c "git config --global --add safe.directory /data && reuse lint"

This needs some good documentation and further problems will arise for sure (e.g. $(pwd) won't work on Windows, so we need to document that as well).

My personal opinion: I'm not very happy about reuse being a project requirement, it introduces a lot of headaches and problems right now (let alone in the future) and the tools have problems on their own. I recognize that licensing stuff is hard, but I thought that having a LICENSE file is enough for a project because it applies to all files. What do you guys say? What is your experience with reuse so far (e.g. in other projects)?

sseppi commented 1 year ago

Hi all,

I think that, since we are FSFE (promoter and developer of the Reuse Project) member and PAtrick is in the board, for @ohnewein is really importat that all Repositories of our team are Reuse Compliant. I add him to the discussion, so he can provide us more feedback.

clezag commented 1 year ago

In an effort to not over-complicate things:

I do agree that the pre-commit part is useful and necessary for developers, and a bit of a hassle to set up.
We've looked into it a bit, and for security reasons git does not allow repo-wide commit hooks that are automatically enabled on clone. So there won't be a fully automatic one-size-fits-all silver bullet solution I'm afraid. We haven't settled on what we will use to handle the hooks yet, for npm projects husky that you are already using might fit, but I think it will be something like the pre-commit python package for other projects. Considering that the reuse tool is python as well this limits the software a contributor has to install on his machine, and it's platform agnostic.

Personally I see it this way:

If you are a frequent contributor doing lots of changes, you will get annoyed of the CI rejecting your commits and will install the client side hook eventually. Our job will be making this as easy and simple as possible.
Once you've installed REUSE and some helper like pre-commit once, it will just be a matter of issuing a single command to set up other repos after the clone.

On the other hand if you are just someone changing a few lines of code, or adding a single file, you don't have to do anything (because you just changed some lines) or simply copy the header from another file and you're good to go

Again, you don't strictly HAVE to install REUSE or any hooks, you can also bash your head against the CI until compliance is achieved if you prefer to do so.

ohnewein commented 1 year ago

Hi all,

thank you @clezag for the very good summary!

Correct and machine-readable licensing is growing in importance. That's why companies like Siemens and FOSS communities like KDE are supporting and adopting REUSE.

The complexity to have correct licensing is low. For contributors to our repositories the effort is:

In this and all other cases you will get support by our Software Architects about how to solve corner cases, which are not handled by default.

Considering the low effort to have clear and machine-readable licensing, which is very important for many license related activities (think about SBOM generation) we decided to use RESUE for all our repositories.

Have a nice weekend, Patrick

gappc commented 1 year ago

@ohnewein I understand that having machine-readable licenses is of growing importance, but I don't understand the reason to have the header in every file, if it should suffice (at least in my opinion) to have one file that says: .md -> license X, .js -> license Y; think of dep5 (where it seems to be ok) but for other project files.

I'm confident that those discussions were already made in the reuse team, but given that you are more involved in the reuse space, please take these comments as a feedback - maybe there is room for improvement ;) (by the way, if in the reuse group comes up a discussion, it would be nice to have native reuse-tool support for node/npm, java, ..., that would already simplify things)

@clezag its good to not over-complicate things, but one should also strive to think stuff through in a sense that it is clear what the requirements are and how to handle them.

In that spirit, there is at least one question that needs an answer because it affects the outcome (and maybe the reuse compliance): must every commit be reuse compatible, or is it enough if the PR / push HEAD is reuse compatible (a PR / push may consist of several commits)? In the former case I outlined an idea on how that could be handled (merge only + squash), in the latter case it's CI only which is already in place

One last note having the developer glasses on: please @noi provide great tooling support and documentation, because every obstacle might turn developers and contributors away.

gappc commented 1 year ago

@all: should we move this discussion into its own issue / discussion thread? I get the feeling that it could get lost as part of this PR and in my opinion it is interesting enough to have it somewhere more prominent

clezag commented 1 year ago

must every commit be reuse compatible, or is it enough if the PR / push HEAD is reuse compatible (a PR / push may consist of several commits)? In the former case I outlined an idea on how that could be handled (merge only + squash), in the latter case it's CI only which is already in place

It's the latter.

Consider it just another test / lint, that you should always strive to get you green check mark on. The CD being dependent on the lint passing forces you to fix it sooner or later, which is good enough for us.