oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0

Allow "non-repository" input to the analyzer #8803

Open sschuberth opened 1 week ago

sschuberth commented 1 week ago

As ORT takes transparency and reproducibility of results seriously, currently only local directories that are under version control can serve as input to the analyzer. This is because for such "working trees" their origin is machine-readable, and the state of the source code is encoded in the VCS revision. If a remote VCS repository is to be analyzed, it first needs to be cloned, either manually with the respective VCS tool or with the ORT Downloader.
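To illustrate what "under version control" means for a local input directory, here is a minimal sketch that walks up from a directory and looks for a VCS marker. It is written in Python for brevity and is not how ORT (a Kotlin project) implements the check; the function name `find_working_tree_root` and the marker list are assumptions made for this example.

```python
from pathlib import Path
from typing import Optional

# Marker entries of a few common VCS tools (illustrative and incomplete).
VCS_MARKERS = (".git", ".hg", ".svn")


def find_working_tree_root(start: Path) -> Optional[Path]:
    """Return `start` or its closest ancestor that contains a VCS marker, else None.

    Hypothetical helper for illustration only.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        # `.git` can be a directory or a plain file (for example in submodules),
        # so only existence is checked.
        if any((candidate / marker).exists() for marker in VCS_MARKERS):
            return candidate
    return None
```

A directory for which this returns `None` is the kind of "non-repository" input this issue is about.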

However, there are use cases where the source code to analyze was never checked into version control, and committing it to a "fake" repository just to be able to analyze it defeats the purpose of capturing genuine provenance information.

Thus the proposal is to relax / extend the valid input types for the analyzer to the following:

Open questions:

Related issues / PRs:

fviernau commented 3 days ago

What we have not discussed so far, but what I believe is important to clarify, is Project.vcs / Project.vcsProcessed. Previously, one could always set it. The feature implemented here should probably ensure that we can still always set that property. So its datatype would need to change?

sschuberth commented 3 days ago

Good point about a Project's provenance info. If the analysis is not performed on a repository, it also does not make sense to store VCS-related information for the Projects, and these should probably be migrated to KnownProvenance as well.
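One way to picture this idea is a project that carries a provenance value instead of mandatory VCS fields. The following is a rough sketch in Python, not ORT's actual Kotlin data model: `DirectoryProvenance`, its fields, and the derived `vcs` property are hypothetical, and the other classes are simplified stand-ins that only borrow names used in this thread.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class VcsInfo:
    type: str
    url: str
    revision: str


@dataclass(frozen=True)
class RepositoryProvenance:
    vcs_info: VcsInfo
    resolved_revision: str


@dataclass(frozen=True)
class DirectoryProvenance:
    # Hypothetical provenance for input that was never under version control.
    path: str
    content_hash: str


# Simplified stand-in for a sealed "known provenance" type.
KnownProvenance = Union[RepositoryProvenance, DirectoryProvenance]


@dataclass(frozen=True)
class Project:
    id: str
    provenance: KnownProvenance

    @property
    def vcs(self) -> Optional[VcsInfo]:
        # VCS details exist only if the project originates from a repository.
        if isinstance(self.provenance, RepositoryProvenance):
            return self.provenance.vcs_info
        return None


repo_project = Project(
    "example:app:1.0",
    RepositoryProvenance(VcsInfo("Git", "https://example.com/app.git", "main"), "abc123"),
)
local_project = Project("example:lib:1.0", DirectoryProvenance("/tmp/src", "deadbeef"))
```

In this sketch a consumer that needs VCS data has to handle its absence explicitly (`local_project.vcs` is `None`), which is the kind of datatype change asked about above.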

This relates a bit to @mnonnenmacher's questions about how follow-up ORT tools that might run on a different machine / node should get access to the provenance's / project's source code.

pepper-jk commented 2 days ago

Would you like me to incorporate those changes to Project into #8764 as well before we proceed?

sschuberth commented 2 days ago

> Would you like me to incorporate those changes to Project into #8764 as well before we proceed?

I'm not sure. Your PR is already quite involved, so maybe we should not make it even more complex, but roll out the changes in stages, if possible.

pepper-jk commented 2 days ago

Yes, I see the problem: either we have breaking changes after every small pull request, or we have one large pull request that is impossible to review.

Maybe we could create a feature branch for this topic, merge all related PRs into it first, and merge it into main only once the work is finished? That way there would be breaking changes in only one release, and you would still have small PRs to review.

FYI: I have taken a look at the changes required for Project, and they appear to be very similar to what I already did in Repository, so I believe adding those on top would be straightforward. It would just take a bit more work.