google / oss-fuzz

OSS-Fuzz - continuous fuzzing for open source software.
https://google.github.io/oss-fuzz
Apache License 2.0
10.55k stars 2.23k forks

[google/oss-fuzz/tree/master/projects] is there a way to split the too long list (truncated to 1000 entries)? #11751

Closed verdy-p closed 6 months ago

verdy-p commented 7 months ago

[google/oss-fuzz/tree/master/projects] is not usable online on GitHub, which truncates the list to the first 1000 entries, hiding the 236 remaining ones (and any new ones being added), i.e. all those with a name starting with [s-z].

Is there a way to reduce that list by splitting the folder into (relevant) subfolders?

And then allow the building tools to be able to locate them in these subfolders, and then allow GitHub users to navigate in them online and focus more easily on domain-specific projects they want to work with?

image

jonathanmetzman commented 7 months ago

Would you want to maintain the integration (project.yaml, dockerfile and build.sh) in your own repo? I'm thinking of implementing a change to support this.

maflcko commented 7 months ago

> Would you want to maintain the integration (project.yaml, dockerfile and build.sh) in your own repo? I'm thinking of implementing a change to support this.

Interesting. Though, I could imagine that infra changes are more difficult, if they require an atomic change of the dockerfile, or another file at the same time.

jonathanmetzman commented 7 months ago

> > Would you want to maintain the integration (project.yaml, dockerfile and build.sh) in your own repo? I'm thinking of implementing a change to support this.
>
> Interesting. Though, I could imagine that infra changes are more difficult, if they require an atomic change of the dockerfile, or another file at the same time.

Yes actually after some discussion, we probably won't be doing this for that very reason.

evverx commented 7 months ago

I think there are a lot of scripts (outside of the OSS-Fuzz repository) relying on this structure, and this change could certainly break them all.

> allow GitHub users to navigate in them online and focus more easily on domain-specific projects they want to work with?

Usually it's necessary to clone the OSS-Fuzz repository (and projects' repositories, depending on how they are integrated) to build and test fuzz targets. While I agree that it isn't easy to browse the OSS-Fuzz repository online, I'm not sure it's possible to do a lot of things using browsers.

DaveLak commented 7 months ago

FWIW, I've found the projects directory quite easy to browse & search online using the file browser UI GH launched a few months ago.

Here's a short screen recording example if it helps https://github.com/google/oss-fuzz/assets/17415134/bdbf3a74-51bd-4a0a-b3e5-6c5db4994986

verdy-p commented 7 months ago

@DaveLak Your MP4 does not show the problem reported above: go to the bottom of the list of projects and you'll see that the list is truncated and does not show more than 1000 projects, so projects with names currently starting with "s" (after "scikit-learn") are invisible (at the bottom of the list) and cannot be browsed online. 235 projects are omitted.

That's why I created this bug: we need a simple way to organize/group projects into subfolders. Each project could eventually set up a subfolder prefix under which it would be stored/categorized.

And we should not need to know how projects are possibly named to find them. For now they are simply invisible/not listed.

In fact, no source directory in any open source project should contain so many entries. If needed, build scripts and bots should be able to recurse into subdirectories, possibly using a filter to decide which ones to consider, such as the presence of a specific metafile. Here it could simply be an empty ".project" file, which would let projects be organized very simply, by just moving them into subfolders once the metafile has been added. A subdirectory without this metafile would be a project group, which could also have its own distinctive ".projectgroup" metafile, or possibly a "projectgroup.yml" file with specific build rules or other meta-information: maintainers, contacts, dependencies on other projects, evaluation/status/scope, target milestones, specific test suites, migration/conversion tools for other projects, etc.

Navigability in all these files would then be much simpler for everyone (both for online display and for maintainers). Metafiles for project groups could also be used to speed up the build process (e.g. by building subreports instead of rebuilding a single report for the whole project tree each time one of them is modified). Ideally, no subdirectory should contain more than about 200 files or subdirectories.
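
The marker-file idea above can be sketched in Python. This is only an illustration of the proposal, not how the OSS-Fuzz build tooling works today: the ".project" marker name comes from the comment above, while `find_projects` and the directory layout it expects are hypothetical.

```python
import os


def find_projects(root, marker=".project"):
    """Return sorted paths (relative to root) of directories containing the marker.

    A directory holding the marker file is treated as a project and is not
    searched further; any other directory is treated as a project group and
    is recursed into.
    """
    projects = []
    for dirpath, dirnames, filenames in os.walk(root):
        if marker in filenames:
            projects.append(os.path.relpath(dirpath, root))
            dirnames.clear()  # do not descend into a project's own subfolders
    return sorted(projects)
```

Under this sketch, moving a project between groups would only require moving its directory; a script that relies on the current flat layout would still have to be changed to call something like `find_projects` instead of listing `projects/` directly.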

evverx commented 7 months ago

The screen recording shows that it's possible to jump directly to a directory and browse it online without having to list all the projects.

> we need a simple way to organize/group projects into subfolders

I'd have to actually change a bunch of scripts responsible for actually building and fuzzing actual projects and relying on this structure so I'm curious what the rationale behind moving things around is.

jonathanmetzman commented 6 months ago

I think we won't be doing this. Sorry for the trouble it causes. I think the best bet (and it's not a good bet) is to ask github support to fix this.

pkillarjun commented 6 months ago

image

Can't read all the directories without downloading the repository.

Can I propose a simple change?

Shift all the Python and Java/JVM projects into a separate directory that starts with a space character or a number, and leave all the C and C++ projects in the main directory.

verdy-p commented 2 months ago

Simply creating an intermediate level of directories (not more than 200), with any short names you like, would be enough: store every project in one of them (not more than 200 in each group). This allows up to 40,000 projects, much more than will be needed for a long time. Any project could be moved from one directory to another, and these directories could be organized by level of support (e.g. a "core" group for projects shared across multiple languages, php...; an "exp" group for experiments; "legacy" groups for deprecated projects that are still supported but no longer recommended).
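
The capacity figure above is just 200 × 200 = 40,000. As a rough sketch of how a flat list could be split under that cap, the Python below cuts a sorted list of names into consecutive groups of at most 200. This alphabetical bucketing is a stand-in chosen for illustration; the comment itself suggests grouping by level of support, which would need human judgment. `split_into_groups` and the generated names are hypothetical.

```python
def split_into_groups(names, max_per_group=200):
    """Split sorted project names into consecutive groups of at most max_per_group."""
    ordered = sorted(names)
    return [ordered[i:i + max_per_group] for i in range(0, len(ordered), max_per_group)]


# Placeholder names standing in for the 1000 visible + 236 hidden projects
# reported at the top of this thread.
names = [f"project{i:04d}" for i in range(1236)]
groups = split_into_groups(names)
print(len(groups), max(len(g) for g in groups))  # prints "7 200"
```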

The current problem is to discover groups.

For now, when looking for projects online, you can only guess what their names might be (names may be more or less abbreviated, have synonyms, or have been changed over a project's history, possibly by merging several related projects or splitting them). There's not even a basic text file listing them, one line per project, with an optional short description on the same line (which could also mention dependent projects; even if this is not processed, it would help human reviewers). That list should be sorted by name, matching the directory name under which each project was added.
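
A list like the one described could be generated from a local clone. The sketch below is a hedged illustration, not an existing OSS-Fuzz tool: it assumes each project is a direct subdirectory of `projects/` containing a `project.yaml` (the file mentioned earlier in this thread), and it uses the `homepage:` line, if present, as a stand-in for a description. `build_index` is a hypothetical name, and the line scan is a naive substitute for a real YAML parser.

```python
import os


def build_index(projects_dir):
    """Return index lines 'name<TAB>description', sorted by directory name."""
    lines = []
    for name in sorted(os.listdir(projects_dir)):
        yaml_path = os.path.join(projects_dir, name, "project.yaml")
        if not os.path.isfile(yaml_path):
            continue  # not a project directory
        description = ""
        with open(yaml_path, encoding="utf-8") as f:
            for line in f:
                # Naive scan: use the first top-level 'homepage:' value as the description.
                if line.startswith("homepage:"):
                    description = line.split(":", 1)[1].strip().strip("\"'")
                    break
        # Drop the trailing tab when no description was found.
        lines.append(f"{name}\t{description}".rstrip())
    return lines
```

Writing the result of `build_index("projects")` to a text file at the repository root would give a single page that lists every project name, independent of how many entries the web file browser displays.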