tarasmadan opened this issue 2 months ago
We have tools/syz-reprolist, created for this purpose. It currently uses the dashAPI and requires client names and access keys.
The proposal is to:
@dvyukov @a-nogikh wdyt?
Do you consider doing just raw export, or something that does regression testing out-of-the-box? I would assume that a raw export won't be very useful for most users: they won't be able to use it, or will use it incorrectly. An end-to-end solution that distros can use for testing should also include build/run wrappers that check the kernel config, run tests in parallel with timeouts, and monitor dmesg output for bugs, plus docs on how to use all of this.
If we export them (which is required for exporting from non-public namespaces), then the current auth can work as well. "gcloud auth login" is a bit handier, but not a game changer. What would be a game changer is fully automated periodic export.
Plus, there is an unresolved problem with C repros missing in lots of cases. syz-reprolist is slow and unreliable (it may be broken already). I think we should keep C repros in the datastore rather than re-create them.
For filtering purposes, we could also annotate exported reproducers with some metadata (subsystem, expected running time, bug type, etc.). There will be lots of reproducers (tens of thousands), so users may want to run subsets of tests (only the faster ones, or only those for more critical bug types). The runner program could accept these filters and run the corresponding subsets.
> Do you consider doing just raw export, or something that does regression testing out-of-the-box?
I want users to get a collection of C reproducers like https://github.com/dvyukov/syzkaller-repros.
> What would be a game changer is fully automated periodic export.
What do you mean? I want every syz-reprolist call to create the latest snapshot.
> What would be a game changer is fully automated periodic export.
+1. Maybe even to some git repository exactly like it was done manually before.
> I think we should keep C repros in the datastore rather than re-create them.
But for older ones we'd still have to invoke older syz-prog2c versions, right? Or, probably, just ignore the syz repro bugs in this export? There are not too many of them.
> to some git repository exactly like it was done manually before
Pro:
Contra:
> What do you mean? I want every syz-reprolist call to create the latest snapshot.
Is it OK to export tens of thousands of reproducers each time? I was thinking of checking them into a git repo.
> But for older ones we'd still have to invoke older syz-prog2c versions, right? Or, probably, just ignore the syz repro bugs in this export? There are not too many of them.
Yes, either ignore them, or upload once whatever we can easily recover. syz-reprolist may run for days, but that's fine if it's done only once.
> What about private namespaces? More git repos?
I would export into a single repo all reproducers that were obtained on kernels with public source code.
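Below is a minimal sketch of selecting what goes into such a single export repo. It assumes the dashboard can list bugs together with their stored C reproducer and a flag saying whether the kernel source is public. `Bug`, `exportFiles`, the `internal` namespace, and the `<namespace>/<id>.c` layout are hypothetical. Committing and pushing to git is left out.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// Bug is a hypothetical, heavily simplified stand-in for a dashboard bug
// record; the real datastore entities look different.
type Bug struct {
	ID           string
	Namespace    string
	PublicKernel bool   // true if the kernel source code is public
	ReproC       []byte // stored C reproducer, empty if there is none
}

// exportFiles returns the files to commit to the export repo, keyed by
// repo-relative path (layout chosen only for illustration).
func exportFiles(bugs []Bug) map[string][]byte {
	files := make(map[string][]byte)
	for _, b := range bugs {
		// Skip kernels without public sources and bugs without a stored
		// C repro (e.g. those that only have a syz repro).
		if !b.PublicKernel || len(b.ReproC) == 0 {
			continue
		}
		files[path.Join(b.Namespace, b.ID+".c")] = b.ReproC
	}
	return files
}

func main() {
	src := []byte("int main(void) { return 0; }\n")
	bugs := []Bug{
		{ID: "abc123", Namespace: "upstream", PublicKernel: true, ReproC: src},
		{ID: "def456", Namespace: "upstream", PublicKernel: true}, // syz repro only
		{ID: "0a1b2c", Namespace: "internal", PublicKernel: false, ReproC: src},
	}
	files := exportFiles(bugs)
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths) // map iteration order is random; sort for stable output
	fmt.Println(paths)  // prints [upstream/abc123.c]
}
```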
> How to track usage?
Don't track it. I'm not sure the raw number of API invocations is very important. Users may cache the result on their side, in which case the number will be low. Or they may pull it every minute, but what would be the impact of that?
> The filter based selection looks more complex.
I would concentrate on end-user use cases. This looks like a minor implementation detail. Saving a few dozen lines of code at the cost of user experience and adoption does not look like a good tradeoff.
> What do you mean? I want every syz-reprolist call to create the latest snapshot.
> Is it OK to export tens of thousands of reproducers each time? I was thinking of checking them into a git repo.

Tens of thousands is doable if the benefits justify it.
6k_repros.tar.gz from https://github.com/dvyukov/syzkaller-repros is 28 megabytes, but that repo is 2 years old. We added the filesystems... and want to scale fuzzing, so it could grow to hundreds of megabytes in a few years. Agreed, git looks better from this perspective. Combined with repro annotations, it covers every scenario I can think of.
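A back-of-the-envelope check of the growth estimate. It assumes that "6k" means about 6,000 reproducers and that the compressed size per reproducer stays roughly the same. The figure of 50,000 is only an illustrative stand-in for "tens of thousands":

$$
\frac{28\ \text{MB}}{6000} \approx 4.7\ \text{KB per reproducer}, \qquad 50\,000 \times \frac{28\ \text{MB}}{6000} \approx 233\ \text{MB}
$$

This is consistent with the "hundreds of megabytes" estimate.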
@gkennedy12 also periodically asks for updates (which unfortunately slipped through the cracks).
Thanks for the input. Let's try once more! For every public namespace, we want to mirror ReproC files from the datastore to some public git repository.
Something like this:
What's the use case for separating them by namespace? We can also export from non-public but open-source kernels (that's what I used to do).
> and monitor these reproducers
What is it about?
> > and monitor these reproducers
> What is it about?
Detect that they triggered a bug. Lots of kernel test suites just run tests and then ignore the actual bugs they provoked in the kernel, so the tests look like they pass.
https://github.com/syzbot-noreply is now registered to perform the bot operations.
> https://github.com/syzbot-noreply is now registered to perform the bot operations.
It was me.
> Detect that they triggered a bug. Lots of kernel test suites just run tests and then ignore the actual bugs they provoked in the kernel, so the tests look like they pass.
We already have a lot of the required logic in syzkaller. It could be a new syz-manager/execprog mode. But on the other hand, it may complicate things for users. I'm not sure what the right balance is.
Motivation: