Closed scalvert closed 3 years ago
Have we tried moving the call to ajv.compile
"higher up"? Also would leveraging ajv.getSchema('#/definitions/___')
simplify things?
I'm a bit hesitant about the API "spam". Some of these definitions (and thus APIs) are getting into esoteric parts of the spec.
Have we tried moving the call to
ajv.compile
"higher up"? Also would leveragingajv.getSchema('#/definitions/___')
simplify things?
Could you comment on the specific file you're referring to? Also, can you elaborate on why you think this would be beneficial (potential cacheability of the compile operation, for example). Thanks!
Also would leveraging
ajv.getSchema('#/definitions/___')
simplify things?
Could you explain why you think it'd simplify things? Most of that API surface area is private, so if there's a strong reason to make the private API calls simpler, I'm all for it, but for this specific callout it seems arbitrary (I may be missing something!).
I'm a bit hesitant about the API "spam". Some of these definitions (and thus APIs) are getting into esoteric parts of the spec.
I understand the sentiment, and thought about this myself. Some of the reasons I went this way are
Happy to discuss further!
Currently it looks to be compiling per toMatchSarif...()
call. Between moving ajv.compile
and using ajv.getSchema
, I think you may be able to reduce that number. At the extreme, I think just once for the module. Also, ajv.getSchema
may be able to remove the need for getFragmentSchema
. Some of this may require shared/global ajv
instance.
It does, because each compilation is per unique schema. I'm sure we could reduce the number, but it's also a one-time cost as the index is imported/required. Additionally, this is test infrastructure, so it may end up with us over optimizing code targeted for test infra that, while important to ensure the perf of, may not result in any meaningful gain.
I'd like to try to balance the cleanliness and readability of the code with the need to optimize. Does that make sense?
Here's an approximation/sketch. We can skip getSchema
and getValidatorAndSchema
and leverage Ajv
for the caching and schema fragments. Also, the current version seems to be unintentionally locked on a SARIF version when using definitionName
so treating that as a separate issue.
import Ajv from 'ajv';
import fetch from 'sync-fetch';
const sarifV2SchemaUrl = 'https://raw.githubusercontent.com/schemastore/schemastore/master/src/schemas/json/sarif-2.1.0-rtm.5.json';
const ajv = new Ajv({
schemaId: 'auto',
validateSchema: false,
});
ajv.addSchema(fetch(sarifV2SchemaUrl).json());
function buildSarifMatcher(definitionName?: string) {
const keyRef = definitionName
? `${sarifV2SchemaUrl}#/definitions/${definitionName}`
: sarifV2SchemaUrl;
const validate = ajv.getSchema(keyRef);
return received => {
// Transform errors to messages here.
return {
actual: received,
message: () => 'foobar',
name: 'toMatchSarif___',
pass: validate(received),
};
};
}
This seems fine to move to creating a single instance in module scope, so long as that doesn't cause unforeseen issues WRT ajv
's state. Specifically, if there's a conflict between schema validations as the result of the instance's internal state, we may need to keep it as is.
Also, the current version seems to be unintentionally locked on a SARIF version when using definitionName so treating that as a separate issue.
I'm not sure what you mean here - could you elaborate?
@jeffersonking and I synced up offline. It turns out that referencing specific definitions within the parent schema is natively supported in ajv
, so we're moving to that vs creating a specific sub-schema and loading that.
We're also exploring the pros/cons of using autogenerated APIs for all definitions vs. a single API that can be passed a definition string to reference the schema's definition itself.
Example:
Auto-generated API matcher:
expect(result).toBeValidSarifResult();
vs.
Generic definition matcher:
expect(result).toBeValidSarifFor('result');
The former allows the consumer to pick from an available list of definition matchers vs. having to know the string that corresponds to the definition. It does add a larger API surface area, which potentially can seem overwhelming to consumers.
That said, if the goal is to make this very easy for consumers to onboard to using/testing parts of the SARIF spec, using the former won't require them to look up the definitions they're needing to test. The counter argument could be that if consumers are constructing SARIF, and specific parts of the schema based on a definition, they'd likely know that definition name already, and thus could simply pass it in as a param.
I'm open to other's opinions on this, as I'm not tied to any one particular direction.
cc/ @jeffersonking @rwjblue
Great feedback! I was just chatting with @scalvert about this. I like both of these APIs (I think they would both satisfy our requirements for testability). If folks don't have a strong preference one way or the other, lets go with the latter suggestion .toMatchSarifFor('result')
. This path still allows us to easily add the other matcher functions if we want, but limits the initial API surface.
Yep, I'm inclined to agree. The way the first one evolved was through my initial work in attempting to improve dynamic typing for the buildMatcher
API. It evolved to creating APIs for each definition, but @jeffersonking correctly pointed out that a smaller API surface area may be slightly better/simpler. Moving to using this API is likely an incremental improvement to exposing the buildMatcher
function directly, so I think we get the best of both worlds.
I'll make those changes now.
Glad we're having this discussion. I would not object either way. One slight tip towards the "generic" approach is that the docs (assuming they're auto-generated) may be easier to read or at least fit on one page.
OK, this PR is ready to go.
This PR pivots the intent of the library slightly from providing the
buildMatchers
function publicly and expecting consumers to generate their own definition matchers to using a single matcher:toBeValidSarifFor
. This accomplishes a few things:buildMatcher
functionThis PR adds:
npm run generate
command, which generates the followingDefinition
union type, which is the union of all available definitions in the SARIF schematoBeValidSarifFor(definition)