Open scriptsrc opened 3 years ago
Hi there @scriptsrc ,
A couple of months ago I had the same problem and tried contacting the email address provided by Microsoft's SARIF team, with no luck. I managed to work it out like this: https://github.com/smartbugs/smartbugs/blob/master/src/output_parser/SarifHolder.py (see the SarifHolder.serializeSarif() function). It is certainly not the most direct method, but it does the job.
Hope it helps you. Feel free to contact me with any questions you may have.
Really, any documentation at all for usage would be nice.
Dropping a +1 here. If we want SARIF to have more stickiness, we definitely need better tutorials and instructions on how to use it. I maintain a few open-source security projects of my own, and my company has more. Even if I started a project tomorrow to add SARIF output options, it would be very difficult.
@dindonero All you really need to serialize is the following:
import sarif_om
from jschema_to_python.to_json import to_json
sarif_log = sarif_om.SarifLog(
    schema_uri='https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json',
    ...
)
to_json(sarif_log)
jschema_to_python.to_json will automatically use the schema_property_name metadata of the fields to map to the schema names.
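To make the idea of metadata-driven name mapping concrete, here is a minimal stdlib-only sketch. It is not the jschema_to_python implementation: it uses dataclasses instead of the attrs classes in sarif_om, the Message and Result classes are simplified hypothetical stand-ins, and dropping None values is a choice made for this sketch.

```python
import json
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Optional


# Hypothetical, simplified stand-ins for generated classes. Each field carries
# the schema name it should be written under in its metadata.
@dataclass
class Message:
    text: str = field(metadata={"schema_property_name": "text"})


@dataclass
class Result:
    message: Message = field(metadata={"schema_property_name": "message"})
    rule_id: Optional[str] = field(
        default=None, metadata={"schema_property_name": "ruleId"}
    )


def to_schema_dict(obj):
    """Recursively convert dataclass instances to dicts keyed by schema names."""
    if is_dataclass(obj) and not isinstance(obj, type):
        out = {}
        for f in fields(obj):
            value = getattr(obj, f.name)
            if value is None:
                continue  # sketch choice: omit unset optional properties
            name = f.metadata.get("schema_property_name", f.name)
            out[name] = to_schema_dict(value)
        return out
    if isinstance(obj, (list, tuple)):
        return [to_schema_dict(v) for v in obj]
    if isinstance(obj, dict):
        return {k: to_schema_dict(v) for k, v in obj.items()}
    return obj


result = Result(message=Message(text="Use of eval."), rule_id="PY0001")
as_json = json.dumps(to_schema_dict(result), indent=2)
```

The Python attribute rule_id ends up under the key ruleId in the output, which is the same kind of renaming the metadata enables for the real classes.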
Thanks to @melsabagh-kw for the serialization example. Is there also a simple deserialization example showing how to read SARIF files and create the corresponding OM objects?
@melsabagh thanks for the input. I am trying to generate JSON and am getting the following error. Would you know what the issue might be?
Traceback (most recent call last):
  File "/Users/user1/github/my-project/./my-project.py", line 159, in <module>
    main()
  File "/Users/user1/github/my-project/./my-project.py", line 151, in main
    generate_report(data=data)
  File "/Users/user1/github/my-project/./my-project.py", line 44, in generate_report
    to_json(report)
  File "/Users/user1/.local/share/virtualenvs/my-project-2_gkca5i/lib/python3.11/site-packages/jschema_to_python/to_json.py", line 20, in to_json
    return json.dumps(obj, indent=2, default=_generated_class_serializer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user1/.pyenv/versions/3.11.4/lib/python3.11/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
          ^^^^^^^^^^^
  File "/Users/user1/.pyenv/versions/3.11.4/lib/python3.11/json/encoder.py", line 202, in encode
    chunks = list(chunks)
             ^^^^^^^^^^^^
  File "/Users/user1/.pyenv/versions/3.11.4/lib/python3.11/json/encoder.py", line 439, in _iterencode
    o = _default(o)
        ^^^^^^^^^^^
  File "/Users/user1/.local/share/virtualenvs/my-project-2_gkca5i/lib/python3.11/site-packages/jschema_to_python/to_json.py", line 26, in _generated_class_serializer
    dict = copy.deepcopy(dict)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/user1/.pyenv/versions/3.11.4/lib/python3.11/copy.py", line 161, in deepcopy
    rv = reductor(4)
         ^^^^^^^^^^^
TypeError: cannot pickle 'mappingproxy' object
@ths11 materializing a SarifLog from JSON is a bit more complicated as you would need to map back from the SARIF field names to the Python classes and fields used by sarif-om. I came up with the following a while ago which worked OK for me:
import json

import attrs
import sarif_om


def get_sarif_class(schema_ref):
    # Derive the sarif_om class from the last segment of the $ref path,
    # upper-casing only its first character.
    class_name = schema_ref.split('/')[-1]
    class_name = class_name[0].capitalize() + class_name[1:]
    return getattr(sarif_om, class_name)


def get_field_name(schema_property_name, cls):
    # Find the attribute whose schema_property_name metadata matches;
    # fall back to the schema name itself if none does.
    for field in attrs.fields(cls):
        if field.metadata.get('schema_property_name') == schema_property_name:
            return field.name
    return schema_property_name


def get_schema_properties(schema, schema_ref):
    # Walk a '#/a/b' style reference from the schema root.
    cursor = schema
    for part in schema_ref.split('/'):
        if part == '#':
            cursor = schema
        else:
            cursor = cursor[part]
    return cursor['properties']


def materialize(data, cls, schema, schema_ref):
    fields = {}
    extras = {}
    props = get_schema_properties(schema, schema_ref)
    for key, value in data.items():
        field_name = get_field_name(key, cls)
        if key not in props:
            # Not declared in the schema: attach to the object afterwards.
            extras[field_name] = value
            continue
        if '$ref' in props[key]:
            # Nested object: recurse into the referenced definition.
            schema_ref = props[key]['$ref']
            field_cls = get_sarif_class(schema_ref)
            fields[field_name] = materialize(value, field_cls, schema, schema_ref)
        elif 'items' in props[key]:
            # Array: recurse per element when the items are references.
            schema_ref = props[key]['items'].get('$ref')
            if schema_ref:
                field_cls = get_sarif_class(schema_ref)
                fields[field_name] = [materialize(v, field_cls, schema, schema_ref) for v in value]
            else:
                fields[field_name] = value
        else:
            fields[field_name] = value
    obj = cls(**fields)
    obj.__dict__.update(extras)
    return obj


with open('test.sarif', 'r') as file:
    data = json.load(file)

with open('sarif-schema-2.1.0.json', 'r') as file:
    schema = json.load(file)

sarif_log = materialize(data, sarif_om.SarifLog, schema, '#')
...
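To see what the two schema lookups in this approach produce, here is a small stdlib-only sketch run against a toy schema. TOY_SCHEMA is a hypothetical, heavily trimmed stand-in for sarif-schema-2.1.0.json, and the helper names resolve_properties and class_name_for are made up for this illustration; nothing here touches sarif_om.

```python
# Hypothetical, heavily trimmed stand-in for the real SARIF schema file.
TOY_SCHEMA = {
    "properties": {
        "version": {"type": "string"},
        "runs": {"type": "array", "items": {"$ref": "#/definitions/run"}},
    },
    "definitions": {
        "run": {"properties": {"tool": {"$ref": "#/definitions/tool"}}},
        "tool": {
            "properties": {"driver": {"$ref": "#/definitions/toolComponent"}}
        },
    },
}


def resolve_properties(schema, schema_ref):
    """Follow a '#/a/b' style reference and return that node's properties."""
    cursor = schema
    for part in schema_ref.split("/"):
        cursor = schema if part == "#" else cursor[part]
    return cursor["properties"]


def class_name_for(schema_ref):
    """Turn the last path segment into a class-style name (first letter upper)."""
    name = schema_ref.split("/")[-1]
    return name[0].upper() + name[1:]


root_props = resolve_properties(TOY_SCHEMA, "#")
runs_ref = root_props["runs"]["items"]["$ref"]
run_props = resolve_properties(TOY_SCHEMA, runs_ref)
run_class_name = class_name_for(runs_ref)
```

Starting from the root reference '#', the array property runs points at '#/definitions/run', which in turn yields the class-style name Run and the nested tool property to recurse into.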
@maratsal my guess is you have a custom property in your SARIF that uses a dataclass.field with a dict factory.
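Another possibility, not confirmed for the report above: the traceback fails inside copy.deepcopy, and a mappingproxy is what a class (as opposed to an instance) exposes as its __dict__. The stdlib-only sketch below reproduces the same kind of TypeError with a hypothetical Report class; whether this is what happened in that project is an assumption.

```python
import copy


class Report:
    """Hypothetical stand-in for a generated SARIF class."""

    def __init__(self):
        self.runs = []


instance_dict = Report().__dict__  # a plain dict
class_dict = Report.__dict__       # a read-only mappingproxy


def deepcopy_error(value):
    """Return the TypeError message raised by deepcopy, or None if it works."""
    try:
        copy.deepcopy(value)
    except TypeError as exc:
        return str(exc)
    return None


instance_error = deepcopy_error(instance_dict)
class_error = deepcopy_error(class_dict)
```

If this is the cause, it may be worth checking whether a class object rather than an instance ended up somewhere in the object passed to to_json, for example a class name written without the call parentheses.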
I think https://github.com/microsoft/sarif-python-om/pull/6 will really ease the use of this library, since it provides much better type hints and a better way to serialize and deserialize SARIF reports.
Hope that it can be merged soon!
Hey y'all.
The readme only has this for usage:
I'm reading through the Microsoft SARIF tutorials here: https://github.com/microsoft/sarif-tutorials/blob/main/docs/3-Beyond-basics.md
and trying to figure out how I could use this class to convert the issues found in bad-eval-with-code-flow.py into the output file bad-eval-with-code-flow.sarif.
It would be much easier to jump into this project if there was at least one example.
Anyways, thanks for supporting python!
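As a rough starting point for the kind of example asked for here, the sketch below builds a minimal SARIF 2.1.0 log as plain dicts with only the standard library, which is the JSON shape an object model would ultimately have to serialize to. It does not use sarif_om. The tool name, rule ID, message text, and line number are hypothetical placeholders and are not taken from the tutorial file.

```python
import json

SCHEMA_URI = (
    "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/"
    "master/Schemata/sarif-schema-2.1.0.json"
)


def make_result(rule_id, text, uri, line):
    """Build one SARIF result pointing at a single line of a file."""
    return {
        "ruleId": rule_id,
        "level": "error",
        "message": {"text": text},
        "locations": [
            {
                "physicalLocation": {
                    "artifactLocation": {"uri": uri},
                    "region": {"startLine": line},
                }
            }
        ],
    }


def make_log(tool_name, results):
    """Wrap results in a single-run SARIF log."""
    return {
        "$schema": SCHEMA_URI,
        "version": "2.1.0",
        "runs": [
            {"tool": {"driver": {"name": tool_name}}, "results": results}
        ],
    }


# Placeholder values for illustration only.
log = make_log(
    "ExampleScanner",
    [make_result("PY0001", "Use of eval.", "bad-eval-with-code-flow.py", 4)],
)
sarif_text = json.dumps(log, indent=2)
```

Writing sarif_text to a file such as bad-eval-with-code-flow.sarif would give a log that viewers can open; code flows and rule metadata from the tutorial would still need to be added on top.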