microsoft / sarif-python-om

Python classes for the SARIF object model
MIT License

Request: Slightly better usage example #5

Open scriptsrc opened 3 years ago

scriptsrc commented 3 years ago

Hey y'all.

The readme only has this for usage:

pip install sarif-om
import sarif_om

I'm reading through the Microsoft SARIF tutorials here: https://github.com/microsoft/sarif-tutorials/blob/main/docs/3-Beyond-basics.md

I'm trying to figure out how I could use this class to convert the issues found in bad-eval-with-code-flow.py and write them out as bad-eval-with-code-flow.sarif.

It would be much easier to jump into this project if there was at least one example.

Anyways, thanks for supporting python!

dindonero commented 3 years ago

Hi there @scriptsrc ,

A couple of months ago I had the same problem, and I tried contacting the Microsoft SARIF team at the email address they provide, with no luck. I managed to work it out like this: https://github.com/smartbugs/smartbugs/blob/master/src/output_parser/SarifHolder.py, in the SarifHolder.serializeSarif() function. It is certainly not the most direct method, but it does the job.

Hope it helps you. Feel free to contact me with any questions you may have.

kjcolley7 commented 2 years ago

Really, any documentation at all for usage would be nice.

jonrau-lightspin commented 2 years ago

Dropping a +1 here. If we want SARIF to have more stickiness, then we definitely need better tutorials and instructions on how to use it. I maintain a few open-source security projects of my own, and my company has more. Even if I started a project tomorrow to add SARIF output options, it would be very difficult.

melsabagh commented 2 years ago

@dindonero All you really need to serialize is the following:

import sarif_om
from jschema_to_python.to_json import to_json

sarif_log = sarif_om.SarifLog(
    schema_uri='https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json', 
    ...
)

# to_json returns the serialized JSON as a string
print(to_json(sarif_log))

jschema_to_python.to_json will automatically use the schema_property_name metadata of the fields to map to the schema names.
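To illustrate the idea of metadata-driven renaming without depending on either package, here is a minimal standard-library sketch. The `Region` dataclass and the `to_schema_dict` helper are hypothetical stand-ins written for this example; they are not the real sarif_om classes or the jschema_to_python implementation, and dropping `None` values is an assumption made here for brevity.

```python
import dataclasses
import json
from typing import Optional


@dataclasses.dataclass
class Region:
    # Hypothetical stand-in for a generated class: each field records the
    # name it should carry in the JSON output.
    start_line: int = dataclasses.field(
        metadata={"schema_property_name": "startLine"})
    end_line: Optional[int] = dataclasses.field(
        default=None, metadata={"schema_property_name": "endLine"})


def to_schema_dict(obj):
    """Rename fields to their schema names, skipping unset (None) values."""
    result = {}
    for field in dataclasses.fields(obj):
        value = getattr(obj, field.name)
        if value is None:
            continue  # assumption for this sketch: omit unset fields
        name = field.metadata.get("schema_property_name", field.name)
        result[name] = value
    return result


print(json.dumps(to_schema_dict(Region(start_line=3))))
```

The Python attribute names stay snake_case while the emitted keys follow the schema, which is the same division of labour the comment above describes.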

ths11 commented 1 year ago

Thanks to @melsabagh-kw for the serialization example. Is there also a simple deserialization example showing how to read SARIF files and create the related OM?

maratsal commented 1 year ago

@melsabagh thanks for the input. I am trying to generate JSON and am getting the following error. Would you know what the issue might be?

Traceback (most recent call last):
  File "/Users/user1/github/my-project/./my-project.py", line 159, in <module>
    main()
  File "/Users/user1/github/my-project/./my-project.py", line 151, in main
    generate_report(data=data)
  File "/Users/user1/github/my-project/./my-project.py", line 44, in generate_report
    to_json(report)
  File "/Users/user1/.local/share/virtualenvs/my-project-2_gkca5i/lib/python3.11/site-packages/jschema_to_python/to_json.py", line 20, in to_json
    return json.dumps(obj, indent=2, default=_generated_class_serializer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user1/.pyenv/versions/3.11.4/lib/python3.11/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
          ^^^^^^^^^^^
  File "/Users/user1/.pyenv/versions/3.11.4/lib/python3.11/json/encoder.py", line 202, in encode
    chunks = list(chunks)
             ^^^^^^^^^^^^
  File "/Users/user1/.pyenv/versions/3.11.4/lib/python3.11/json/encoder.py", line 439, in _iterencode
    o = _default(o)
        ^^^^^^^^^^^
  File "/Users/user1/.local/share/virtualenvs/my-project-2_gkca5i/lib/python3.11/site-packages/jschema_to_python/to_json.py", line 26, in _generated_class_serializer
    dict = copy.deepcopy(dict)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/user1/.pyenv/versions/3.11.4/lib/python3.11/copy.py", line 161, in deepcopy
    rv = reductor(4)
         ^^^^^^^^^^^
TypeError: cannot pickle 'mappingproxy' object

melsabagh commented 1 year ago

> Thanks to @melsabagh-kw for the serialization example, is there also some simple de-serialization example showing how to read sarif files and create the related OM?

@ths11 materializing a SarifLog from JSON is a bit more complicated as you would need to map back from the SARIF field names to the Python classes and fields used by sarif-om. I came up with the following a while ago which worked OK for me:

    import json

    import attrs
    import sarif_om

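    def get_sarif_class(schema_ref):
        # '#/definitions/toolComponent' -> sarif_om.ToolComponent
        class_name = schema_ref.split('/')[-1]
        class_name = class_name[0].capitalize() + class_name[1:]
        return getattr(sarif_om, class_name)

    def get_field_name(schema_property_name, cls):
        # Map a SARIF property name back to the Python attribute name;
        # fall back to the property name itself if no field matches.
        for field in attrs.fields(cls):
            if field.metadata.get('schema_property_name') == schema_property_name:
                return field.name
        return schema_property_name

    def get_schema_properties(schema, schema_ref):
        # Walk a '#/a/b' style reference from the schema root and
        # return the 'properties' of the node it points at.
        cursor = schema
        for part in schema_ref.split('/'):
            if part == '#':
                cursor = schema
            else:
                cursor = cursor[part]
        return cursor['properties']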

    def materialize(data, cls, schema, schema_ref):
        fields = {}
        extras = {}
        props = get_schema_properties(schema, schema_ref)

        for key, value in data.items():
            field_name = get_field_name(key, cls)

            if key not in props:
                extras[field_name] = value
                continue

            if '$ref' in props[key]:
                # Single nested object: recurse with the referenced class.
                field_ref = props[key]['$ref']
                field_cls = get_sarif_class(field_ref)
                fields[field_name] = materialize(value, field_cls, schema, field_ref)

            elif 'items' in props[key]:
                # Array: recurse per element when the items reference a class.
                item_ref = props[key]['items'].get('$ref')
                if item_ref:
                    field_cls = get_sarif_class(item_ref)
                    fields[field_name] = [materialize(v, field_cls, schema, item_ref) for v in value]
                else:
                    fields[field_name] = value
            else:
                fields[field_name] = value

        obj = cls(**fields)
        obj.__dict__.update(extras)
        return obj

    with open('test.sarif', 'r') as file:
        data = json.load(file)

    with open('sarif-schema-2.1.0.json', 'r') as file:
        schema = json.load(file)

    sarif_log = materialize(data, sarif_om.SarifLog, schema, '#')
    ...
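
To see what the reference handling does in isolation, here is a small standard-library sketch run against a toy schema. `toy_schema`, `resolve_properties`, and `class_name_for` are names made up for this illustration; the real sarif-schema-2.1.0.json is far larger, and the fragment below only mimics its `properties` / `definitions` / `$ref` layout.

```python
# Toy schema fragment (an assumption for illustration, not the real schema).
toy_schema = {
    "properties": {
        "runs": {"type": "array", "items": {"$ref": "#/definitions/run"}},
    },
    "definitions": {
        "run": {"properties": {"tool": {"$ref": "#/definitions/tool"}}},
        "tool": {"properties": {"driver": {"$ref": "#/definitions/toolComponent"}}},
    },
}


def resolve_properties(schema, schema_ref):
    """Follow a '#/a/b' style reference and return that node's properties."""
    cursor = schema
    for part in schema_ref.split("/"):
        if part != "#":  # '#' stands for the schema root
            cursor = cursor[part]
    return cursor["properties"]


def class_name_for(schema_ref):
    """Derive a class name from the last segment of a reference."""
    name = schema_ref.split("/")[-1]
    return name[0].upper() + name[1:]


print(sorted(resolve_properties(toy_schema, "#")))
print(sorted(resolve_properties(toy_schema, "#/definitions/run")))
print(class_name_for("#/definitions/toolComponent"))
```

The root reference `'#'` yields the top-level properties, and each `$ref` found there names both the next schema node to inspect and, after capitalizing its last segment, the class to instantiate.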

melsabagh commented 1 year ago

@maratsal my guess is you have a custom property in your SARIF that uses a dataclass.field with a dict factory.
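
For reference, the TypeError in the traceback can be reproduced with the standard library alone: `copy.deepcopy` fails whenever it reaches a `mappingproxy`, the read-only mapping type Python uses for a class's `__dict__`, among other places. The `Report` class below is a hypothetical placeholder, and this sketch only shows one way such an object can end up in front of `deepcopy`; it is not a diagnosis of the report above.

```python
import copy


class Report:
    """Hypothetical placeholder class for this illustration."""
    title = "demo"


instance = Report()
instance.title = "instance-level title"

# An instance's __dict__ is a plain dict, so deepcopy succeeds.
copied = copy.deepcopy(instance.__dict__)

# A class's __dict__ is a read-only mappingproxy, which deepcopy rejects.
try:
    copy.deepcopy(Report.__dict__)
    error = None
except TypeError as exc:
    error = exc

print(type(Report.__dict__).__name__, "->", error)
```

If the object handed to `to_json`, or any value reachable from its attributes, exposes a `mappingproxy` this way, the same exception would be expected.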

FunJim commented 6 months ago

I think https://github.com/microsoft/sarif-python-om/pull/6 will really ease the use of this library, since it provides much better type hints and a better way to serialize and deserialize SARIF reports.

Hope that it can be merged soon!