Changes to Capirca to allow for additional policy parsers.

ankenyr commented 8 years ago

Capirca has historically only ever had the ply parser, but there is active work on a YAML parser. I would like to take some time to solicit opinions on how we should approach including other parsers, how they interact with policy.py, and how they fit into the rest of Capirca. The following are a couple of ideas people have suggested, but I invite others to propose their own and discuss their opinions on each.

1) policy.py is passed the data and in turn hands it to the correct parser. The parser is required to return some data structure, such as a dict, which policy.py then uses to build the policy object.

2) Have additional parsers convert their input to a .pol-format string, which is then fed into policy.py. This could be done with a templating system or something else.
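Option 1 could look roughly like the sketch below: parsers register themselves per file extension, each returns a plain dict, and one function builds the policy object from that dict. `PARSERS`, `register_parser`, `build_policy`, `parse_policy_file` and the `Policy` class are hypothetical stand-ins, not existing policy.py code, and JSON replaces YAML only so the example needs nothing beyond the standard library.

```python
import json
import os
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical registry mapping a file extension to a parser that returns a dict.
PARSERS: Dict[str, Callable[[str], dict]] = {}


@dataclass
class Policy:
    """Stand-in for the policy object that policy.py builds."""
    targets: List[str]
    terms: List[dict]


def register_parser(extension: str) -> Callable:
    """Decorator that registers a parser for one file extension."""
    def wrap(func: Callable[[str], dict]) -> Callable[[str], dict]:
        PARSERS[extension] = func
        return func
    return wrap


@register_parser(".json")
def parse_json(text: str) -> dict:
    # JSON stands in for YAML here; a YAML parser would register ".yaml" instead.
    return json.loads(text)


def build_policy(data: dict) -> Policy:
    """Single place that turns the parser-neutral dict into a policy object."""
    if "header" not in data or "terms" not in data:
        raise ValueError("policy data needs 'header' and 'terms'")
    targets = [t["platform"] for t in data["header"].get("targets", [])]
    return Policy(targets=targets, terms=list(data["terms"]))


def parse_policy_file(filename: str, text: str) -> Policy:
    """Pick the parser by extension, then build the policy from its dict."""
    extension = os.path.splitext(filename)[1]
    parser = PARSERS.get(extension)
    if parser is None:
        raise ValueError(f"no parser registered for {extension!r}")
    return build_policy(parser(text))
```

The appeal is that a new format only adds one registered function; the cost is that every parser has to agree on the dict layout that `build_policy` expects.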

My thoughts are that option 2 is the best choice. It has been pointed out that with multiple parsers, equivalent policies could be constructed differently, resulting in files with disparate syntax. Option 2 would also require less modification of policy.py.

The only problem with this choice is that it further ingrains lex/yacc into things. If I were to rewrite Capirca today, I would most likely not choose lex/yacc but would opt for some simpler format. I don't know if others have ever tried adding new keywords to the parser, but I find it to be a hassle. This may be a non-issue, though, as I have not seen any demand for new additions to terms in a while.

Anyway, please feel free to share your opinion if you feel strongly one way or the other.

usernameisnottaken commented 8 years ago

I originally proposed the templating mechanism (aka yaml2ply), and after some local discussion I think another option would be to drop ply and instead refocus efforts on something like YAML/JSON. The problem with this is that an intermediate ply2yaml/ply2json tool would need to be built, and it would have to exist for some sensible migration period before ply/lex are completely cleaned up.

As a consideration, the custom grammar is reasonably straightforward for non-engineering types, whereas YAML/JSON are much better for machine generation and consumption but harder for non-engineering types to edit without more tooling or training.

So while things could have been done differently from the get-go, the ply/grammar format works, and changing it would probably cause more disruption than the benefits it offers. In the end, I think option #2 plus a tool for yaml=>ply makes far more sense.

finfinack commented 8 years ago

Personally, I think it would make sense to define some "standard" data structure that holds the policy. Whatever parser is used would have to produce something in that format, so the framework can validate it. IMO, something that can be exported directly again would make sense (option 2 being one flavor of that, I think).
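A minimal sketch of that idea, assuming the header/terms layout that appears later in this thread: every parser emits the same plain structure, and one framework-side check reports problems before any policy object is built. The function name `validate_policy_dict`, the required keys and the set of allowed actions are assumptions for illustration, not an existing Capirca API.

```python
from typing import Any, List

# Assumed set of allowed actions for this sketch; not taken from Capirca itself.
VALID_ACTIONS = {"accept", "deny", "reject", "next", "reject-with-tcp-rst"}


def validate_policy_dict(policy: Any) -> List[str]:
    """Return human-readable problems with the structure; an empty list means valid."""
    if not isinstance(policy, dict):
        return ["policy must be a dict"]
    errors: List[str] = []
    for key in ("header", "terms"):
        if key not in policy:
            errors.append(f"missing top-level key {key!r}")
    header = policy.get("header")
    targets = header.get("targets", []) if isinstance(header, dict) else []
    if not targets:
        errors.append("header must declare at least one target")
    for i, target in enumerate(targets):
        if "platform" not in target:
            errors.append(f"target {i} has no 'platform'")
    for i, term in enumerate(policy.get("terms", [])):
        if "include" in term:
            continue  # include entries pull in other files and carry no action
        action = term.get("action")
        if action not in VALID_ACTIONS:
            errors.append(f"term {i} has invalid action {action!r}")
    return errors
```

Returning a list of errors rather than raising on the first one lets a converter report every problem in a generated file at once.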

Out of curiosity: have you considered using Protocol Buffers (https://developers.google.com/protocol-buffers/)? Off the top of my head:

- if Capirca or parts of it ever become an "ACL as a service", passing around data as serialized protos is exactly what protos are built for
- protos are defined in a .proto file and data can be verified against that definition; libraries are readily available
- for efficiency, they can be stored or passed around in serialized form, but they can also be stored in a plain-text format; again, a library for that already exists (https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.text_format-module)

Thoughts? Happy to help define the proto messages if you decide to go that route.

usernameisnottaken commented 8 years ago

Going the protobuf (or, similarly, Thrift) route is effectively option 1 with the intermediate format constrained to those systems.

Option 2 basically says that the lex/yacc format is the only format aclgen can ingest, but we can write a tool that takes YAML (or whatever) and uses string templating to output a lex/yacc-format file. So with option 2, if you want to use YAML, you need to run "yaml2ply.py example.pol.yaml > example.pol && aclgen -p example.pol" instead of just "aclgen -p example.pol.yaml".

I like option 1 quite a lot but I think in practice it will create more complexity and consume more time than option 2.

finfinack commented 8 years ago

Sure, at the moment .pol is the format aclgen can ingest, and changing that is probably too much work, though not impossible. I guess it depends on how important it is to have a clean way to support multiple formats. No strong preference either way here.

sneakywombat commented 8 years ago

I also am strongly in favor of option 2.

ankenyr commented 8 years ago

Worked over the weekend on something really simple. It uses Jinja templates, and when fed a structure like the one below it spits out a properly formatted .pol file.

Next step is writing an example yamltodict.py that uses it.

test_dict = [
    {
        'header': {
            'comment': ['this is a sample edge input filter that generates',
                        'multiple output formats'],
            'targets': [
                {'platform': 'juniper', 'options': 'edge-inbound inet'},
                {'platform': 'cisco', 'options': 'edge-inbound mixed'},
            ],
        },
        'terms': [
            {'include': 'policies/includes/include.inc'},
            {
                'comment': 'Foobar.',
                'source_address': 'DNS_SERVERS',
                'source_exclude': 'NOT_DNS_SERVERS',
                'source_port': 'DNS_PORTS',
                'source_interface': 'configured-neighbors-only',
                'source_prefix': 'configured-neighbors-only',
                'destination_address': 'MAIL_SERVERS',
                'destination_exclude': 'NOT_MAIL_SERVERS',
                'destination_port': 'DNS_PORTS',
                'destination_interface': 'configured-neighbors-only',
                'address': 'OTHER_SERVERS',
                'address-exclude': 'SOME_OTHER_SERVER',
                'port': 'SOME_PORT',
                'protocol': 'tcp',
                'protocol_except': 'udp',
                'logging': 'true',
                'counter': 'foobar',
                'next-ip': 'NEXT-IP',
                'action': 'accept',
                'ether-type': 'arp',
                'forwarding-class': 'fwd-cls',
                'fragment-offset': '1-6',
                'hop-limit': '20',
                'icmp-type': 'echo-reply',
                'loss_priority': 'low',
                'option': 'tcp-established',
                'packet_length': '1-256',
                'platform': 'juniper',
                'platform_exclude': 'cisco',
                'policer': 'rate-limit',
                'precedence': 'first',
                'principals': 'some_principal',
                'qos': 'high',
                'routing-instance': 'foobar-router',
                'timeout': '44',
                'traffic-type': 'some-unicast',
                'vpn': 'VPN',
                'verbatim': 'some verbatim stuff',
                'owner': 'john@foobar.com',
                'expiration': '2016-06-19',
            },
        ],
    },
    {
        'header': {
            'comment': ['this is a sample output filter'],
            'targets': [
                {'platform': 'juniper', 'options': 'edge-outbound inet'},
                {'platform': 'cisco', 'options': 'edge-outbound mixed'},
            ],
        },
        'terms': [
            {'include': 'policies/include/another_include.inc'},
            {'destination_address': 'CORP_NETWORK',
             'destination_port': 'HIGH_PORTS',
             'action': 'accept'},
        ],
    },
]
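The Jinja template itself isn't included in this comment. As a rough stand-in, the sketch below renders the same list-of-filters shape with plain string formatting instead of Jinja. It assumes dict keys map to .pol keywords by swapping `_` for `-`, that `include` entries become `#include` lines, that several quoted strings may follow one `comment::`, and that unnamed terms get generated names such as `term-1`; `render_policy` and its helpers are hypothetical names.

```python
from typing import Dict, List

QUOTED_KEYS = {"comment"}  # assumption: only comments need double quotes


def render_header(header: Dict) -> List[str]:
    """Render one header block: comment strings plus one target line per target."""
    lines = ["header {"]
    comments = header.get("comment", [])
    if isinstance(comments, str):
        comments = [comments]
    if comments:
        # Assumes multiple quoted strings may follow a single comment:: keyword.
        lines.append("  comment:: " + " ".join(f'"{c}"' for c in comments))
    for target in header.get("targets", []):
        line = f"  target:: {target['platform']} {target.get('options', '')}"
        lines.append(line.rstrip())
    lines.append("}")
    return lines


def render_term(name: str, term: Dict) -> List[str]:
    """Render one term, mapping keys to keywords (source_address -> source-address)."""
    lines = [f"term {name} {{"]
    for key, value in term.items():
        keyword = key.replace("_", "-")
        if key in QUOTED_KEYS:
            value = f'"{value}"'
        lines.append(f"  {keyword}:: {value}")
    lines.append("}")
    return lines


def render_policy(filters: List[Dict]) -> str:
    """Render a list of {'header': ..., 'terms': [...]} dicts as .pol text."""
    out: List[str] = []
    term_number = 0
    for flt in filters:
        out.extend(render_header(flt["header"]))
        out.append("")
        for term in flt.get("terms", []):
            if "include" in term:
                out.append(f"#include '{term['include']}'")
            else:
                term_number += 1  # terms in the dict carry no name, so invent one
                out.extend(render_term(f"term-{term_number}", term))
            out.append("")
    return "\n".join(out)
```

Fed the `test_dict` above, this emits two header blocks, two `#include` lines and two generated terms; keys with their own syntax, such as `verbatim`, would need special handling that the sketch omits.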

jzohrab commented 8 years ago

Late arrival, just saw this discussion.

I worked on the YAML parser; the implementation is in https://github.com/google/capirca/pull/73, in lib/yamlpolicyparser.py. The full parser was only about 300 LOC, but it currently lacks some data checks.

For me, the main benefit of this work was not the parser itself but the cleaner separation between the data model and the parsing that loads the model. I split parsing from the model in https://github.com/google/capirca/pull/72. I'm a fan of domain-driven design, so I wanted to see a stronger Policy object with its own field-level validation (which the YAML parser would rely on) and clearly separated classes for loading and generating.
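A rough sketch of the separation described here, under stated assumptions: a `Term` model validates its own fields when constructed, while loading (`DictLoader`) and generating (`PolGenerator`) live in separate classes. These names, the small field set and the allowed actions are illustrative only; this is not the code from PR 72 or 73.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed allowed actions for this illustration.
ALLOWED_ACTIONS = frozenset({"accept", "deny", "reject", "next", "reject-with-tcp-rst"})


@dataclass
class Term:
    """Domain object: validates its own fields on construction."""
    name: str
    action: str
    destination_address: List[str] = field(default_factory=list)
    destination_port: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("term name must not be empty")
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"term {self.name!r}: unknown action {self.action!r}")


class DictLoader:
    """Loading lives outside the model; a YAML loader would feed it the same dict."""

    def load_term(self, raw: Dict) -> Term:
        return Term(
            name=raw["name"],
            action=raw["action"],
            destination_address=raw.get("destination_address", []),
            destination_port=raw.get("destination_port", []),
        )


class PolGenerator:
    """Generation also lives outside the model and only reads validated Terms."""

    def term_to_pol(self, term: Term) -> str:
        lines = [f"term {term.name} {{"]
        if term.destination_address:
            lines.append("  destination-address:: " + " ".join(term.destination_address))
        if term.destination_port:
            lines.append("  destination-port:: " + " ".join(term.destination_port))
        lines.append(f"  action:: {term.action}")
        lines.append("}")
        return "\n".join(lines)
```

Because validation sits in `Term`, a YAML loader, a .pol loader or a GUI would all hit the same checks without duplicating them.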

Re @finfinack's comment, "if Capirca or parts of it will ever become an "ACL as a service"": that was our intention for a small internal project we're building. It would potentially be nice to store ACL data in some other format, or even have users generate it with a GUI somehow.

I don't think it matters how the ACL is generated, as long as there is a single, consistent entry point. I prefer keeping the main logic and integrity checks at the model level and separating the data structure from parsing and loading, as I feel that would add clarity to the project. However, generating .pol strings would also work; it's a question of what the project owners/maintainers feel most comfortable with.

XioNoX commented 3 years ago

I'm wondering if anyone has made progress on this front since 2016. Being able to define ACLs in YAML instead of the current format would be a big win for us.

jzohrab commented 3 years ago

Hi @XioNoX, I did this in PRs #72 and #73, but the work wasn't accepted. Perhaps some of that work can be salvaged for your use, or you can write something that transforms YAML into the Capirca format.

google/capirca issue #78