Closed: valinha closed this issue 2 years ago.
Thanks for the suggestion. This is non-trivial, but I think it's doable.
Here's a first thought about how we might be able to specify that from a user perspective in the rule object without changing the schema too much.
If path is specified and the file can be parsed as structured data (XML, JSON, YML), find all the locations in the document that match the path and run the pattern against their values.
Here's the rule you want above implemented in this schema:
{
"name": "My Feature",
"id": "MYID00001",
"description": "Detects if My Feature is enabled",
"applies_to": [
"json"
],
"tags": [
"Features.MyFeature"
],
"severity": "Moderate",
"patterns": [
{
"confidence": "High",
"pattern": "true",
"type": "string"
}
],
"path": "app.myfeature.enabled"
}
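To make the proposed semantics concrete, here is a minimal sketch for JSON only, assuming System.Text.Json (YAML and XML would each need their own parser). DottedPathMatcher, FindValuesAtPath, and Matches are hypothetical names, not part of Application Inspector; the path is resolved from the document root, with no wildcards or list handling.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical helper, not Application Inspector code: resolves a dotted path such as
// "app.myfeature.enabled" from the root of a JSON document and runs the rule's pattern
// against the value found there instead of against the raw file text.
public static class DottedPathMatcher
{
    // Returns the value at the path, or an empty list if the file is not JSON or the path
    // does not exist. Without wildcards or array indexing it yields at most one value.
    public static List<string> FindValuesAtPath(string content, string path)
    {
        var values = new List<string>();
        JsonDocument document;
        try
        {
            document = JsonDocument.Parse(content);
        }
        catch (JsonException)
        {
            return values; // not structured data, so the path cannot apply
        }

        using (document)
        {
            JsonElement current = document.RootElement;
            foreach (string segment in path.Split('.'))
            {
                // Each segment must name a property of the current object; key order is irrelevant.
                if (current.ValueKind != JsonValueKind.Object ||
                    !current.TryGetProperty(segment, out JsonElement next))
                {
                    return values;
                }
                current = next;
            }
            // Strings are returned unquoted; true, numbers, objects, etc. as raw JSON text.
            values.Add(current.ValueKind == JsonValueKind.String
                ? (current.GetString() ?? "")
                : current.GetRawText());
        }
        return values;
    }

    // The pattern itself is unchanged; only its target moves from the file to the values.
    public static bool Matches(string content, string path, string pattern) =>
        FindValuesAtPath(content, path).Exists(value => Regex.IsMatch(value, pattern));
}
```

Because the lookup follows keys rather than text, reordering sibling keys or collapsing the file onto one line does not change the result, which is the gap described in the original request.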
What do you think of this proposal?
Thanks for considering the functionality. I'm fine with it, although I would prefer it not to depend on the applies_to field. Would something like this work as well?
{
"name": "My Feature",
"id": "MYID00001",
"description": "Detects if My Feature is enabled",
"applies_to_file_regex": [
"application-?.*.yml"
],
"tags": [
"Features.MyFeature"
],
"severity": "Moderate",
"patterns": [
{
"confidence": "High",
"pattern": "true",
"type": "string"
}
],
"path": "app.myfeature.enabled"
}
That's fine; it wouldn't change the applies_to logic.
Consider being able to check the value of a path, for example:
{
"name": "My Feature",
"id": "MYID00001",
"description": "Detects if My Feature is enabled",
"applies_to_file_regex": [
"application-?.*.yml"
],
"tags": [
"Features.MyFeature"
],
"severity": "Moderate",
"patterns": [
{
"confidence": "High",
"pattern": "true",
"type": "string"
}
],
"paths": [
{
"path": "app.myfeature.enabled",
"value-pattern": "*true"
}
]
}
The path expression would depend on the file type, using json-path, yaml-path, or xpath, for example, and value-pattern could be optional, to check either the existence or the value of the path.
In addition, a list of paths could make sense, just like the list of patterns. If any one of them is fulfilled, it would be a match.
Does that make sense?
Regarding the path expression depending on the file type (json-path, yaml-path, or xpath): I don't understand the distinction between them from a rule creation perspective. It seems to me they could all be represented in the same format, app.myfeature.enabled. Is there something I'm missing?
Regarding value-pattern being optional to check either the existence or the value of the path: the value-pattern you provided doesn't look like valid regex. Wouldn't a .* pattern detect the existence of a path? I'm not sure we need a separate mechanism to check whether something exists at all vs. has a specific value.
Regarding a list of paths: I think a list of paths would be fine. Matches will only be found if the patterns match the value of a path that is specified.
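The semantics described here (a list of paths per pattern, with matches reported only when the pattern matches a value at one of those paths) could look roughly like the following. It is a hypothetical sketch using XPath over XML via System.Xml.XPath, since that ships with .NET; PathTargetedPattern, GetTargets, and IsMatch are illustrative names. With no paths, the pattern runs against the whole file as it does today.

```csharp
#nullable enable
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.XPath;

// Hypothetical sketch: when a pattern carries a list of paths, the regex runs only against
// the values selected by those paths; with no paths it runs against the whole file.
public static class PathTargetedPattern
{
    public static List<string> GetTargets(string content, IReadOnlyList<string> xpaths)
    {
        var targets = new List<string>();
        if (xpaths.Count == 0)
        {
            targets.Add(content); // no paths: existing behavior, scan the raw text
            return targets;
        }

        XPathNavigator navigator;
        try
        {
            navigator = new XPathDocument(new StringReader(content)).CreateNavigator();
        }
        catch (XmlException)
        {
            return targets; // not XML, so a path-restricted pattern cannot match
        }

        // A value selected by any of the listed paths is a candidate target.
        foreach (string xpath in xpaths)
        {
            XPathNodeIterator iterator = navigator.Select(xpath);
            while (iterator.MoveNext())
            {
                targets.Add(iterator.Current!.Value);
            }
        }
        return targets;
    }

    public static bool IsMatch(string content, string pattern, IReadOnlyList<string> xpaths) =>
        GetTargets(content, xpaths).Exists(target => Regex.IsMatch(target, pattern));
}
```

Note that a "true" elsewhere in the document (the note element in a document like the one in the tests) does not produce a match once paths are given.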
Updated proposal. I haven't decided which default value to use for allow-prefixes yet.
{
"name": "My Feature",
"id": "MYID00001",
"description": "Detects if My Feature is enabled",
"applies_to": [
"json"
],
"tags": [
"Features.MyFeature"
],
"severity": "Moderate",
"patterns": [
{
"confidence": "High",
"pattern": "true",
"type": "string"
}
],
"paths": [
{
"pattern": "app.myfeature.enabled",
// Require the first specified path component to be at the root of the document. For example:
// app:
// myfeature:
// enabled:
"allow-prefixes": false
},
{
"pattern": "app.myfeature.enabled",
// Allows arbitrary prefixes before the first specified path component, for example:
// parent:
// app:
// myfeature:
// enabled:
"allow-prefixes": true
}
]
}
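The two allow-prefixes modes can be sketched over JSON with System.Text.Json by recording the chain of property names leading to each node: with allow-prefixes false the chain must equal the path exactly, and with true the path only has to match the end of the chain. PrefixAwarePathMatcher is a hypothetical name, JSON stands in for the YAML shown in the comments, and this sketch does not descend into arrays.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical sketch of the allow-prefixes option over JSON: each node's chain of
// property names from the root is compared with the dotted path.
public static class PrefixAwarePathMatcher
{
    public static List<string> FindValues(string json, string path, bool allowPrefixes)
    {
        string[] wanted = path.Split('.');
        var results = new List<string>();
        using JsonDocument document = JsonDocument.Parse(json);
        Walk(document.RootElement, new List<string>(), wanted, allowPrefixes, results);
        return results;
    }

    private static void Walk(JsonElement node, List<string> trail, string[] wanted,
                             bool allowPrefixes, List<string> results)
    {
        if (IsMatch(trail, wanted, allowPrefixes))
        {
            results.Add(node.ValueKind == JsonValueKind.String
                ? (node.GetString() ?? "")
                : node.GetRawText());
        }
        if (node.ValueKind != JsonValueKind.Object)
        {
            return; // arrays and primitives are not descended into in this sketch
        }
        foreach (JsonProperty property in node.EnumerateObject())
        {
            trail.Add(property.Name);
            Walk(property.Value, trail, wanted, allowPrefixes, results);
            trail.RemoveAt(trail.Count - 1);
        }
    }

    // allow-prefixes false: the whole chain must equal the path.
    // allow-prefixes true: the path only has to match the end of the chain.
    private static bool IsMatch(List<string> trail, string[] wanted, bool allowPrefixes)
    {
        if (trail.Count < wanted.Length) return false;
        if (!allowPrefixes && trail.Count != wanted.Length) return false;
        int offset = trail.Count - wanted.Length; // 0 when prefixes are not allowed
        for (int i = 0; i < wanted.Length; i++)
        {
            if (trail[offset + i] != wanted[i]) return false;
        }
        return true;
    }
}
```

With the nested parent/app/myfeature/enabled layout from the second comment, the value is found only when allowPrefixes is true; a document rooted at app matches in both modes.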
Sorry, I thought you were proposing something without the patterns part, which is why I didn't see how to set the value in the expression. I understand now that your idea is to combine paths and patterns.
In my opinion, the 'allow-prefixes' option should default to false, so that complete paths are required by default.
I believe this capability will give Application Inspector more power to detect more cases. Great work and great tool.
Got it. The idea here is that pattern functionality will not change; we are just changing the target on which the patterns are run if the path field is populated.
Defaulting 'allow-prefixes' to false makes sense to me.
Thanks for the great suggestion. I think this will add a lot of flexibility.
I think the suggestion for xpath compatibility makes sense as well, although that would be restricted to XML only. I wasn't able to find an equivalent query syntax for JSON or YML.
Reference for later: https://en.wikipedia.org/wiki/XPath
It looks like can be used as a character in a yml tag so will need to use a different separator.
Additionally need to consider how to query lists in json/yml.
Maybe convert json/yml to xml and then run xpath queries on it? That would make it difficult to provide the correct line number for the original file however.
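One way to try the convert-to-XML idea without a third-party library: .NET's JsonReaderWriterFactory (System.Runtime.Serialization.Json) exposes JSON through an XmlReader, in which the document element is named root, object keys become child elements, and array entries become item elements, which also gives lists a query syntax. The sketch below (JsonAsXml is a hypothetical name) loads that into an XPathDocument and runs an XPath query. Nothing in it maps the selected values back to line numbers in the original JSON, which is the drawback noted above.

```csharp
using System.Collections.Generic;
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper: exposes a JSON document as XML and queries it with XPath.
public static class JsonAsXml
{
    public static List<string> Select(string json, string xpath)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(json);
        // Built-in JSON-to-XML mapping: <root type="object"> for the document, one child
        // element per key, and <item> elements for array entries.
        using XmlDictionaryReader reader =
            JsonReaderWriterFactory.CreateJsonReader(utf8, XmlDictionaryReaderQuotas.Max);
        XPathNavigator navigator = new XPathDocument(reader).CreateNavigator();

        var values = new List<string>();
        XPathNodeIterator iterator = navigator.Select(xpath);
        while (iterator.MoveNext())
        {
            values.Add(iterator.Current!.Value);
        }
        return values;
    }
}
```

For example, $.books[*].title would be written as /root/books/item/title against the converted document.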
Equivalents to xpath are the following (I don't know of any libraries in C#):
- json-path: https://goessner.net/articles/JsonPath/, https://restfulapi.net/json-jsonpath/
- yaml-path: https://github.com/wwkimball/yamlpath#:~:text=A%20YAML%20Path%20segment%20is,be%3A%20%2Fhash%2Fkey%20.
Perhaps it makes sense to do this in different phases and in different tasks? In my opinion, a simple key search (by path) covers many scenarios.
I was able to find a JSONPath library for C#:
https://github.com/danielaparker/JsonCons.Net
I have started an implementation of this and realized the path needs to be inside the pattern in case you have a path and condition on different elements.
{
"name": "My Feature",
"id": "MYID00001",
"description": "Detects if My Feature is enabled",
"applies_to": [
"json"
],
"tags": [
"Features.MyFeature"
],
"severity": "Moderate",
"patterns": [
{
"confidence": "High",
"pattern": "true",
"type": "string",
"paths": [
{
"pattern": "app.myfeature.enabled",
// Require the first specified path component to be at the root of the document. For example:
// app:
// myfeature:
// enabled:
"allow-prefixes": false
},
{
"pattern": "app.myfeature.enabled",
// Allows arbitrary prefixes before the first specified path component, for example:
// parent:
// app:
// myfeature:
// enabled:
"allow-prefixes": true
}
]
}
]
}
Hi, any plans to do this? Thank you very much.
It has been partially implemented but there have been higher priority items so I have not been able to finish it.
I do plan to add this but at this time I cannot provide a date it will be done by.
Don't worry, I just wanted to know if this was still on. I'm sorry I can't really contribute, as I have no knowledge of .NET 😓
Hi @gfs any news on this issue? It would be really useful
Thanks for the reminder. I may be able to squeeze this into 1.6 that I'm currently working on.
I revisited this and rediscovered the issue I had hit before: once the file is parsed to JSON, XML, etc., we lose track of where in the file a match is located. For example, to extract the value at a specific location in an XML document you can do something like this, but the node iterator and the elements it iterates over do not provide the offset in the original file they were derived from.
XPathDocument? xmlDoc;
try
{
xmlDoc = new XPathDocument(new StringReader(FullContent));
DocType = StructuredDocType.Xml;
}
catch (Exception)
{
xmlDoc = null;
}
if (xmlDoc is not null)
{
var navigator = xmlDoc.CreateNavigator();
var nodeIter = navigator.Select(Path);
while (nodeIter.MoveNext())
{
if (nodeIter.Current is not null)
{
yield return (nodeIter.Current.Value, null);
}
}
}
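A possible avenue for the missing location, offered as a suggestion to verify rather than what was implemented: the navigator that XPathDocument hands out is expected to implement System.Xml.IXmlLineInfo when the document was loaded from a line-tracking reader such as the one behind the StringReader constructor; this should be checked against the .NET version in use. XPathLocations.SelectWithLines is a hypothetical helper. It reports the line and column where each selected node starts (for an element, its start tag), which would still need converting to the offsets the tool uses, and it falls back to 0 when no line information is available.

```csharp
#nullable enable
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper: selects nodes with XPath and reports where each one starts in the
// original text, using IXmlLineInfo when the navigator provides it.
public static class XPathLocations
{
    public static List<(string Value, int Line, int Column)> SelectWithLines(string content, string xpath)
    {
        var results = new List<(string Value, int Line, int Column)>();
        var document = new XPathDocument(new StringReader(content));
        XPathNodeIterator iterator = document.CreateNavigator().Select(xpath);
        while (iterator.MoveNext())
        {
            XPathNavigator node = iterator.Current!;
            // Line and column are 1-based; 0 means no line information was recorded.
            if (node is IXmlLineInfo info && info.HasLineInfo())
            {
                results.Add((node.Value, info.LineNumber, info.LinePosition));
            }
            else
            {
                results.Add((node.Value, 0, 0));
            }
        }
        return results;
    }
}
```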
I made some progress with an experimental XML implementation, which searches the document for the XML node that was found in order to get its location. JSON has been less successful: I tried both JsonCons and JsonEverything, but neither provides a Parent element for a secondary search or an index (it seems that the index is a private field of JsonElement in System.Text.Json, but unfortunately there's no way to access it due to its protection level).
Once the beta with this functionality is released, any edge cases you find that don't work would be helpful to report.
To use it, add an "xpath" or "jsonpath" field to the SearchPattern portion of a rule.
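For the JSON side, one alternative to the tree APIs is the forward-only System.Text.Json.Utf8JsonReader, whose TokenStartIndex gives the byte position of every token in the input. This is a hypothetical sketch, not the thread's implementation: JsonOffsets.Find tracks the chain of property names while reading and records each primitive value whose dotted path matches, so no parent pointers or private indexes are needed. It supports plain dotted paths only, not full JSONPath.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical sketch: walk the raw UTF-8 bytes with Utf8JsonReader so each matched value
// keeps its position in the original file.
public static class JsonOffsets
{
    // Finds primitive values whose dotted property path equals `path` (e.g. "books.title").
    // Array indices are not part of the path, so "books.title" matches the title of every
    // object in the "books" array, roughly like $.books[*].title.
    public static List<(string Value, long Offset)> Find(string json, string path)
    {
        string[] wanted = path.Split('.');
        var results = new List<(string Value, long Offset)>();
        var segments = new List<string>();        // property names of enclosing containers
        var containerIsNamed = new Stack<bool>(); // did this container push a segment?
        string? property = null;                  // property whose value comes next, if any

        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json));
        while (reader.Read())
        {
            switch (reader.TokenType)
            {
                case JsonTokenType.PropertyName:
                    property = reader.GetString();
                    break;
                case JsonTokenType.StartObject:
                case JsonTokenType.StartArray:
                    containerIsNamed.Push(property != null);
                    if (property != null) segments.Add(property);
                    property = null;
                    break;
                case JsonTokenType.EndObject:
                case JsonTokenType.EndArray:
                    if (containerIsNamed.Pop()) segments.RemoveAt(segments.Count - 1);
                    break;
                default: // string, number, true, false or null
                    if (property != null && PathEquals(segments, property, wanted))
                    {
                        string text = reader.TokenType == JsonTokenType.String
                            ? (reader.GetString() ?? "")
                            : Encoding.UTF8.GetString(reader.ValueSpan);
                        // TokenStartIndex is a byte offset (the opening quote for strings);
                        // it equals the character offset only for ASCII input.
                        results.Add((text, reader.TokenStartIndex));
                    }
                    property = null;
                    break;
            }
        }
        return results;
    }

    private static bool PathEquals(List<string> segments, string last, string[] wanted)
    {
        if (segments.Count + 1 != wanted.Length) return false;
        for (int i = 0; i < segments.Count; i++)
        {
            if (segments[i] != wanted[i]) return false;
        }
        return last == wanted[^1];
    }
}
```

Converting the byte offset to a line number is then a matter of counting newlines before it in the original text.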
Here are the samples from the test cases and the data they match.
private const string jsonRule = @"[
{
""id"": ""SA000005"",
""name"": ""Testing.Rules.JSON"",
""tags"": [
""Testing.Rules.JSON""
],
""severity"": ""Critical"",
""description"": ""This rule finds books from the JSON titled with Sheep."",
""patterns"": [
{
""pattern"": ""Sheep"",
""type"": ""regex"",
""confidence"": ""High"",
""scopes"": [
""code""
],
""jsonpath"" : ""$.books[*].title""
}
],
""_comment"": """"
}
]";
private const string xmlRule = @"[
{
""id"": ""SA000005"",
""name"": ""Testing.Rules.XML"",
""tags"": [
""Testing.Rules.XML""
],
""severity"": ""Critical"",
""description"": ""This rule finds books from the XML titled with Franklin."",
""patterns"": [
{
""pattern"": ""Franklin"",
""type"": ""regex"",
""confidence"": ""High"",
""scopes"": [
""code""
],
""xpath"" : ""/bookstore/book/title""
}
],
""_comment"": """"
}
]";
private const string jsonData =
@"{
""books"":
[
{
""category"": ""fiction"",
""title"" : ""A Wild Sheep Chase"",
""author"" : ""Haruki Murakami"",
""price"" : 22.72
},
{
""category"": ""fiction"",
""title"" : ""The Night Watch"",
""author"" : ""Sergei Lukyanenko"",
""price"" : 23.58
},
{
""category"": ""fiction"",
""title"" : ""The Comedians"",
""author"" : ""Graham Greene"",
""price"" : 21.99
},
{
""category"": ""memoir"",
""title"" : ""The Night Watch"",
""author"" : ""David Atlee Phillips"",
""price"" : 260.90
}
]
}
";
private const string xmlData =
@"<?xml version=""1.0"" encoding=""utf-8"" ?>
<bookstore>
<book genre=""autobiography"" publicationdate=""1981-03-22"" ISBN=""1-861003-11-0"">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre=""novel"" publicationdate=""1967-11-17"" ISBN=""0-201-63361-2"">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
<book genre=""philosophy"" publicationdate=""1991-02-15"" ISBN=""1-861001-57-6"">
<title>The Gorgias</title>
<author>
<name>Plato</name>
</author>
<price>9.99</price>
</book>
</bookstore>
";
YML is blocked by https://github.com/aaubry/YamlDotNet/issues/333 unless you know of an alternate YAML parser for .NET which supports YamlPath functionality.
I just merged this for XML and JSON; please give it a try. The new 1.6 beta with the functionality should be published shortly, and I'd appreciate any feedback before finalizing the interfaces and calling the release stable. Samples of XML/JSON + rule combinations that don't work as you expect would be very helpful.
@jaimebp @valinha
Rereading the thread, I see there was a request for the paths to be a list. I'll have a revised version with that shortly.
I did not go with a unified query type. Instead, use standard JSONPath for JSON and standard XPath for XML. I recommend limiting the use of this to files of the appropriate type with applies_to: every file that matches the applies_to (or applies_to_file_regex) filter will be parsed as JSON or XML, with no additional hidden filtering, and doing that for many files may cause high overhead. However, the parsing is only done once per file.
Sample Rule:
[
{
"id": "SA000005",
"name": "Testing.Rules.JSONandXML",
"tags": [
"Testing.Rules.JSON.JSONandXML"
],
"severity": "Critical",
"description": "This rule finds books titled with Franklin located either at the specified JSONPath in JSON or the specified xpath in XML files.",
"patterns": [
{
"pattern": "Franklin",
"type": "regex",
"confidence": "High",
"scopes": [
"code"
],
"jsonpaths" : ["$.books[*].title"],
"xpaths" : ["/bookstore/book/title"]
}
],
"_comment": ""
}
]
This matches these sample files:
{
"books":
[
{
"category": "fiction",
"title" : "A Wild Sheep Chase",
"author" : "Haruki Murakami",
"price" : 22.72
},
{
"category": "fiction",
"title" : "The Night Watch",
"author" : "Sergei Lukyanenko",
"price" : 23.58
},
{
"category": "fiction",
"title" : "The Comedians",
"author" : "Graham Greene",
"price" : 21.99
},
{
"category": "memoir",
"title" : "The Night Watch",
"author" : "David Atlee Phillips",
"price" : 260.90
},
{
"category": "memoir",
"title" : "The Autobiography of Benjamin Franklin",
"author" : "Benjamin Franklin",
"price" : 123.45
}
]
}
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography" publicationdate="1981-03-22" ISBN="1-861003-11-0">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel" publicationdate="1967-11-17" ISBN="0-201-63361-2">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
<book genre="philosophy" publicationdate="1991-02-15" ISBN="1-861001-57-6">
<title>The Gorgias</title>
<author>
<name>Plato</name>
</author>
<price>9.99</price>
</book>
</bookstore>
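The "parsed only once per file" behavior described above can be sketched with Lazy<T>: the parse attempt, including a failed one, runs on first access and is reused for every later pattern. StructuredFile is a hypothetical name and only XML is shown; ParseAttempts exists purely to make the caching visible.

```csharp
#nullable enable
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Hypothetical per-file wrapper: the XML parse is attempted at most once, on first use,
// however many patterns with xpaths are later evaluated against the same file.
public sealed class StructuredFile
{
    private readonly Lazy<XPathNavigator?> _xml;

    public StructuredFile(string content)
    {
        _xml = new Lazy<XPathNavigator?>(() =>
        {
            ParseAttempts++;
            try
            {
                return new XPathDocument(new StringReader(content)).CreateNavigator();
            }
            catch (XmlException)
            {
                return null; // the failure is cached as well, so non-XML files are not re-parsed
            }
        });
    }

    // Exposed only to show how often parsing happens.
    public int ParseAttempts { get; private set; }

    // Null when the file is not well-formed XML.
    public XPathNavigator? Xml => _xml.Value;
}
```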
Of course, I will test the functionality today and give you feedback. Thank you very much.
I just tried xpath and it didn't work: Rule:
{
"name": "Source code: Java 17",
"id": "CODEJAVA000000",
"description": "Java 17 maven configuration",
"applies_to": [
"pom.xml"
],
"tags": [
"Code.Java.17"
],
"severity": "critical",
"patterns": [
{
"pattern": "17",
"xpaths" : ["/project/properties/java.version"],
"type": "regex",
"scopes": [
"code"
],
"modifiers": [
"i"
],
"confidence": "high"
}
]
}
Xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>xxx</groupId>
<artifactId>xxx</artifactId>
<version>0.1.0-SNAPSHOT</version>
<packaging>pom</packaging>
<name>${project.groupId}:${project.artifactId}</name>
<description />
<properties>
<java.version>17</java.version>
</properties>
</project>
Thanks for the feedback. I'll check this today.
I found two issues with your example.
I think this alternate way should work for the Xpath expression: /project/properties/*[name(.) = 'java.version']
However, I'm having trouble getting any xpath query to work with your sample xml. Will continue to investigate to see if I can resolve.
After a bit more testing the above modified query does work - however, it only works when I remove the attributes from the root element. It's not clear to me why this is the case yet, or how I can work around it.
So this rule:
{
"name": "Source code: Java 17",
"id": "CODEJAVA000000",
"description": "Java 17 maven configuration",
"applies_to_file_regex": [
"pom.xml"
],
"tags": [
"Code.Java.17"
],
"severity": "critical",
"patterns": [
{
"pattern": "17",
"xpaths" : ["/project/properties/*[name(.)='java.version']"],
"type": "regex",
"scopes": [
"code"
],
"modifiers": [
"i"
],
"confidence": "high"
}
]
}
Matches:
<?xml version="1.0" encoding="UTF-8"?>
<project>
<modelVersion>4.0.0</modelVersion>
<groupId>xxx</groupId>
<artifactId>xxx</artifactId>
<version>0.1.0-SNAPSHOT</version>
<packaging>pom</packaging>
<name>${project.groupId}:${project.artifactId}</name>
<description />
<properties>
<java.version>17</java.version>
</properties>
</project>
but not
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>xxx</groupId>
<artifactId>xxx</artifactId>
<version>0.1.0-SNAPSHOT</version>
<packaging>pom</packaging>
<name>${project.groupId}:${project.artifactId}</name>
<description />
<properties>
<java.version>17</java.version>
</properties>
</project>
I opened a new issue (#497) to cover adding support for xml docs with namespace specified and will continue tracking this issue there.
I can successfully query your sample now with a small modification to the xpath. Because the XML declares a namespace, querying with just the local name doesn't work: you need to specify the namespace too, or alternatively use the 'local-name' XPath function. See #499 for the modified query.
- applies_to is for languages, and pom.xml is not a language by default, so you'd need to provide custom languages. If you are already doing that, then this isn't an issue. Alternatively, you could use applies_to_file_regex with pom.xml if you don't want to provide custom languages.
In all the rules we use, I have always specified pom.xml as the language, as indicated in the docs, and it has always worked fine:
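The namespace behavior can be reproduced with System.Xml.XPath alone. In XPath 1.0 an unprefixed name step such as project only matches elements in no namespace, so a POM that declares xmlns="http://maven.apache.org/POM/4.0.0" needs either local-name() tests or a prefix bound to that namespace. The sketch shows both; PomJavaVersion and the pom prefix are illustrative choices. It also suggests the dot in java.version is not itself a problem, since '.' is a legal character in XML names and XPath name tests.

```csharp
#nullable enable
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class PomJavaVersion
{
    public const string PomNamespace = "http://maven.apache.org/POM/4.0.0";

    // Option 1: match on local names only, ignoring whichever namespace each element is in.
    public const string LocalNameQuery =
        "/*[local-name()='project']/*[local-name()='properties']/*[local-name()='java.version']";

    // Option 2: bind a prefix of our choosing ("pom") to the POM namespace and use it in every step.
    public const string PrefixedQuery = "/pom:project/pom:properties/pom:java.version";

    // Returns the text of the first node selected by the query, or null if nothing matches.
    public static string? Select(string pomXml, string query)
    {
        XPathNavigator navigator = new XPathDocument(new StringReader(pomXml)).CreateNavigator();
        var namespaces = new XmlNamespaceManager(navigator.NameTable);
        namespaces.AddNamespace("pom", PomNamespace);
        return navigator.SelectSingleNode(query, namespaces)?.Value;
    }
}
```

The local-name() form needs no namespace setup, which is why it also fits inside a rule's xpaths entry.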
https://github.com/microsoft/ApplicationInspector/wiki/3.4-Applies_to-(languages)#language-support
That is odd; Pom is listed as a language there, but I wasn't seeing it work with applies_to. I'll double-check that as well.
Double checked by updating the test rule I created for #499 and you are correct, it also works with pom.xml as applies to.
Is your feature request related to a problem? Please describe. In order to be able to detect feature usage in structured configuration files (yaml, json, xml...) it would be very useful to be able to search for occurrences of certain entries regardless of the order of declaration in the file.
An example would be:
If we have a config.yaml file and we have the following configuration:
If I want a rule to detect whether the 'myfeature' functionality is enabled, there is currently no reliable solution. Detecting whether app.my-feature.enabled = true with patterns, modifiers, or conditions does not cover all cases, e.g. if the declaration order is changed or the configuration is put on one line. For example, a pattern such as:
"pattern": "app:\s+myfeature:\s+enabled: *true".
would not cover the above example, since the order of the enabled property can change in a structured data type.
Describe the solution you'd like: The ability to run patterns against structured data types.
Describe alternatives you've considered