kislyuk / yq

Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents
https://kislyuk.github.io/yq/
Apache License 2.0

Parsing nmap's XML: single script node into object, multiple script nodes into array #158

Closed joaociocca closed 1 year ago

joaociocca commented 1 year ago

When I try to parse an nmap XML result file through xq I'm having one difficulty. I'll start with the example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE nmaprun>
<?xml-stylesheet href="file:///usr/bin/../share/nmap/nmap.xsl" type="text/xsl"?>
<!-- Nmap 7.93 scan initiated Tue Jan 17 09:24:34 2023 as: nmap -p 80,443 -sTV -sC -O target -->
<nmaprun scanner="nmap" args="nmap -p 80,443 -sTV -sC -O target" start="1673958274" startstr="Tue Jan 17 09:24:34 2023" version="7.93" xmloutputversion="1.05">
    <scaninfo type="connect" protocol="tcp" numservices="2" services="80,443"/>
    <verbose level="0"/>
    <debugging level="0"/>
    <hosthint>
        <status state="up" reason="unknown-response" reason_ttl="0"/>
        <address addr="target_ip" addrtype="ipv4"/>
        <hostnames>
        <hostname name="target" type="user"/>
        </hostnames>
    </hosthint>
    <host starttime="1673958286" endtime="1673958304">
        <status state="up" reason="syn-ack" reason_ttl="48"/>
        <address addr="target_ip" addrtype="ipv4"/>
        <hostnames>
            <hostname name="target" type="user"/>
        </hostnames>
        <ports>
            <port protocol="tcp" portid="80">
                <state state="open" reason="syn-ack" reason_ttl="0"/>
                <service name="http" product="nginx" method="probed" conf="10">
                    <cpe>cpe:/a:igor_sysoev:nginx</cpe>
                </service>
                <script id="http-title" output="Did not follow redirect to https://target/">
                    <elem key="redirect_url">https://target/</elem>
                </script>
            </port>
            <port protocol="tcp" portid="443">
                <state state="open" reason="syn-ack" reason_ttl="0"/>
                <service name="http" product="nginx" tunnel="ssl" method="probed" conf="10">
                    <cpe>cpe:/a:igor_sysoev:nginx</cpe>
                </service>
                <script id="http-cookie-flags" output="&#xa;  /: &#xa;    ASPSESSIONIDQQACCTCC: &#xa;      httponly flag not set">
                    <table key="/">
                        <table key="ASPSESSIONIDQQACCTCC">
                            <elem>httponly flag not set</elem>
                        </table>
                    </table>
                </script>
                <script id="http-title" output="target title&#xa;Requested resource was target/path">
                    <elem key="title">target title</elem>
                    <elem key="redirect_url">target/path</elem>
                </script>
                <script id="tls-nextprotoneg" output="&#xa;  http/1.1">
                    <elem>http/1.1</elem>
                </script>
            </port>
        </ports>
    </host>
</nmaprun>

I can get to the port[] level just fine with xq -r '.nmaprun.host.ports.port[]', but then xq will make the .script from port[0] an object, and the .script from port[1] an array.

Output of xq -r '.nmaprun.host.ports.port[].script | type':
object
array

My main goal here is to check whether .script contains an elem whose @key value is redirect_url, which I know how to do... but I can't seem to access it correctly when .script can be either an object or an array.

I could go the long way (since this will be a bash script) and first identify if there is just a single type of .script or two, and make different xq calls accordingly, but I was wondering if there's a way to deal with this straight on a jq query level.

--edit

oh gods, I just realized the same happens with .elem. They can be an object if there's a single <elem> inside <script> or an array if there are multiple O.o

--edit2

JFC elem can also be a string in case of <elem>text</elem>... this is a nightmare O.o
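
The three shapes described above (object, array, bare string) can be collapsed into a single shape before querying. Below is a minimal sketch in Python rather than jq, assuming the JSON layout that xmltodict-based tools emit (attributes as @-prefixed keys, element text as #text). The sample data is hand-written to mirror the scan above, and the helper names as_list and redirect_urls are hypothetical. In jq the same idea is commonly written as: if type == "array" then . else [.] end

```python
import json

# Hand-written sample mirroring what xq emits for the two <port> elements:
# .script is an object for port 80 and an array for port 443, and .elem
# may be an object, an array, or a bare string.
ports = json.loads("""
[
  {"@portid": "80",
   "script": {"@id": "http-title",
              "elem": {"@key": "redirect_url", "#text": "https://target/"}}},
  {"@portid": "443",
   "script": [
     {"@id": "http-title",
      "elem": [{"@key": "title", "#text": "target title"},
               {"@key": "redirect_url", "#text": "target/path"}]},
     {"@id": "tls-nextprotoneg", "elem": "http/1.1"}
   ]}
]
""")


def as_list(value):
    """Wrap a lone object or string in a list; treat a missing value as empty."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def redirect_urls(port):
    """Collect the text of every <elem key="redirect_url"> under a port's scripts."""
    urls = []
    for script in as_list(port.get("script")):
        for elem in as_list(script.get("elem")):
            # A bare string is an <elem> without attributes, so it has no @key.
            if isinstance(elem, dict) and elem.get("@key") == "redirect_url":
                urls.append(elem.get("#text"))
    return urls


for port in ports:
    print(port["@portid"], redirect_urls(port))
```

Normalizing at every level where the shape can vary keeps the actual check (the @key comparison) identical for all ports.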

kislyuk commented 1 year ago

You're looking for the force_list parameter in xmltodict (https://github.com/martinblech/xmltodict/blob/fe3a37571e3867845b29e3c14ccc88186cbe379d/xmltodict.py#L270-L276). yq does not currently provide direct access to this parameter, but it seems like there's a legitimate use for it so we can look into adding it.
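
For reference, here is a small sketch of what force_list does when xmltodict is called directly from Python. It assumes the third-party xmltodict package is installed; the XML snippet is a trimmed, hypothetical version of the scan above, not the full file.

```python
import xmltodict  # third-party: pip install xmltodict

# Trimmed, hypothetical sample: one port with a single <script> and <elem>.
xml = """
<ports>
  <port portid="80">
    <script id="http-title">
      <elem key="redirect_url">https://target/</elem>
    </script>
  </port>
</ports>
"""

# Default behaviour: a lone child element becomes a plain mapping.
plain = xmltodict.parse(xml)
print(isinstance(plain["ports"]["port"]["script"], dict))

# With force_list, the named elements are always lists, even when alone.
forced = xmltodict.parse(xml, force_list=("script", "elem"))
scripts = forced["ports"]["port"]["script"]
print(isinstance(scripts, list))
print(scripts[0]["elem"][0]["@key"])
```

Only the element names passed in force_list are affected; port is still a single mapping here because it was not listed.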

joaociocca commented 1 year ago

Is there a workaround I can use in the meantime?

kislyuk commented 1 year ago

No.

kislyuk commented 1 year ago

Turns out I had already integrated force_list but just forgot about it. You can use xq --xml-force-list ELEMENT_NAME to accomplish this.