libo26 / feedparser

Automatically exported from code.google.com/p/feedparser

repeated namespace entities result in only the last one registering #273

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. parse the XML feed from 
http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-recent.xml, assign to 'p'
2. examine p['vuln_product']
3. only the last item will be recorded

example snippet:
    <vuln:vulnerable-software-list>
      <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:9.0:bundle7</vuln:product>
      <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:9.1:bundle4</vuln:product>
      <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:8.9:bundle7</vuln:product>
      <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:8.8:bundle13</vuln:product>
    </vuln:vulnerable-software-list>

What is the expected output? What do you see instead?
a list 4 items long, one per affected version.  i find only the last 
one:
[...  'vuln_product': 'cpe:/a:oracle:peoplesoft_enterprise:8.8:bundle13'  ...]

What version of the product are you using? On what operating system?
5.0.1, Linux

Please provide any additional information below.

i don't have a quick fix for this one this time ;)  stab in the dark - if 
multiples show up, make the dictionary value a list. as in 'vuln_product': 
list(1st, 2nd, ... last)
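
A minimal sketch of that idea, independent of feedparser's internals: assigning each value to the same dictionary key overwrites the previous one, while promoting the value to a list on the second occurrence keeps them all. The helper name `add_value` is hypothetical and is not part of feedparser.

```python
def add_value(record, key, value):
    """Store value under key; promote to a list when the key repeats."""
    if key not in record:
        record[key] = value
    elif isinstance(record[key], list):
        record[key].append(value)
    else:
        record[key] = [record[key], value]
    return record


products = [
    'cpe:/a:oracle:peoplesoft_enterprise:9.0:bundle7',
    'cpe:/a:oracle:peoplesoft_enterprise:9.1:bundle4',
    'cpe:/a:oracle:peoplesoft_enterprise:8.9:bundle7',
    'cpe:/a:oracle:peoplesoft_enterprise:8.8:bundle13',
]

overwritten = {}
collected = {}
for product in products:
    overwritten['vuln_product'] = product          # plain assignment: last one wins
    add_value(collected, 'vuln_product', product)  # accumulates repeats

print(overwritten['vuln_product'])
print(collected['vuln_product'])
```

One trade-off of this approach is that a key holds a string when the element appears once and a list when it repeats, so callers have to handle both shapes.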

Original issue reported on code.google.com by firefigh...@gmail.com on 27 Apr 2011 at 3:29

GoogleCodeExporter commented 9 years ago
The file you've linked to isn't a feed. You'll have to write a parser to deal 
with its contents. :( You can definitely use feedparser as a starting point, 
since all of the boilerplate parser code is basically already in place for it.

As a quick pointer, the way feedparser is currently designed you would have to 
create a method named `_start_vuln_product()` and another named 
`_end_vuln_product()`. You can model the actual method code after 
similarly-named methods elsewhere in the code; I think they're all in the 
`_FeedParserMixin` class.
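
The start/end naming pattern can be illustrated with a stand-alone sketch built on the standard library's `xml.sax`. This is not feedparser's code: the class `DispatchingHandler`, its `_dispatch` helper, and the `products` attribute are assumptions made for illustration. It only shows how a handler can route `<vuln:product>` to methods named `_start_vuln_product` and `_end_vuln_product` and collect every occurrence.

```python
import xml.sax


class DispatchingHandler(xml.sax.ContentHandler):
    """Toy handler: routes <prefix:name> to _start_prefix_name / _end_prefix_name."""

    def __init__(self):
        super().__init__()
        self.products = []   # every <vuln:product> value, in document order
        self._text = None    # text buffer, active only inside a handled element

    def _dispatch(self, kind, name, *args):
        # 'vuln:product' -> '_start_vuln_product' or '_end_vuln_product'
        method = getattr(self, '_%s_%s' % (kind, name.replace(':', '_')), None)
        if method is not None:
            method(*args)

    def startElement(self, name, attrs):
        self._dispatch('start', name, attrs)

    def endElement(self, name):
        self._dispatch('end', name)

    def characters(self, content):
        if self._text is not None:
            self._text.append(content)

    def _start_vuln_product(self, attrs):
        self._text = []

    def _end_vuln_product(self):
        # append rather than assign, so repeated elements are all kept
        self.products.append(''.join(self._text).strip())
        self._text = None


SAMPLE = b"""<vuln:vulnerable-software-list
    xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4">
  <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:9.0:bundle7</vuln:product>
  <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:9.1:bundle4</vuln:product>
  <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:8.9:bundle7</vuln:product>
  <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:8.8:bundle13</vuln:product>
</vuln:vulnerable-software-list>"""

handler = DispatchingHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.products)
```

Elements without a matching method (here the enclosing `vuln:vulnerable-software-list`) are simply ignored, which is why only some tags end up with dedicated handling.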

Original comment by kurtmckee on 27 Apr 2011 at 5:45

GoogleCodeExporter commented 9 years ago
ok, i've been working on this.  i'm curious as to why most of the elements in
this file get picked up but some do not, and how the naming works for
those that do.  can you elucidate?  i'm new to parsing feeds and fairly
naive with regard to xml as well.  this highly detailed non-rss form of feed
file is updated frequently as new vulnerabilities are recorded, so the
information i show below may have already expired.

i.e., here's the output of a parsed section:

cpe-lang_fact-ref                ==> {'name':
['cpe:/a:oracle:peoplesoft_enterprise:9.1:bundle4',
'cpe:/a:oracle:peoplesoft_enterprise:9.1:bundle4']}
cpe-lang_logical-test            ==> {'operator': ['OR', 'OR'], 'negate':
['false', 'false']}
cvss_access-complexity           ==> MEDIUM
cvss_access-vector               ==> NETWORK
cvss_authentication              ==> SINGLE_INSTANCE
cvss_availability-impact         ==> NONE
cvss_base_metrics                ==>
cvss_confidentiality-impact      ==> NONE
cvss_generated-on-datetime       ==> 2011-04-20T13:40:00.000-04:00
cvss_integrity-impact            ==> PARTIAL
cvss_score                       ==> 3.5
cvss_source                      ==> http://nvd.nist.gov
vuln_cve-id                      ==> CVE-2011-0826
vuln_cvss                        ==>
vuln_last-modified-datetime      ==> 2011-04-20T00:00:00.000-04:00
vuln_product                     ==>
cpe:/a:oracle:peoplesoft_enterprise:8.8:bundle13
vuln_published-datetime          ==> 2011-04-20T06:55:01.497-04:00
vuln_reference                   ==> {'href': ['
http://www.oracle.com/technetwork/topics/security/cpuapr2011-301950.html', '
http://www.oracle.com/technetwork/topics/security/cpuapr2011-301950.html'],
'xml:lang': ['en', 'en']}
vuln_references                  ==> {'xml:lang': ['en', 'en'],
'reference_type': ['VENDOR_ADVISORY', 'VENDOR_ADVISORY']}
vuln_source                      ==> CONFIRM
vuln_summary                     ==> Unspecified vulnerability in Oracle
PeopleSoft Enterprise 8.8 Bundle #13, 8.9 Bundle #7, 9.0 Bundle #7, and 9.1
Bundle #4 allows remote authenticated users to affect integrity via unknown
vectors related to Application Portal.
vuln_vulnerable-configuration    ==> {'id': ['http://nvd.nist.gov/', '
http://nvd.nist.gov/']}
vuln_vulnerable-software-list    ==>

the above is a section in the static feed that refers to peoplesoft
(arbitrarily chosen). here's the relevant extract of the python code which
produces the above.

import re

import feedparser
import httplib2

h2 = httplib2.Http('.rss-cache')
response, content = h2.request(advisories[feed])
# pull the charset out of the Content-Type header, defaulting to utf-8
m = re.search(r'charset=([\w-]+)', response.get('content-type', ''))
hcharset = m.group(1) if m else 'utf-8'

# ... non-200/30x error handling and transcoding if needed

xmlp = feedparser.parse(content)
itemlist = []
if feed in ('nvd',):  # one-element tuple; ('nvd') is just the string 'nvd'
    itemlist = xmlp.entries

for item in itemlist:
    if feed in ('nvd',):
        if 'PeopleSoft' in item.get('vuln_summary', ''):
            for x in sorted(item):
                print('{0:32} ==> {1}'.format(x, item[x]))

what is confusing me is how some items are parsed, such as
cpe-lang_fact-ref (i've modified feedparser.py to make some lists of
things), but other items such as vuln_product are not.  here's the relevant
entry from the xml feed file.  what makes the cpe-lang:fact-ref get parsed
and the vuln:product not get parsed?  is it because cpe-lang:fact-ref has
attribute values vs. vuln:product which has a text descendant? (and i need to
figure out why the bolded line is appended twice instead of all four lines
being appended to the list)

<?xml version='1.0' encoding='UTF-8'?>

<nvd xmlns:scap-core="http://scap.nist.gov/schema/scap-core/0.1"
     xmlns:cpe-lang="http://cpe.mitre.org/language/2.0"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:cvss="http://scap.nist.gov/schema/cvss-v2/0.2"
     xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4"
     xmlns="http://scap.nist.gov/schema/feed/vulnerability/2.0"
     xmlns:patch="http://scap.nist.gov/schema/patch/0.1"
     xsi:schemaLocation="http://scap.nist.gov/schema/patch/0.1 http://nvd.nist.gov/schema/patch_0.1.xsd http://scap.nist.gov/schema/scap-core/0.1 http://nvd.nist.gov/schema/scap-core_0.1.xsd http://scap.nist.gov/schema/feed/vulnerability/2.0 http://nvd.nist.gov/schema/nvd-cve-feed_2.0.xsd"
     nvd_xml_version="2.0" pub_date="2011-04-26T16:00:00">

[...]

  <entry id="CVE-2011-0826">
    <vuln:vulnerable-configuration id="http://nvd.nist.gov/">
      <cpe-lang:logical-test negate="false" operator="OR">
        <cpe-lang:fact-ref name="cpe:/a:oracle:peoplesoft_enterprise:8.8:bundle13" />
        <cpe-lang:fact-ref name="cpe:/a:oracle:peoplesoft_enterprise:8.9:bundle7" />
        <cpe-lang:fact-ref name="cpe:/a:oracle:peoplesoft_enterprise:9.0:bundle7" />
        *<cpe-lang:fact-ref name="cpe:/a:oracle:peoplesoft_enterprise:9.1:bundle4" />*
      </cpe-lang:logical-test>
    </vuln:vulnerable-configuration>
    <vuln:vulnerable-software-list>

      <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:9.0:bundle7</vuln:product>
      <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:9.1:bundle4</vuln:product>
      <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:8.9:bundle7</vuln:product>
      <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:8.8:bundle13</vuln:product>
    </vuln:vulnerable-software-list>
    <vuln:cve-id>CVE-2011-0826</vuln:cve-id>

    <vuln:published-datetime>2011-04-20T06:55:01.497-04:00</vuln:published-datetime>
    <vuln:last-modified-datetime>2011-04-20T00:00:00.000-04:00</vuln:last-modified-datetime>
    <vuln:cvss>
      <cvss:base_metrics>
        <cvss:score>3.5</cvss:score>
        <cvss:access-vector>NETWORK</cvss:access-vector>
        <cvss:access-complexity>MEDIUM</cvss:access-complexity>
        <cvss:authentication>SINGLE_INSTANCE</cvss:authentication>
        <cvss:confidentiality-impact>NONE</cvss:confidentiality-impact>
        <cvss:integrity-impact>PARTIAL</cvss:integrity-impact>
        <cvss:availability-impact>NONE</cvss:availability-impact>
        <cvss:source>http://nvd.nist.gov</cvss:source>

        <cvss:generated-on-datetime>2011-04-20T13:40:00.000-04:00</cvss:generated-on-datetime>
      </cvss:base_metrics>
    </vuln:cvss>
    <vuln:references xml:lang="en" reference_type="VENDOR_ADVISORY">
      <vuln:source>CONFIRM</vuln:source>
      <vuln:reference href="http://www.oracle.com/technetwork/topics/security/cpuapr2011-301950.html" xml:lang="en">http://www.oracle.com/technetwork/topics/security/cpuapr2011-301950.html</vuln:reference>
    </vuln:references>
    <vuln:summary>Unspecified vulnerability in Oracle PeopleSoft Enterprise
8.8 Bundle #13, 8.9 Bundle #7, 9.0 Bundle #7, and 9.1 Bundle #4 allows
remote authenticated users to affect integrity via unknown vectors related
to Application Portal.</vuln:summary>
  </entry>
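
For comparison, an entry shaped like the one above can be read without feedparser at all. The following is a rough sketch using the standard library's `xml.etree.ElementTree`, assuming the two namespace URIs shown in the `<nvd>` root element; the function name `products_by_cve`, the `NS` mapping, and the trimmed `SAMPLE` document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# prefix -> namespace URI, taken from the <nvd> root element of the file
NS = {
    'feed': 'http://scap.nist.gov/schema/feed/vulnerability/2.0',
    'vuln': 'http://scap.nist.gov/schema/vulnerability/0.4',
}

# trimmed-down stand-in for the real file (assumption: same structure)
SAMPLE = """<nvd xmlns="http://scap.nist.gov/schema/feed/vulnerability/2.0"
     xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4">
  <entry id="CVE-2011-0826">
    <vuln:vulnerable-software-list>
      <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:9.0:bundle7</vuln:product>
      <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:9.1:bundle4</vuln:product>
      <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:8.9:bundle7</vuln:product>
      <vuln:product>cpe:/a:oracle:peoplesoft_enterprise:8.8:bundle13</vuln:product>
    </vuln:vulnerable-software-list>
    <vuln:cve-id>CVE-2011-0826</vuln:cve-id>
  </entry>
</nvd>"""


def products_by_cve(xml_text):
    """Map each entry's CVE id to the list of all its vuln:product values."""
    root = ET.fromstring(xml_text)
    result = {}
    for entry in root.findall('feed:entry', NS):
        cve_id = entry.findtext('vuln:cve-id', namespaces=NS)
        path = 'vuln:vulnerable-software-list/vuln:product'
        result[cve_id] = [(p.text or '').strip() for p in entry.findall(path, NS)]
    return result


print(products_by_cve(SAMPLE))
```

Because `findall` returns every matching element, all four products are kept instead of only the last one.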

thank you,
-david


Original comment by firefigh...@gmail.com on 27 Apr 2011 at 6:57