nilp0inter / cpe

CPE: Common Platform Enumeration for Python
GNU Lesser General Public License v3.0

CPE 2.3 FS parsing error #23

Closed: damiengermonville closed this issue 9 years ago

damiengermonville commented 10 years ago

Hi,

I'm dealing with an error when parsing an FS taken from the official NVD CPE dictionary: http://static.nvd.nist.gov/feeds/xml/cpe/dictionary/official-cpe-dictionary_v2.3.xml

It seems like there is an issue with the regular expression(s) used there: https://github.com/nilp0inter/cpe/blob/develop/cpe/cpe2_3_fs.py#L85

The error occurs when parsing a formatted string (FS). If the FS contains an escaped colon ( \: ), the regex fails to capture each part correctly. It seems that the pattern [^:]+ stops at the escaped colon as if it were a separator.
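To illustrate the suspected cause, here is a minimal sketch comparing a plain [^:]+ pattern with an escape-aware alternative. Both pattern names are hypothetical and neither is copied from cpe2_3_fs.py; the second pattern is only one possible way to treat a backslash plus the following character as part of the component.

```python
import re

# Hypothetical patterns for illustration; not the regex used in cpe2_3_fs.py.
NAIVE_COMPONENT = re.compile(r'[^:]+')
# Escape-aware: a component is a run of either "backslash + any character"
# or any character that is neither a colon nor a backslash.
ESCAPE_AWARE_COMPONENT = re.compile(r'(?:\\.|[^:\\])+')

fs = r'cpe:2.3:a:canonical:update-manager:1\:0.134.11.1:*:*:*:*:*:*:*'

# The naive pattern cuts the version in two at the escaped colon.
naive_parts = NAIVE_COMPONENT.findall(fs)
# The escape-aware pattern keeps the escaped colon inside the version.
aware_parts = ESCAPE_AWARE_COMPONENT.findall(fs)
```

With the naive pattern the string yields one field too many, which would explain why the name is not recognised as a valid 2.3 FS.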

How to reproduce:

>>> from cpe import CPE
>>> c = CPE('cpe:2.3:a:canonical:update-manager:1\:0.134.11.1:*:*:*:*:*:*:*')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/cpe/cpe.py", line 311, in __new__
    raise NotImplementedError(errmsg)
NotImplementedError: Version of CPE not implemented
>>>

Parsing CPE 2.2 names works like a charm, and the colon is escaped as it should be:

>>> c = CPE('cpe:/a:canonical:update-manager:1%3a0.134.11.1')
>>> print c.as_fs()
cpe:2.3:a:canonical:update-manager:1\:0.134.11.1:*:*:*:*:*:*:*

However, when taking it back to a 2.3 name:

>>> CPE(c.as_fs())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/cpe/cpe.py", line 311, in __new__
    raise NotImplementedError(errmsg)
NotImplementedError: Version of CPE not implemented
>>> 

If you have any lead on how to fix this by just modifying the regex, that would be cool :). Otherwise, I guess it means partially refactoring the FS parser.

Thanks !

nilp0inter commented 10 years ago

Hi,

good catch ;)

I think that all the parsers should be rewritten from scratch, but I haven't had the time so far.

Can you please test the patch in branch 2_3_fs_parsing_error and give me some feedback?

Thank you for reporting this!

damiengermonville commented 10 years ago

Hi !

Thanks, the fix works just fine! I've been able to parse about 90k CPE names without any errors, so I assume that the issue is fixed.

I just have one last question about the getters (CPE.get_*(), e.g. get_version()). Shouldn't these functions remove the backslash when it precedes a colon before returning the parts?

In my example, get_version() returns "1\:0.134.11.1", but I'm not sure that's the expected behavior. It's no big deal anyway, as it can be processed later very easily :)

(Just giving python example for some clarity)

>>> import cpe
>>> c = cpe.CPE('cpe:2.3:a:canonical:update-manager:1\:0.134.11.1:*:*:*:*:*:*:*')
>>> print c.get_version()[0]
1\:0.134.11.1
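For reference, the "process it later" workaround could look roughly like the sketch below. The helper name strip_fs_escapes is hypothetical and not part of the cpe package; the list literal stands in for the value returned by get_version() in the session above, so the snippet runs without the library installed.

```python
import re

def strip_fs_escapes(values):
    """Return the values with each backslash escape reduced to the bare character."""
    # r'\\(.)' matches a backslash followed by any character and keeps only
    # that character, so an escaped backslash becomes a single backslash.
    return [re.sub(r'\\(.)', r'\1', v) for v in values]

# Stand-in for the list that c.get_version() returned in the session above.
versions = [r'1\:0.134.11.1']
cleaned = strip_fs_escapes(versions)
```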

Thanks again for that quick and functional fix!

nilp0inter commented 10 years ago

Uhhm, good point.

I think that the CPE object should remove the backslash at parsing time and internally store the actual component data rather than any representation details.
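A rough sketch of that idea, assuming a hypothetical Component holder (not a class from the cpe package): the backslashes are dropped once at construction, and escaping is reapplied only when the value is rendered back to an FS. The escaping here is deliberately simplified to backslashes and colons; the CPE 2.3 naming specification may require quoting more characters than that.

```python
import re

class Component(object):
    """Hypothetical holder: stores the real data, escapes only on output."""

    def __init__(self, fs_value):
        # Parse time: drop the backslash in front of every escaped character.
        self.value = re.sub(r'\\(.)', r'\1', fs_value)

    def as_fs(self):
        # Output time: re-escape (simplified to backslash and colon only).
        return re.sub(r'([\\:])', r'\\\1', self.value)

version = Component(r'1\:0.134.11.1')
stored = version.value      # unescaped data kept internally
rendered = version.as_fs()  # escaped form for the formatted string
```

With this split, a getter could return the stored value directly, while as_fs() would still round-trip to the original string.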

I'm going to make a ticket for this and add it to the TODO of the next milestone.

Thank you!