Convert manufacturer.txt to python dictionary #33

bucciarellialessandro commented 9 years ago

first idea of implementation

bucciarellialessandro commented 9 years ago

I created this PR in order to discuss together on the approach to use. This is not intended to be the final version. As it is, this feature gives the possibility to create a manufacturer.py file from manufacturer.txt. The .py file stores a dictionary containing the key/value pairs of MAC addresses prefix and related manufactuerers. Importing this file the time consuming activity of parsing the file looking for the manufacturer can be avoided.

nemesifier commented 9 years ago

Great!

I've taken a look at the python dictionary, there are a few improvements that can be done:

keys can be stripped of spaces at the beginning and end of the string
keys should be stripped of every character that is not a number, I explain later why
the order of the keys can be retained by adding sort_keys=True to json.dumps
it's probably better to call the file manufacturers.py (the s is missing)

As a further improvement, I was thinking that it would be very handy and it would save us a lot of time in the future if the txt file could be downloaded directly from the internet with an http request, so we can avoid storing it in the repository. Do you think you can do it?

Regarding how to call the code in the library, we could write a simple function in netengine.utils that takes a mac address (accepting both entire or partials) as input and returns the manufacturer name. The function should be able to handle different formats of mac addresses:

aa:00:bb:cc:dd:ee
AA:00:BB:CC:DD:EE
AA-BB-CC-DD-EE-FF
aabbccddeeff
even partial mac addresses (just the prefix): aa:bb:cc

If you agree with me we can do these things in 3 separate steps, I can write the final function.

bucciarellialessandro commented 9 years ago

I mostly agree with you on all points. You're right about the improvements on the json dictionary, as soon as possibile I will trim the white spaces and I will add the sorting on the dictionary keys. Where could be better to store the .txt file? On github itself and hardcode an URL in the code?

As regards to dictionary lookup you're right, I think it would be awesome if we store in the dict the keys as "AABBCCDDFF" and make some operation on the string to be compared. The partial MAC address comparison can be achieved by trimming all but the first three bytes.

So could be an idea of an operational workflow: 1) receive the MAC string to look for in the dict 2) check against the presence of ":" or "-" 3) trim all but the first three bytes (from MAC[0] to MAC[7]) which is exactly the prefix 4) replace ":" or "-" with "" 4a) perform, if necessary, a call to the str upper function on MAC 5) dictionary lookup 6) return the matching manufacturer

What do you think? :)

nemesifier commented 9 years ago

If the URL of the txt file is stable over time we can hardcode it in parser.py and let the script take care of downloading it and we can avoid storing it in the repository.

The workflow you described is correct, although you can simplify by doing just 4 operations:

strip anything that is not a number: http://stackoverflow.com/questions/1249388/how-do-we-remove-all-non-numeric-characters-from-a-string-in-python
take only the first 6 digits
convert to uppercase
return value from key

bucciarellialessandro commented 9 years ago

Why first point? Letters are valid char for a MAC address, aren't they? For example AA:AA:AA(:AA:AA:AA)

nemesifier commented 9 years ago

Yes sorry strip anything that is not an alphanumeric character [a-zA-Z0-9]

cl4u2 commented 9 years ago

MAC address is hexadecimal, so it would be better to strip anything that is not [a-fA-F0-9]

nemesifier commented 9 years ago

@cl4u2 :+1: for [a-fA-F0-9]

bucciarellialessandro commented 9 years ago

Hi guys, I don't agree in stripping all but [a-fA-F0-9]. I'm going to make the point clear. Image we receive a strange MAC address AA:AA:GG:GG:AA:AA, which is clearly wrong (it has G's). The strip on G's will produce AA:AA:AA:AA. After some manipulations on the MAC we will obtain AAAAAA, ready for the lookup into the manufacturer's dictionary. Sadly, this will produce a valid manufacturer because AAAAAA is in the dictionary, even though the initial mac address wasn't.

We could have introduced a bug. Do you all agree?

nemesifier commented 9 years ago

:+1:

openwisp / netengine

Convert manufacturer.txt to python dictionary #33 #35