hadiasghari / pyasn

Python IP address to Autonomous System Number lookup module. (Supports fast local lookups, and historical lookups using archived BGP dumps.)
Other
275 stars 71 forks source link

Question concerning License of PyASN's ancestors #25

Open dmth opened 8 years ago

dmth commented 8 years ago

Hi, I've got a Question concerning the License of PyASN's ancestors: Is there any code left from the mrtd-project which was licensed under 4-Clause BSD?

Because clause 3 would be interesting to manage:

    1. All advertising materials mentioning features or use of
  • this software must display the following acknowledgement: *
  • This product includes software developed by the University of
  • Michigan, Merit Network, Inc., and their contributors.

BR dmth

hadiasghari commented 8 years ago

Hi @dmth, a few updates on this:

I checked the specific file you mention (_radix.c), and talked with a FOSS licensing expert. The inclusion is fine in pyasn given its referenced in both source and documentation. But the 'advertising' clause is indeed restrictive and might cause issues for derivative projects. I further contacted the chain of authors of the code to see if they can release it with another license; it turns out that indeed it originates in MRT which I am next going to contact.

BR Hadi

dmth commented 8 years ago

Hadi, thank you very much for your efforts!

sebix commented 7 years ago

Is there any progress on this?

hadiasghari commented 7 years ago

Hi @dmth , @sebix . I've filed an open issue with the MRT project (see: https://github.com/deepfield/MRT/issues/1).

hadiasghari commented 7 years ago

Deepfield has not responded to the issue or email contact. So @dmth @sebix, shall we close this issue, try to find another way to contact them, or leave it be for now?

sebix commented 7 years ago

This is blocking any kind of bundling and/or shipping this software. If this is not solved, we are forced to switch the backend for certtools/intelmq, which I'd like to avoid if possible.

It therefore blocks #20

jkrauska commented 7 years ago

Without knowing anything about the code you're using -- are these viable alternatives without the same restrictions?

BSD https://github.com/freebsd/freebsd/blob/master/sys/net/radix.c

Apple Variant https://opensource.apple.com/source/xnu/xnu-1699.24.8/bsd/net/radix.c

I too would love to have a Debian package (#20).

sebix commented 7 years ago

First one should be no problem at all, then both the python and the c code have MIT.

Second uses [APSL-2.0]. This is incompatible with GPL. Tough it's problematic, linking it from code with other licences is allowed

jkrauska commented 7 years ago

Neglecting the licenses -- Does the MIT licensed code provide similar capabilities? I sadly admit that I only have a basic understanding of what a radix tree is -- and I haven't looked at how pyasn or MRTs use them at all.

sebix commented 7 years ago

I guess I have less knowledge on this topic than you have.

hadiasghari commented 7 years ago

The radix tree is a fundamental part of the module. It's what enables the longest prefix to be matched. Changing it is possible, but it would require much testing for both correctness and performance issues. So I guess we should try to reach-out to DeepField via other ways.

Btw, @sebix , the FOSS lawyer I talked with said that the advertising clause only holds if the particular functionality is your product's main functionality.

jkrauska commented 7 years ago

@sebix FWIW this python library https://github.com/YoshiyukiYamauchi/mrtparse can parse MRT datafiles (both gz and bz2) and DOES NOT use Merit's radix code. It's pure python (no c), so it's much slower than pyasn, but it includes a mrt2bgpdump.py script which could be trimmed down to be pretty much the same as the text file you get from the convert tool.

Update: I spent a little time hacking this, and it's DOG slow on the routeviews table. (so many peers) and can result in HUGE bgpdump outputs. (I killed it after >10 min and had a 1.2GB text file)

hadiasghari commented 7 years ago

@jkrauska just for clarification, pyasn uses the radix tree for its .lookup() function, not for the parsing .

hadiasghari commented 7 years ago

Hi all @sebix @jkrauska @dmth @anoroozian, an update and question/poll.

I've contacted the radix.c author directly (in addition to previously contacting deepfield), and received absolutely no response. I'm not sure how to interpret a non-response, other than that the license is what it is.

About the next step: On a scale of 1-5 how much are you in favor of switching to another of radix implementations, or have other suggestions? And could you contribute to the coding or testing?

Thanks!

jkrauska commented 7 years ago

+5 switch -- the original author is probably on a beach somewhere...

Best to find/use implementations without any licensing concerns.

There seems to be an unencumbered one in the linux kernel.

https://github.com/torvalds/linux/blob/master/lib/radix-tree.c

dmth commented 7 years ago

@hadiasghari thank you very much for keeping this issue alive and spending so much time for solving this issue.

Did anyone reach out to https://github.com/mjschultz/py-radix ? Because he must have the same problem like us. Maybe we can combine efforts here.

Right now I cannot tell if switching to another radix implementation is the way to go.

Is the use of a radix-tree a requirement? Do other data-structures serve our needs, which would allow a similar speed?

A first approach to measure speed might be:

import pyasn
from timeit import default_timer as timer

testdata = '/tmp/latest-bview.dat'

asndb = None
print("Creating Radix-Tree")
start = timer()
asndb = pyasn.pyasn(testdata)
end = timer()
time_spent = end - start
end = start = None
print("Time spend for creating Data-Structure: %s seconds"% time_spent)

print("Testing 100 IP Lookups")
start = timer()
for i in range(100):
    look = ('192.%i.1.1'% i)
    asndb.lookup(look)
end = timer()
time_spent = end - start
end = start = None
print("Time spend for 100 Lookups: %s seconds"% time_spent)

print("Testing 100 Prefix Lookups")
start = timer()
for i in range(100):
    asndb.get_as_prefixes(i)
end = timer()
time_spent = end - start
end = start = None
print("Time spend for 100 Prefix-lookups: %s seconds"% time_spent)

Results:

Creating Radix-Tree
Time spend for creating Data-Structure: 0.27312132100087183 seconds
Testing 100 IP Lookups
Time spend for 100 Lookups: 0.00023630400028196163 seconds
Testing 100 Prefix Lookups
Time spend for 100 Prefix-lookups: 1.406222734000039 seconds
dmth commented 7 years ago

@jkrauska

There seems to be an unencumbered one in the linux kernel. https://github.com/torvalds/linux/blob/master/lib/radix-tree.c

Thanks for finding this one, but I'm unsure if adding GPL-code does serve the needs of all users of PyASN.

dmth commented 7 years ago

Alternative pure python implementations, found by @bernhardreiter:

anoroozian commented 7 years ago

Ok, joining this thread a bit late but here is my 2 cents:

It seems to me that with respect to this question we are discussing two separate things:

  1. Issue #20 - (Debian packages)
  2. Potential licensing problem of including pyasn alongside other software.

The latest version of Debian already includes the problematic code and (going by the advice of the FOSS license expert that @hadiasghari talked to) the code it is not a main feature of Debian, therefore no problem on the advertising clause. Pyasn itself has no problem either and correctly includes the acknowledgements. Even if it were packaged along side Debian it would not be a main feature. So following the same logic, with respect to issue #20, I think that we should be able to move on as far as my understanding goes. Of course I'm in no way a licensing expert. Does anybody agree with this or am I completely missing something here?

With respect to the second issue, which I think is the subject of the Hadi's question/poll, I would vote +3 on changing the radix tree code, simply because it is such a core part of pyasn. Unless we have a reliable and efficient replacement that we can test I would be leaning slightly towards not changing the core part. At the moment, I won't be able to help out because of different priorities, but just in case we actually go in that direction I'm in favor of the "linux kernel" alternative pointed out

https://github.com/hadiasghari/pyasn/issues/25#issuecomment-278499941

ghost commented 7 years ago

The latest version of Debian already includes the problematic code

Do you refer to python-radix?

anoroozian commented 7 years ago

The latest version of Debian already includes the problematic code

Do you refer to python-radix?

@wagner-certat Exactly

bernhardreiter commented 7 years ago

Hi, as far as I understood @wagner-certat is mainly interested in using pyasn for intelmq (so am I) and his packaging efforts are related to it. As intelmq is under GNU AGPL v>=3 the "original 4-clause BSD license" is incompatible with it, see https://www.gnu.org/licenses/license-list.en.html#OriginalBSD.

As this "Original BSD" license is considered Free Software, it can be shipped by Debian, still no GNU GPL or AGPL software products can be derived from it.

As cidrtree and pytricia both promise performance improvements over python-radix when being used for network addresses, I think looking into it would be a well spend effort. I expect the interface to the lookup part of pyasn to be small, which should make this feasable.

hadiasghari commented 7 years ago

The radix tree is a fundamental part of the module: it's what enables the most-specific-prefix match to happen when doing a lookup.

Changing it is possible, but it would require much testing. I originally picked a C implementation to ensure fast db loads and memory efficiency. I will compare the pure Python alternatives.

A new question for pytricia: is its GPL license compatible with pyasn's MIT license? Or would we want two branches if we pick it?

dmth commented 7 years ago

A new question for pytricia: is its GPL license compatible with pyasn's MIT license?

@bernhardreiter already asked the author to consider the licensing: https://github.com/jsommers/pytricia/issues/10 let's see what happens.

ghost commented 7 years ago

On 02/16/2017 08:45 AM, Bernhard E. Reiter wrote:

as far as I understood @wagner-certat https://github.com/wagner-certat is mainly interested in using pyasn for intelmq (so am I) and his packaging efforts are related to it. As intelmq is under GNU AGPL v>=3 the "original 4-clause BSD license" is incompatible with it, see https://www.gnu.org/licenses/license-list.en.html#OriginalBSD.

But I create separate packages for intelmq and pyasn.

bernhardreiter commented 7 years ago

But I create separate packages for intelmq and pyasn.

You can do this and help pyasn alone. The licensing problem with using it in intelmq is not solved by separate packaging. The legal question is whether the work 'intelmq product' is derived from 'pyasn' which probably is yes, because you cannot leave out pyasn for good.

hadiasghari commented 7 years ago

I have good news and a suggestion. I tested pytricia and the performance is on par with pyasn's radix tree.

My suggestion is to:

Will this work for everyone? IMO it allows having multiple licenses, and avoid breaking code in case of bugs in alternative backends.

bernhardreiter commented 7 years ago

@hadiasghari that is really good news! Thanks for testing pytricia!

As far as I can say this will work for us (@dmth and IntelMQ).

We would be using only pytricia and my suggestion for Debian packaging would be to go in this direction as well. It is fine to keep the ability to use the old code with the less wanted 4-clause BSD license in, as long as it does not pose a maintenance burden. (This is also the main criteria for possibly removing it, even small options would usually have to tested and thus add to the complexity.)

ghost commented 7 years ago

@hadiasghari Great news. Thanks for all your work!

@bernhardreiter

You can do this and help pyasn alone. The licensing problem with using it in intelmq is not solved by separate packaging. The legal question is whether the work 'intelmq product' is derived from 'pyasn' which probably is yes, because you cannot leave out pyasn for good.

The functionality is also provided by other tools, so pyasn is not the only option.

hadiasghari commented 7 years ago

@bernhardreiter,

We would be using only pytricia and my suggestion for Debian packaging would be to go in this direction as well. It is fine to keep the ability to use the old code with the less wanted 4-clause BSD license in, as long as it does not pose a maintenance burden. (This is also the main criteria for possibly removing it, even small options would usually have to tested and thus add to the complexity.)

I do not want to change the default backend for everyone, because I don't want anything to break if pytricia changes. Also, the load-time is slower, and I noticed a possible memory leak. Regarding the maintenance burden with two-backends, I don't think it'll be a big issue, as the radix-code is stable and doesn't change. To enable multiple radix-backends, I'll create a common abstract class that wraps their functionality.

bernhardreiter commented 7 years ago

@hadiasghari

I do not want to change the default backend for everyone [..] I'll create a common abstract class that wraps their functionality.

works for me. :+1:

I noticed a possible memory leak

It would be cool if you could report this to https://github.com/jsommers/pytricia @jsommers is certainly interested in such a report.

I do not want to change the default backend for everyone

Personally I would change the default or even remove the code that is less-desirably licensed even if it is a tiny bit slower because I think this licensing trouble will cause problems with some people later where the efforts to resolve it will have multiplied. If you would want to test pytricia before removal I'd still change the default to it to get more exposure and deprecate the old-code to remove it with some following release. (This is just me, it is your call, I'd just wanted to give you my take on it. :) )

Thanks for pyasn as Free Software!

ghost commented 3 years ago

Shouldn't the BSD-4-Clause license of the radix code be reflected in the license specification of pyasn? (for simplicity and as the plan is to go with pytricia anyway, I would only adapt https://github.com/hadiasghari/pyasn/blob/bebed27d768c76c6b4bddaff368578dc67f0ec13/setup.py#L41 to MIT and BSD-4-Clause)