Open dmth opened 8 years ago
Hi @dmth, a few updates on this:
I checked the specific file you mention (_radix.c), and talked with a FOSS licensing expert. The inclusion is fine in pyasn given its referenced in both source and documentation. But the 'advertising' clause is indeed restrictive and might cause issues for derivative projects. I further contacted the chain of authors of the code to see if they can release it with another license; it turns out that indeed it originates in MRT which I am next going to contact.
BR Hadi
Hadi, thank you very much for your efforts!
Is there any progress on this?
Hi @dmth , @sebix . I've filed an open issue with the MRT project (see: https://github.com/deepfield/MRT/issues/1).
Deepfield has not responded to the issue or email contact. So @dmth @sebix, shall we close this issue, try to find another way to contact them, or leave it be for now?
This is blocking any kind of bundling and/or shipping this software. If this is not solved, we are forced to switch the backend for certtools/intelmq, which I'd like to avoid if possible.
It therefore blocks #20
Without knowing anything about the code you're using -- are these viable alternatives without the same restrictions?
BSD https://github.com/freebsd/freebsd/blob/master/sys/net/radix.c
Apple Variant https://opensource.apple.com/source/xnu/xnu-1699.24.8/bsd/net/radix.c
I too would love to have a Debian package (#20).
First one should be no problem at all, then both the python and the c code have MIT.
Second uses [APSL-2.0]. This is incompatible with GPL. Tough it's problematic, linking it from code with other licences is allowed
Neglecting the licenses -- Does the MIT licensed code provide similar capabilities? I sadly admit that I only have a basic understanding of what a radix tree is -- and I haven't looked at how pyasn or MRTs use them at all.
I guess I have less knowledge on this topic than you have.
The radix tree is a fundamental part of the module. It's what enables the longest prefix to be matched. Changing it is possible, but it would require much testing for both correctness and performance issues. So I guess we should try to reach-out to DeepField via other ways.
Btw, @sebix , the FOSS lawyer I talked with said that the advertising clause only holds if the particular functionality is your product's main functionality.
@sebix FWIW this python library https://github.com/YoshiyukiYamauchi/mrtparse can parse MRT datafiles (both gz and bz2) and DOES NOT use Merit's radix code. It's pure python (no c), so it's much slower than pyasn, but it includes a mrt2bgpdump.py script which could be trimmed down to be pretty much the same as the text file you get from the convert tool.
Update: I spent a little time hacking this, and it's DOG slow on the routeviews table. (so many peers) and can result in HUGE bgpdump outputs. (I killed it after >10 min and had a 1.2GB text file)
@jkrauska just for clarification, pyasn uses the radix tree for its .lookup() function, not for the parsing .
Hi all @sebix @jkrauska @dmth @anoroozian, an update and question/poll.
I've contacted the radix.c author directly (in addition to previously contacting deepfield), and received absolutely no response. I'm not sure how to interpret a non-response, other than that the license is what it is.
About the next step: On a scale of 1-5 how much are you in favor of switching to another of radix implementations, or have other suggestions? And could you contribute to the coding or testing?
Thanks!
+5 switch -- the original author is probably on a beach somewhere...
Best to find/use implementations without any licensing concerns.
There seems to be an unencumbered one in the linux kernel.
https://github.com/torvalds/linux/blob/master/lib/radix-tree.c
@hadiasghari thank you very much for keeping this issue alive and spending so much time for solving this issue.
Did anyone reach out to https://github.com/mjschultz/py-radix ? Because he must have the same problem like us. Maybe we can combine efforts here.
Right now I cannot tell if switching to another radix implementation is the way to go.
Is the use of a radix-tree a requirement? Do other data-structures serve our needs, which would allow a similar speed?
A first approach to measure speed might be:
import pyasn
from timeit import default_timer as timer
testdata = '/tmp/latest-bview.dat'
asndb = None
print("Creating Radix-Tree")
start = timer()
asndb = pyasn.pyasn(testdata)
end = timer()
time_spent = end - start
end = start = None
print("Time spend for creating Data-Structure: %s seconds"% time_spent)
print("Testing 100 IP Lookups")
start = timer()
for i in range(100):
look = ('192.%i.1.1'% i)
asndb.lookup(look)
end = timer()
time_spent = end - start
end = start = None
print("Time spend for 100 Lookups: %s seconds"% time_spent)
print("Testing 100 Prefix Lookups")
start = timer()
for i in range(100):
asndb.get_as_prefixes(i)
end = timer()
time_spent = end - start
end = start = None
print("Time spend for 100 Prefix-lookups: %s seconds"% time_spent)
Results:
Creating Radix-Tree
Time spend for creating Data-Structure: 0.27312132100087183 seconds
Testing 100 IP Lookups
Time spend for 100 Lookups: 0.00023630400028196163 seconds
Testing 100 Prefix Lookups
Time spend for 100 Prefix-lookups: 1.406222734000039 seconds
@jkrauska
There seems to be an unencumbered one in the linux kernel. https://github.com/torvalds/linux/blob/master/lib/radix-tree.c
Thanks for finding this one, but I'm unsure if adding GPL-code does serve the needs of all users of PyASN.
Alternative pure python implementations, found by @bernhardreiter:
Ok, joining this thread a bit late but here is my 2 cents:
It seems to me that with respect to this question we are discussing two separate things:
The latest version of Debian already includes the problematic code and (going by the advice of the FOSS license expert that @hadiasghari talked to) the code it is not a main feature of Debian, therefore no problem on the advertising clause. Pyasn itself has no problem either and correctly includes the acknowledgements. Even if it were packaged along side Debian it would not be a main feature. So following the same logic, with respect to issue #20, I think that we should be able to move on as far as my understanding goes. Of course I'm in no way a licensing expert. Does anybody agree with this or am I completely missing something here?
With respect to the second issue, which I think is the subject of the Hadi's question/poll, I would vote +3 on changing the radix tree code, simply because it is such a core part of pyasn. Unless we have a reliable and efficient replacement that we can test I would be leaning slightly towards not changing the core part. At the moment, I won't be able to help out because of different priorities, but just in case we actually go in that direction I'm in favor of the "linux kernel" alternative pointed out
https://github.com/hadiasghari/pyasn/issues/25#issuecomment-278499941
The latest version of Debian already includes the problematic code
Do you refer to python-radix?
The latest version of Debian already includes the problematic code
Do you refer to python-radix?
@wagner-certat Exactly
Hi, as far as I understood @wagner-certat is mainly interested in using pyasn for intelmq (so am I) and his packaging efforts are related to it. As intelmq is under GNU AGPL v>=3 the "original 4-clause BSD license" is incompatible with it, see https://www.gnu.org/licenses/license-list.en.html#OriginalBSD.
As this "Original BSD" license is considered Free Software, it can be shipped by Debian, still no GNU GPL or AGPL software products can be derived from it.
As cidrtree and pytricia both promise performance improvements over python-radix when being used for network addresses, I think looking into it would be a well spend effort. I expect the interface to the lookup part of pyasn to be small, which should make this feasable.
The radix tree is a fundamental part of the module: it's what enables the most-specific-prefix match to happen when doing a lookup.
Changing it is possible, but it would require much testing. I originally picked a C implementation to ensure fast db loads and memory efficiency. I will compare the pure Python alternatives.
A new question for pytricia: is its GPL license compatible with pyasn's MIT license? Or would we want two branches if we pick it?
A new question for pytricia: is its GPL license compatible with pyasn's MIT license?
@bernhardreiter already asked the author to consider the licensing: https://github.com/jsommers/pytricia/issues/10 let's see what happens.
On 02/16/2017 08:45 AM, Bernhard E. Reiter wrote:
as far as I understood @wagner-certat https://github.com/wagner-certat is mainly interested in using pyasn for intelmq (so am I) and his packaging efforts are related to it. As intelmq is under GNU AGPL v>=3 the "original 4-clause BSD license" is incompatible with it, see https://www.gnu.org/licenses/license-list.en.html#OriginalBSD.
But I create separate packages for intelmq and pyasn.
But I create separate packages for intelmq and pyasn.
You can do this and help pyasn alone. The licensing problem with using it in intelmq is not solved by separate packaging. The legal question is whether the work 'intelmq product' is derived from 'pyasn' which probably is yes, because you cannot leave out pyasn for good.
I have good news and a suggestion. I tested pytricia
and the performance is on par with pyasn
's radix tree.
My suggestion is to:
setup
to allow building without radix.c
backend
parameter to pyasn.__init__
that selects the desired radix backend; if None, it is set automaticallyWill this work for everyone? IMO it allows having multiple licenses, and avoid breaking code in case of bugs in alternative backends.
@hadiasghari that is really good news! Thanks for testing pytricia!
As far as I can say this will work for us (@dmth and IntelMQ).
We would be using only pytricia and my suggestion for Debian packaging would be to go in this direction as well. It is fine to keep the ability to use the old code with the less wanted 4-clause BSD license in, as long as it does not pose a maintenance burden. (This is also the main criteria for possibly removing it, even small options would usually have to tested and thus add to the complexity.)
@hadiasghari Great news. Thanks for all your work!
@bernhardreiter
You can do this and help pyasn alone. The licensing problem with using it in intelmq is not solved by separate packaging. The legal question is whether the work 'intelmq product' is derived from 'pyasn' which probably is yes, because you cannot leave out pyasn for good.
The functionality is also provided by other tools, so pyasn is not the only option.
@bernhardreiter,
We would be using only pytricia and my suggestion for Debian packaging would be to go in this direction as well. It is fine to keep the ability to use the old code with the less wanted 4-clause BSD license in, as long as it does not pose a maintenance burden. (This is also the main criteria for possibly removing it, even small options would usually have to tested and thus add to the complexity.)
I do not want to change the default backend for everyone, because I don't want anything to break if pytricia
changes. Also, the load-time is slower, and I noticed a possible memory leak. Regarding the maintenance burden with two-backends, I don't think it'll be a big issue, as the radix-code is stable and doesn't change. To enable multiple radix-backends, I'll create a common abstract class that wraps their functionality.
@hadiasghari
I do not want to change the default backend for everyone [..] I'll create a common abstract class that wraps their functionality.
works for me. :+1:
I noticed a possible memory leak
It would be cool if you could report this to https://github.com/jsommers/pytricia @jsommers is certainly interested in such a report.
I do not want to change the default backend for everyone
Personally I would change the default or even remove the code that is less-desirably licensed even if it is a tiny bit slower because I think this licensing trouble will cause problems with some people later where the efforts to resolve it will have multiplied. If you would want to test pytricia before removal I'd still change the default to it to get more exposure and deprecate the old-code to remove it with some following release. (This is just me, it is your call, I'd just wanted to give you my take on it. :) )
Thanks for pyasn as Free Software!
Shouldn't the BSD-4-Clause license of the radix code be reflected in the license specification of pyasn? (for simplicity and as the plan is to go with pytricia anyway, I would only adapt https://github.com/hadiasghari/pyasn/blob/bebed27d768c76c6b4bddaff368578dc67f0ec13/setup.py#L41 to MIT and BSD-4-Clause
)
Hi, I've got a Question concerning the License of PyASN's ancestors: Is there any code left from the mrtd-project which was licensed under 4-Clause BSD?
Because clause 3 would be interesting to manage:
BR dmth