hyperspy / rosettasciio

Python library for reading and writing scientific data formats
https://hyperspy.org/rosettasciio
GNU General Public License v3.0

Permissive license or LGPL? #51

Open ericpre opened 2 years ago

ericpre commented 2 years ago

Now that the IO code of hyperspy has been split, it may be worth considering re-licensing the code with a more permissive license in RosettaSciIO. I don't have a strong view on this topic, but I am starting this discussion because it comes up once in a while, and it seems that there is appetite for a more permissive license, as keeping the GPL license could prevent adoption by some libraries/software.

vasole commented 2 years ago

My preferred license for collaborative work is LGPL, particularly in the case of a library: You use it, but if you modify/improve it, you are forced to distribute the modifications.

However, that license was not appropriate for collaboration with some US scientists. Basically their caveats were that if they are paid by the US administration, the US administration must be able to use their code and that could be problematic (DoE, defense,...) even for the LGPL case.

Therefore we have adopted MIT in most projects (hdf5plugin, PyMca, silx, fabIO ...)

sem-geologist commented 2 years ago

I think LGPL is also a very appropriate license.

sem-geologist commented 2 years ago

There are other nice features of LGPL. Some of this work (like my reverse engineering of formats) is meant to give freedom and to save users or comrades-in-occupation (SEM and EPMA operators) from wasting time on the thousands of mouse clicks normally required to achieve even basic functions... The usability for other interested parties (OEMs, other software developers) is of secondary importance, at least for me. Closing such a library (or parts of it) is like undoing the effort of reverse engineering. So LGPL forces such parts to stay open, and still gives the user the freedom to replace the library with a newer version, as LGPL allows only dynamic linking. So the user can use some proprietary software together with an open source library, with all the benefits that come with it. We should remember that LGPL / GPL exists to give the user freedom, not to empower some third-party OEM. And LGPL, I think, is this kind of compromise: it can be used with proprietary software and still maintains a huge degree of freedom for the user.

BTW, I can't understand what the problem is with DoE, defence... Somehow Linux being GPL (mostly RedHat) and running on their war machines going to Iraq was not a problem at all (at least I saw some articles here and there claiming that). What had changed in USA administration/military to get against GPL?

vasole commented 2 years ago

Please, do not misunderstand me. If I had a vote, it would be for LGPL.

| What had changed in USA administration/military to get against GPL?

No idea. Maybe they do not want to get sued for not distributing their code?

I work for a European research organization and our general software policy is to give access to our developments to the whole community and not only to the countries belonging to the organization.

francisco-dlp commented 2 years ago

I agree that LGPL is a more appropriate license for RosettaSciIO (and possibly for HyperSpy too). While it doesn't maximise the potential users of the library (for that we'll have to think about BSD or similar), it substantially increases its domain of application without compromising on the essential: any modifications to the library must be published under the same terms.

sem-geologist commented 2 years ago

One thing that many kinds of end users (including the mentioned administrations) do not understand is that no one needs to distribute changes or switch their code base to GPL when using GPL parts - unless they are distributing it to third parties (selling, or giving away attached as a single product). So if DoD (or any other agency) needs its own modifications - and for sure they are not going to use those outside from institutions - who ever is going to find out and sue them? The Law which can't be anyhow enforced is a dead law. In my view, technically there is very little difference between commercial products requiring third-party GPL libraries, LGPL libraries or commercial libraries. What is the difference between an application on MS Windows that requires the user to get and install some .NET runtime and an app on Linux that needs PyQt5 or PySide6, gnuplot, libpng and so on to function? No difference. The difference arises when distributing fully functioning products with all necessary libraries included - and this is where these licenses are relevant. There are some purists who want GPL everything and try to bend reality to make us believe that we can't mix different licenses (well, developers can't, but that does not apply to end users!).

sem-geologist commented 2 years ago

An example: the pyqtgraph library is licensed under BSD. Wait a moment, but PyQt5 requires either GPL or a commercial license! Yeah, but pyqtgraph does not modify PyQt5 and does not distribute PyQt5 - installing it is left to end users or developers.

vasole commented 2 years ago

| - and for sure they are not going to use those outside from institutions -

They are selling/distributing things to other countries too.

| The Law which can't be anyhow enforced is a dead law.

That's true.

However, I think we are digressing from the discussion (probably my fault)

I'm a developer, not a user :-) and I am already giving away my source code. However, my code is also used as a library by other developers and I do not want to impose a license on them. I look forward to seeing the SFS code released under LGPL (or something similar). If it is the whole RosettaSciIO, it is even better.

My own code is licensed under MIT. When I ship a frozen binary, the license of the frozen binary is GPL if I use PyQt5 and it is MIT if I use PySide2/PySide6. For the time being, the simplest I can do is to add your code to the frozen binaries (adding the whole HyperSpy would introduce too many additional dependencies).

francisco-dlp commented 2 years ago

It is not that simple: @sem-geologist's code has been edited by a number of other developers. To distribute the code under a different license that is compatible with your needs, we need to ask all the contributors to agree to change the license of the files.

I suggest that, if we are going to go through such a process, we do it first for the whole repository. If issues arise, then we can consider re-licensing just parts of the code. Does that sound good?

vasole commented 2 years ago

| It is not that simple: @sem-geologist's code has been edited by a number of other developers.

But not his very first version:

https://github.com/hyperspy/hyperspy/commit/c198833d7b999a48835508b4c278762afc52cf42

| To distribute the code under a different license that is compatible with your needs, we need to ask all the contributors to agree to change the license of the files.

For the current status of the code, yes.

sem-geologist commented 2 years ago

@vasole tell me why You are so interested in SFS? SFS is only the first layer (think of it as a zip file). SFS allows extracting the internal files - it is indeed useful on its own for developing anything else that handles Esprit data, e.g. .pan files (where every tile has its own xml file) and EBSD bcf files are packed into such SFS containers too. Initially the SFS code was in a separate file, but later I merged it into the same file so as not to make a mess of the hyperspy source file hierarchy. Now that this library is taking shape, I think it would be a good idea (independent of whether the license changes or not) to split that code into separate files, as was initially intended, since that will improve maintainability.

vasole commented 2 years ago

Because I already have tools to parse Bruker XRF files (for instance rtx files). What I need is access to the internal files of the SFS container to handle the bcf case.

sem-geologist commented 2 years ago

rtx is a different beast! It is an (XML-based) project file. You will still need a binary parser; SFS alone is not enough. Actually, XRF-EDS bcf and SEM-EDS bcf have the same number of internal files inside the SFS container, with the same file names. The only difference is the Header file (inside the SFS container), which for XRF would contain additional nodes describing the X-ray properties, whereas the sample and detector description would be identical to SEM-EDS. The most important part, the binary Cube (which needs lots of decoding, and there is this fancy cython code included to parse it really fast), has a structure that is absolutely agnostic to whether the hyperspectra are electron-excited EDS or X-ray-excited EDS. You are not the first to ask for XRF handling (see #19), and after solving #37 maybe it can finally progress further (albeit I have no µ-XRF hardware, so it is hard for me to implement it). But I think You could be useful here, as it looks like You are already familiar with what needs to be read from the XML. The Header (XML) file in the SFS container will contain XML nodes very similar to those You already know from rtx.

vasole commented 2 years ago

By inspecting the file, I thought it was just XML embedded in an SFS container and that I therefore already had support for the content, as I wrote in #50.

AlexHenderson commented 2 years ago

Personally, I think LGPL is the most restrictive licence that should be applied. My MATLAB toolbox (I know, I know. That was then, but this is now) was licenced GPL, and now I can't change it without throwing out lots of GPL licenced packages I've incorporated. Indeed, I was looking for a Python parser for Renishaw Raman files and came across one licenced GPL. When I asked if they would consider switching to an MIT licence, it turned out they'd modelled their code on my MATLAB version which was GPL. D'oh! We both decided to scale back to MIT for that code, which makes my Python toolkit more attractive.

In the FAIR data community they say "as open as possible, but as restricted as necessary". What would we say are necessary restrictions? Acknowledgement of original authorship? Requirement that distributed, modified versions are released? Any others? Then determine a licence based on that minimum set.

For the record, I'm now using MIT.

vasole commented 2 years ago

| But I think You could be useful here, as it looks like You are already familiar with what needs to be read from the XML. The Header (XML) file in the SFS container will contain XML nodes very similar to those You already know from rtx.

I do not have the equipment. I just provide the analysis software :-)

To be able to load the data, one effectively needs the patch suggested in https://github.com/hyperspy/hyperspy/issues/2898 concerning self.hv

However, that is not enough with huge maps (the only one I have is 1273 x 1578 x 4096). If I do not down-sample the data, the data type is wrongly interpreted as uint8 (instead of uint16).

Has anybody encountered that problem already?
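
To put the size of that map in numbers, here is a rough sketch in plain Python. It is simple arithmetic only, not RosettaSciIO code, and the variable names are made up for illustration:

```python
# Shape of the map mentioned above: 1273 x 1578 pixels, 4096 channels.
shape = (1273, 1578, 4096)

# Total number of elements in a dense array of that shape.
n_elements = 1
for n in shape:
    n_elements *= n

# Bytes per element for the two dtypes in question.
itemsize = {"uint8": 1, "uint16": 2}

for dtype, size in itemsize.items():
    gib = n_elements * size / 2**30
    print(f"{dtype}: {gib:.2f} GiB for a dense array")
```

Besides memory, the dtype also limits the values: a uint8 element can only hold counts up to 255, a uint16 element up to 65535.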

vasole commented 2 years ago

@sem-geologist https://github.com/hyperspy/hyperspy/pull/3048 submitted.

jlaehne commented 1 year ago

This discussion has somewhat stalled (and partly deviated). Still, I think it would be good to go about the re-licensing now that we are close to an initial release - even if it might only be implemented in a follow-up release. In principle, we had several proposals for different alternative licenses:

So, as a step forward, we should

  1. Decide on the licence we aim for (in the next weeks)
  2. Try to get agreement from all Contributors (there are 38 in total, do we need agreement even from people who have only corrected typos? Any idea how to best get their agreement?)
  3. Publish individual readers under separate licenses if we do not get the agreement of everyone
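
One possible way to start on step 2 is to build the contact list from the git history. The following is only a sketch: it assumes a local clone and parses the output of `git shortlog -sne`; the function names are hypothetical, and commit counts say nothing about whether a contribution is legally significant.

```python
import re
import subprocess

# One line of `git shortlog -sne` looks like: "   120<TAB>Name <email>"
LINE = re.compile(r"^\s*(\d+)\t(.*) <([^<>]*)>$")


def parse_shortlog(text):
    """Turn `git shortlog -sne` output into (commits, name, email) tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            rows.append((int(match.group(1)), match.group(2), match.group(3)))
    return rows


def list_contributors(repo="."):
    """Run git in a local clone (hypothetical helper, needs git installed)."""
    result = subprocess.run(
        ["git", "shortlog", "-sne", "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_shortlog(result.stdout)


# Made-up sample output, so the parser can be tried without a repository.
sample = (
    "   120\tAlice Example <alice@example.org>\n"
    "     1\tBob Example <bob@example.org>\n"
)
rows = parse_shortlog(sample)
for commits, name, email in rows:
    print(f"{commits:5d}  {name}  {email}")
```

The same person may appear under several names or addresses, so the list would still need manual merging before anyone is contacted.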

CSSFrancis commented 1 month ago

Okay maybe we try to get the ball rolling on this again. Just starting with the top 10 or so contributors.

@ericpre @francisco-dlp @jlaehne @sem-geologist @pietsjoh @vidartf @nem1234 @din14970 @ssomnath are you in support of changing to an MIT license?

CSSFrancis commented 1 month ago

If we don't get support from some contributors maybe we just think about removing certain parts of the code base or rewriting them?

din14970 commented 1 month ago

For personal projects that I consider worthless I use MIT. In this case, I am personally not in favor of the MIT license. As I see it, MIT essentially means the code can be used by anyone for any purpose. From personal experience, for-profit companies abuse this all the time to take free labor from open source, profit from it in a closed source product, and give nothing back. Even worse, they sometimes start making demands from maintainers. That doesn't sit well with me; I prefer if use of open source (strongly) incentivizes creation of and contribution to open source. If this perspective is short sighted, I'm open to someone changing my mind.

jat255 commented 1 month ago

In my personal capacity, I'm generally in favor of more open licensing (MIT, BSD, Apache, etc.), perhaps with a clause requiring explicit acknowledgement/citation, since this is very much an academic/community project. I think most of the permissive licenses require that already, actually...

In my work capacity, any contributions are effectively public domain (AFAIU), and thus can be used as such (not a lawyer, obviously).


RE: @din14970's comments:

I agree with your sentiment in general, as there have been many examples of this sort of behavior happening. In contrast, I look to other parts of the Scientific Python ecosystem and notice that most of our "peer" libraries use permissive licensing:

ericpre commented 1 month ago

I agree that it is time to change to a permissive license (MIT or BSD) because there is a need for rosettasciio code to be released under a permissive license (for example https://github.com/hyperspy/rosettasciio/issues/51#issuecomment-1273511577 and https://github.com/hyperspy/rosettasciio/issues/314 / https://github.com/LiberTEM/LiberTEM/issues/1649).

Overall, I would expect that it would benefit the community if the IO code could be used more easily by third parties, for example:

CSSFrancis commented 1 month ago

| For personal projects that I consider worthless I use MIT. In this case, I am personally not in favor of the MIT license. As I see it, MIT essentially means the code can be used by anyone for any purpose. From personal experience, for-profit companies abuse this all the time to take free labor from open source, profit from it in a closed source product, and give nothing back. Even worse, they sometimes start making demands from maintainers. That doesn't sit well with me; I prefer if use of open source (strongly) incentivizes creation of and contribution to open source. If this perspective is short sighted, I'm open to someone changing my mind.

@din14970 I think that's an appropriate consideration in most cases. Here, I feel like if a company wants to use this code and make interoperability better, then by all means do it. This is kind of one of those situations where I don't really know if there is money to be made (??although maybe??). Mostly this is just based on LiberTEM relicensing to MIT, and it would be a real shame to not be compatible with them.

As for a more permissive license for HyperSpy and pyxem, I think that would require significantly more discussion.

The point about not giving back... I don't know... The (maybe too cynical) part of me wants to say that it's not like for-profit companies are contributing extensively at the moment. It's not like the GPL license is really helping... At the very least this might allow me to contribute slightly more as part of my day job, and maybe make a couple of others more motivated to contribute as well. (And we kind of need more contributors...)

din14970 commented 1 month ago

| In contrast, I look to other parts of the Scientific Python ecosystem and notice that most of our "peer" libraries use permissive licensing

AFAIK (but please correct me if I am wrong on this) this is because many of these projects are funded directly or indirectly (through foundations like NumFOCUS) with corporate money. Corporations have an interest in making OSS licenses permissive.

| https://github.com/hyperspy/rosettasciio/issues/314 / https://github.com/LiberTEM/LiberTEM/issues/1649

Examples like this are precisely why I would not choose a permissive license. The company will take the code, add a GUI wrapper around it, then sell it back to the same research groups at a profit, taking work they didn't have to pay anything for and returning nothing to the original projects. Let's not forget: with GPL they can still use LiberTEM as a dependency, it just means they also have to license their software under GPL, which they don't want to do.

| it would avoid duplication of IO code (which will happen if we keep the GPL license)

Honestly, I don't really see a problem with that. If you want to sell a closed source project, build your own IO and don't piggyback off of free work from others.

| other software/libraries depending on rosettasciio are likely to contribute once in a while, to fix or improve what they need

Having worked in corporate, I can tell you this is very much wishful thinking.

To be clear, I will not battle you on the license and I will not veto the decision that the majority makes. If the majority of people here decide to go with MIT or BSD, I will make peace with that. But since you asked my opinion, I am giving it, and it is largely based on the profiteering I have seen at large corporations.

nem1234 commented 1 month ago

Both licenses have different benefits and disadvantages. I will follow the general trend.

ericpre commented 1 month ago

| Examples like this are precisely why I would not choose a permissive license. The company will take the code, add a GUI wrapper around it, then sell it back to the same research groups at a profit, taking work they didn't have to pay anything for and returning nothing to the original projects. Let's not forget: with GPL they can still use LiberTEM as a dependency, it just means they also have to license their software under GPL, which they don't want to do.

A different perspective on that is to think about benefit to the users instead of the project itself: I think that this would be a significant benefit to users if some companies could provide useful user interface of open source tools that integrate well with the instrument itself. This would make it more accessible and facilitate screening of the data during the measurement, for example by using the same (advanced) tools for online and offline analysis easily, etc.

CSSFrancis commented 1 month ago

At the end of the day, my personal opinion is that supporting something like rosettasciio, hyperspy, pyxem, LiberTEM is quite difficult. With the exception of LiberTEM, there isn't really sustained monetary support for any of these projects. We are kind of left with a couple of different options.

  1. Continue running things as is (and hope @ericpre doesn't get bored)
  2. Try to monetize HyperSpy in some way to have more sustained development
  3. Integrate with corporations and hope for support from corporate partners
  4. Apply for continued funding either through a funding organization or society

I've kind of struggled with this a little bit as I've started to work for one of those corporations. Part of my work is supporting open source software with part of my salary being basically donated to help out wherever I would like (although I might spend more time than I'm supposed to). There is also a bit of an open question of "Does this actually gain us customers?", I would like to think so but I'm not entirely sure... I guess we will see.

I guess I will also say it's not like the GPL license is doing us any favors currently. People still use/make money off of HyperSpy/pyxem etc., they just don't do it directly. I'm sure some corporation uses it for doing data analysis and it's not like they are paying a royalty.

jlaehne commented 1 month ago

| A different perspective on that is to think about benefit to the users instead of the project itself: I think that this would be a significant benefit to users if some companies could provide useful user interface of open source tools that integrate well with the instrument itself. This would make it more accessible and facilitate screening of the data during the measurement, for example by using the same (advanced) tools for online and offline analysis easily, etc.

Indeed, intercompatibility between different formats provided by RosettaSciIO does in particular empower users, because they can choose the tool they prefer for their data analysis. Here, we should aim for the broadest possible applicability, which would mean a permissive license. How permissive we want to be on the end of HyperSpy and the other libraries is another story.

din14970 commented 1 month ago

Reading and agreeing with the other arguments, I would be fully on board with MIT/BSD if it is paired with some aggressive community outreach & partnership building attempts. Opening up and hoping for the best seems like a squandered opportunity. You risk becoming a low level library that everyone relies on and yet everyone takes for granted and makes demands on; think of OpenBLAS, a fundamental numerical library that pretty much has 1 person working on it.

Instead, if one actively reaches out and connects with companies that could benefit from the project, one is more proactive in extracting funding/sponsorship and/or developer contribution. This could be done "from the inside", like @CSSFrancis lobbying their employer, or through networks and conferences. Travis Oliphant, the creator of NumPy, has built some interesting models around sustainable open source projects; perhaps we could reach out to him for advice.

| There is also a bit of an open question of "Does this actually gain us customers?", I would like to think so but I'm not entirely sure... I guess we will see.

Yes, for the people holding the purse strings, there must always be a good business case. One must show at least one of two things: "it increases revenue" or "it reduces costs". Costs are usually easier to calculate than projected revenue increases, and most companies prefer to outsource development for everything that is not their core product, since development and maintenance of software costs a lot and is risky. I think with RosettaSciIO you can sell it more from the cost perspective. Building and maintaining integrations with formats from other vendors is valuable for customers but also expensive if every vendor has to do it on their own. There exist entire businesses on selling "integrations", e.g. Fivetran. Sharing this burden in an open source project is just good business sense.

jlaehne commented 1 month ago

| Instead, if one actively reaches out and connects with companies that could benefit from the project, one is more proactive in extracting funding/sponsorship and/or developer contribution.

One lever is also to request compatibility with HyperSpy in tenders, like I have recently done. On the side of CL vendors, Attolight has developed and is supporting their I/O plugin (as customers have requested it, but also as it is a selling point being compatible with HyperSpy!) and we have gotten Delmic to do the same now.

uellue commented 1 month ago

Just to add the LiberTEM perspective to this, since it was brought up: We started LiberTEM to motivate vendors to open interfaces to their products and help introduce some level of standardization across the ecosystem. For that it is good if it becomes very easy to implement support for a particular format or standard, through both good documentation and code that is ready to re-use. Furthermore, we should bring something to the table when talking with vendors, to give them a real business case for cooperating: helping make their product better for their customers. That was pretty successful: When we started, vendors would refuse to share file format specifications, and now we are actually cooperating on developing formats and interfaces, like with Gatan's Python scripting and support for the ASI CheeTah T3 (Timepix-based 4D STEM detector).

Since a majority of the code that generates data in our community is proprietary, we made interface code permissive from the beginning. However, it turns out that the best chance for re-use and standardization is not only the I/O code, but actually our whole engine. So that's why we are re-licensing to MIT. I believe we got a relatively stable funding stream because we can demonstrate that our project actually works towards the goals of our funders and brings tangible benefits to our community, us as an institute, and last but not least equipment vendors. A clean win-win-win, which I like very much. :-) At that point an unlimited amount of thanks is due to @rafaldb for the continued support of our work!

sem-geologist commented 1 month ago

As I argued previously, ideally I would like to see LGPL. If I need to choose between MIT and BSD, I absolutely prefer BSD. Albeit I won't veto MIT.