sidneycadot / oeis

Code to download, process, and analyse the Online Encyclopedia of Integer Sequences
46 stars, 34 forks

Web scraping of the OEIS is prohibited by its terms of use #7

Closed: gwhitney closed this issue 2 years ago

gwhitney commented 2 years ago

See item 10 in http://oeis.org/OEISTermsOfUse.pdf, which requires prior permission to scrape the contents of the OEIS. If as it seems this is what a portion of this code is intended to do, it should be accompanied by a prominent notice that permission must be obtained from the OEIS prior to running that portion of the code.

sidneycadot commented 2 years ago

I don't think it is necessary to point that out in this software. I think it is the personal responsibility of users of this software.

Also, I suspect that clause is in direct contradiction with the Creative Commons license under which the OEIS is made available, as per https://oeis.org/wiki/The_OEIS_End-User_License_Agreement. I once raised this issue on the OEIS mailing list but got no satisfactory answer.

If a representative of the OEIS Foundation contacts me in that capacity to discuss this issue in a bit more detail, I'd be happy to discuss and find a solution. Perhaps you are; if so, please let me know.

Note that I am not particularly protective of this repository. If a formal representative of the OEIS Foundation indicates they find it objectionable, I will certainly consider adding messages or removing the repo altogether. I would like to understand why though. Is it a matter of technical infrastructure (are many people using the script and it's hammering the server?) or are they against full and open distribution of the data?

As a matter of principle, my opinion is that the data inside the OEIS should be public and the OEIS Foundation should provide a supported means of mirroring the data from the outside. My script is one way to achieve that, but technically it is a pretty bad way to do so. If a better way were found (e.g., the OEIS Foundation providing a way to make proper mirrors using FTP or rsync), this entire issue would go away.

Then again, I recognise that my opinion isn't very relevant in terms of legality.

gwhitney commented 2 years ago

No, I have no official connection with the OEIS. I am just a member of a research group working on a data visualization project for which it would be convenient to duplicate the full OEIS database, but we felt we had to change direction because of that clause in their license. So I was searching for other ways to obtain the full dataset and came upon your repository, but then found that it takes roughly the approach we had considered and abandoned. I thought I'd post the issue on the off chance that you weren't aware of the restriction, and so that others visiting could more easily see that this doesn't offer a way around it. That's all.

gwhitney commented 2 years ago

P.S. I don't think the CC license in any way obligates the OEIS to let you use their servers in ways they don't condone.

sidneycadot commented 2 years ago

Ok.

In the top-level message you state: "If as it seems this is what a portion of this code is intended to do, it should be accompanied by a prominent notice that permission must be obtained from the OEIS prior to running that portion of the code."

What makes you say it should be? Do you feel that's legally required, or, in your assessment, a good idea? I don't understand the "should".

gwhitney commented 2 years ago

Well, I think the usual presumption people make on finding open-source code is that they have the right to download and run the code as it stands. In this case, even if they are acting consistently with whatever license you are releasing this code under, I think it would be courteous to make potential users aware that if they run certain portions of this code, they might be violating the terms of use of the service that code is designed to access. (At first glance, I don't actually see a license file indicating the terms under which the code in this repository is being shared. Adding such a file, with a note that use of certain functions in the code might or might not be consistent with the terms of use of the OEIS, perhaps calling attention to fetch_oeis_database.py, might be one very reasonable way to alert potential users to these issues. I am reminded of MAME software, which points out that users may only properly use it with ROM images they have legally obtained and have the rights to use.)

So, I am not aware of any legal requirement for you to add such a notice to your software, but I do think that ethically and as a matter of courtesy to potential users of your code, you "should" add such a notice. This is just my opinion; I am definitely no legal scholar on intellectual property rights. I opened this issue as I said primarily to make sure the creator(s) of the code were aware of the potential issue, which now definitely seems to be the case, and because it might go some portion of the way toward possible users of the code becoming aware of the potential issue.

I very much hope I haven't given any offense.

sidneycadot commented 2 years ago

Thanks for the explanation.

You haven't given offense, and your stance on this matter is reasonable. It does raise my hackles a bit when people say I "should" do something with merely a reference to legal text (the meaning and validity of which are almost invariably arguable). Much better to just have a proper discussion.

My thoughts on this matter are complicated and border on the philosophical (morality vs. legality). In essence, I think that, morally, the importance of the OEIS being publicly available outweighs the right of the OEIS Foundation to restrict access to it based on a technicality ("it is sitting on our server and we can restrict access to that"). Perhaps that restriction is legal (which I think is a lot less important than moral), but to me it is in direct contradiction with at least the spirit of the CC license.

Anyway, I haven't touched this repo for a few years and I haven't received complaints from the OEIS Foundation. Originally this work was only intended to scratch a personal itch, and back then I did indeed obtain permission to request data from the OEIS server as the program does, via some emails exchanged on the Foundation's mailing list.

I am a bit surprised that this repo has been cloned and starred as much as it has. It seems indicative of a need.

Personally, as said before, I think the OEIS Foundation should provide a proper technical means of mirroring their data. I am still unsure why they don't; the answers I got years ago on the mailing list were too vague for my taste. They were worried about the load on the server (fine, but easily fixable by a proper technical solution) and about unauthorized versions popping up (which is just something you have to accept as a possibility if you care about open data). To me, it felt more like they were sitting on their treasure chest a bit too much, to your detriment and mine.

sidneycadot commented 2 years ago

Issue closed as far as I am concerned.