numberscope / backscope

Numberscope's back end: responsible for getting sequences and other data from the On-Line Encyclopedia of Integer Sequences, pre-processing it (factoring, etc.), and storing it.
MIT License

Ask OEIS for permission to scrape/download their data #19

Closed samzhang111 closed 2 years ago

samzhang111 commented 3 years ago

I'd like to store a copy of the OEIS on the server's database.

Why: If we want to make a cross-ref graph, we would need a bit of preprocessing to see beyond the immediate neighbors of the target node. Also this would reduce our external dependencies so the app is robust if OEIS becomes slow.
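
To make the cross-ref point concrete, here is a minimal sketch of looking beyond immediate neighbors once the cross-references are stored locally. It assumes the cross-refs have already been loaded into a plain dict; the function name `neighbors_within` and the sequence IDs are made up for illustration and are not part of backscope or the OEIS.

```python
from collections import deque


def neighbors_within(xrefs, start, max_depth):
    """Breadth-first search over a local cross-reference map.

    xrefs maps a sequence ID to the IDs it cross-references (hypothetical layout).
    Returns {sequence_id: hop_distance} for everything within max_depth hops.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand past the requested depth
        for nxt in xrefs.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Toy data: placeholder IDs, not real OEIS cross-references.
toy_xrefs = {
    "seq_a": ["seq_b", "seq_c"],
    "seq_b": ["seq_d"],
    "seq_d": ["seq_e"],
}
two_hops = neighbors_within(toy_xrefs, "seq_a", 2)
```

Without a local copy, every hop in this walk would need its own request to the OEIS, which is the external dependency the issue is trying to avoid.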

However, the OEIS specifically forbids web scraping without permission, so I think we should get their permission first.

gwhitney commented 3 years ago

Aha. I think then we might want to consider the following strategy: get Numberscope to a state in which it does something sufficiently cool that we can "show it off" to the powers that be at the OEIS. Then do that, and say, "by the way, we hope this gets to the point at which we might be a drag on your servers, so can we keep an entire copy of your database?" Thoughts on that approach?

gwhitney commented 3 years ago

Also wanted to record here the suggestion that (when/if we have permission) we download the whole OEIS, and then whenever somebody asks specifically about a sequence (gets its values, or crossrefs), that becomes a "sequence of interest" that we asynchronously check the official OEIS for updates to. That way, the portions of the OEIS that are of interest will remain up to date (and clearly we don't really care about the rest ;-). This also seems one reasonable solution to #13.
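
A rough sketch of the "sequence of interest" idea, under stated assumptions: the class `SequenceMirror`, its methods, and the `fetch_upstream` callable are all hypothetical names, the local copy is modeled as an in-memory dict rather than a real database, and the refresh runs synchronously here although in practice it would presumably be a background job.

```python
class SequenceMirror:
    """Local snapshot plus a set of sequences that someone has asked about."""

    def __init__(self, snapshot):
        self.local = dict(snapshot)   # seq_id -> stored data (hypothetical shape)
        self.of_interest = set()      # IDs that have been requested at least once

    def get(self, seq_id):
        # Serving a request marks the sequence as "of interest".
        self.of_interest.add(seq_id)
        return self.local.get(seq_id)

    def refresh(self, fetch_upstream):
        """Re-check only the sequences of interest against the upstream source.

        fetch_upstream is an assumed callable: seq_id -> fresh data, or None
        if nothing could be retrieved. Returns the IDs that were updated.
        """
        updated = []
        for seq_id in sorted(self.of_interest):
            fresh = fetch_upstream(seq_id)
            if fresh is not None and fresh != self.local.get(seq_id):
                self.local[seq_id] = fresh
                updated.append(seq_id)
        return updated


# Toy usage with placeholder IDs and a fake upstream (no network access).
mirror = SequenceMirror({"seq_a": [1, 2, 3], "seq_b": [2, 4, 6]})
mirror.get("seq_a")
fake_upstream = {"seq_a": [1, 2, 3, 4], "seq_b": [2, 4, 6, 8]}
changed = mirror.refresh(fake_upstream.get)
```

Only `seq_a` is refreshed in this toy run, because `seq_b` was never requested; that keeps traffic to the upstream source proportional to what users actually look at.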

gwhitney commented 2 years ago

Another point is that if we get to the point that we really need a mirror of the OEIS, then likely they will have a preferred and more efficient manner for us to obtain that data than issuing 350,000 HTTP GET requests.

samzhang111 commented 2 years ago

Yes, this sounds good to me too. I'll close this issue for now, and we can revisit in the future when the project is more mature.