meeb / whoisit

A Python library for querying RDAP WHOIS-like services for internet resources such as ASNs, IPs, CIDRs and domains
BSD 3-Clause "New" or "Revised" License

Caching responses #39

Closed mzpqnxow closed 2 months ago

mzpqnxow commented 2 months ago

Hello, thanks for your work on this project

I have a workflow where I need to perform lookups on a large batch (thousands) of domains on a regular basis - roughly every 24 hours

The domain list changes a small bit from day to day, but the vast majority are the same - there might be 2-3% that are "new" on a given day

For that task, I have a separate project that uses the moribund WHOIS protocol. I ended up hacking in a caching decorator for the requests (specifically the function doing the TCP send/receive) using (domain, whois_server) as the cache key

The response is cached to disk between invocations. It helped to both speed up the job as well as deal with rate limiting
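
For illustration, a minimal sketch of that kind of decorator, not the actual code from the WHOIS project. It assumes the wrapped function takes `(domain, whois_server)` as positional arguments and returns something picklable; the name `disk_cache`, the SQLite schema and the 24 hour default lifetime are all hypothetical choices.

```python
import functools
import pickle
import sqlite3
import time


def disk_cache(path, ttl_seconds=24 * 60 * 60):
    """Cache results in a SQLite file, keyed on the positional arguments.

    Hypothetical helper: `path` is the cache file and `ttl_seconds` is how
    long an entry stays valid. Only unpickle cache files you wrote yourself.
    """
    def decorator(func):
        conn = sqlite3.connect(path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(key TEXT PRIMARY KEY, stored_at REAL, value BLOB)"
        )
        conn.commit()

        @functools.wraps(func)
        def wrapper(*args):
            # For a lookup function the key is repr((domain, whois_server)).
            key = repr(args)
            row = conn.execute(
                "SELECT stored_at, value FROM cache WHERE key = ?", (key,)
            ).fetchone()
            if row is not None and time.time() - row[0] < ttl_seconds:
                return pickle.loads(row[1])  # fresh entry: skip the network
            result = func(*args)
            conn.execute(
                "INSERT OR REPLACE INTO cache (key, stored_at, value) "
                "VALUES (?, ?, ?)",
                (key, time.time(), pickle.dumps(result)),
            )
            conn.commit()
            return result

        return wrapper
    return decorator
```

Decorating the function that performs the send/receive with `@disk_cache("whois_cache.sqlite")` would then reuse yesterday's answers for the unchanged domains and only hit the network for new or expired entries.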

What are your thoughts on having optional caching of RDAP responses in a similar fashion to what I described? Would it be unnecessary for most use cases, or add complexity without much value?

Caching is of course a much more common and well-supported feature for HTTP libraries, so it shouldn't require much new code

The technical part of the implementation should be straightforward, but I understand it would require some thought, to answer questions like:

If you don't think this is interesting now or in the near future, please feel free to close the issue

Thanks

meeb commented 2 months ago

Hi, thanks for the issue. I can see why you might want this but typically this sort of caching would be handled within your application directly (stuffing responses from whoisit into redis with a 24 hour key expiry or similar). Is there an implementation reason that this would make more sense to handle in the query library itself?

As far as I'm aware RDAP servers do not have cache control headers at all, there may be some that do implement them but I've never encountered any. I'm not sure they would make much sense for a query API to implement so I would suspect no endpoints have them. The RDAP RFC doesn't discuss caching other than cache busting to work around poorly implemented MITM proxies.

In-memory caching of Python data structures, if you're making tens of thousands of queries, is going to get pretty massive without offloading it to disk or memcached / redis etc. as well.
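
As a rough sketch of the application-side approach described above, the wrapper below caches lookup results in an external store with a 24 hour expiry. `CachedLookup` and `DictStore` are hypothetical names, not part of whoisit. The store only needs `get(key)` and `setex(key, ttl, value)`, which are the method names redis-py's `Redis` client uses, so a real Redis connection could be passed in place of the in-memory stand-in. Responses are pickled on the assumption that they may contain values JSON cannot represent.

```python
import pickle
import time


class CachedLookup:
    """Wrap a lookup callable and keep its results in an external store."""

    def __init__(self, lookup, store, ttl_seconds=24 * 60 * 60, prefix="rdap:"):
        self.lookup = lookup            # e.g. whoisit.domain (assumed usage)
        self.store = store              # anything with get() and setex()
        self.ttl_seconds = ttl_seconds
        self.prefix = prefix

    def __call__(self, name):
        key = self.prefix + name.lower()
        cached = self.store.get(key)
        if cached is not None:
            return pickle.loads(cached)     # cache hit: no query is sent
        result = self.lookup(name)
        self.store.setex(key, self.ttl_seconds, pickle.dumps(result))
        return result


class DictStore:
    """In-memory stand-in exposing the two store methods used above."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]             # expired: behave like a miss
            return None
        return value

    def setex(self, key, ttl, value):
        self._data[key] = (time.monotonic() + ttl, value)


# Assumed usage, with whoisit and redis-py installed:
#   whoisit.bootstrap()
#   cached_domain = CachedLookup(whoisit.domain, redis.Redis())
#   cached_domain("example.com")
```

Keeping the store outside the process also addresses the memory concern: the Python side holds one response at a time, and expiry is handled by the store rather than by the application.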

mzpqnxow commented 2 months ago

> Hi, thanks for the issue. I can see why you might want this but typically this sort of caching would be handled within your application directly (stuffing responses from whoisit into redis with a 24 hour key expiry or similar). Is there an implementation reason that this would make more sense to handle in the query library itself?
>
> As far as I'm aware RDAP servers do not have cache control headers at all, there may be some that do implement them but I've never encountered any. I'm not sure they would make much sense for a query API to implement so I would suspect no endpoints have them. The RDAP RFC doesn't discuss caching other than cache busting to work around poorly implemented MITM proxies.
>
> In-memory caching of Python data structures, if you're making tens of thousands of queries, is going to get pretty massive without offloading it to disk or memcached / redis etc. as well.

Fair enough! Can't say I disagree much with your points, appreciate the detailed explanation

I'll close this

Thanks again

meeb commented 2 months ago

No problem, feel free to open issues if you want to discuss features in the future.

mzpqnxow commented 2 months ago

> Hi, thanks for the issue. I can see why you might want this but typically this sort of caching would be handled within your application directly (stuffing responses from whoisit into redis with a 24 hour key expiry or similar). Is there an implementation reason that this would make more sense to handle in the query library itself?

Yes, the reason is deeply technical, you see: I would prefer you maintain it rather than me 😊

^-- joking, sort of.. but when you ask directly like that and I have to think about it, I see that may be the honest answer...

meeb commented 2 months ago

Heh, I'm very open to adding useful features, less so to bloat 😀

Most of project maintenance after the core is built is offering support and saying no to stuff.