serpapi / public-roadmap

Public Roadmap for SerpApi, LLC (https://serpapi.com)

[Search Archive API] Return all Stored `search_id`s #685

[Open] schaferyan opened this issue 1 year ago

schaferyan commented 1 year ago

A customer requested a way to get their last 1000 search_ids with an API call. Customers have asked for similar things in the past, and our usual suggestion is to extract and store the search_id on their end after each request. However, when customers ask for this, they are usually hoping to retroactively access past search results, so this suggestion doesn't solve their immediate problem. Since it seems to be requested relatively often, I'm wondering whether providing this feature is feasible and justifiable on our end.
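
The usual suggestion amounts to persisting the ID from every response as it arrives. Here is a minimal sketch, assuming the ID is read from the `search_metadata.id` field of the JSON response and that a local SQLite file is an acceptable store. The table layout and the names `open_store`, `record_search` and `last_search_ids` are hypothetical.

```python
import sqlite3
from typing import Any


def open_store(path: str = "search_ids.db") -> sqlite3.Connection:
    """Open (or create) a local table that records every search_id we receive."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS searches ("
        " search_id TEXT PRIMARY KEY,"
        " engine TEXT,"
        " created_at TEXT)"
    )
    return conn


def record_search(conn: sqlite3.Connection, response: dict[str, Any]) -> str:
    """Store the search_id from one JSON response and return it.

    Assumption: the ID lives at response["search_metadata"]["id"].
    """
    meta = response["search_metadata"]
    search_id = meta["id"]
    engine = response.get("search_parameters", {}).get("engine")
    # INSERT OR IGNORE keeps the first record if the same ID is seen twice.
    conn.execute(
        "INSERT OR IGNORE INTO searches (search_id, engine, created_at) VALUES (?, ?, ?)",
        (search_id, engine, meta.get("created_at")),
    )
    conn.commit()
    return search_id


def last_search_ids(conn: sqlite3.Connection, limit: int = 1000) -> list[str]:
    """Return up to `limit` most recently stored IDs, newest first."""
    rows = conn.execute(
        "SELECT search_id FROM searches ORDER BY rowid DESC LIMIT ?", (limit,)
    ).fetchall()
    return [row[0] for row in rows]
```

Calling `record_search` right after each request means the IDs are on hand later. It cannot recover searches made before the store existed, which is exactly the gap customers run into.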

I imagine it could look like an optional parameter for the Search Archive API that, when set to true, would remove the need to pass a search_id and would return just an array of all of the user's search_ids.
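
The proposed behaviour can be sketched as a small request handler. This is illustrative only: `list_search_ids` is a hypothetical parameter name, and a plain dict keyed by search_id stands in for the user's stored searches.

```python
from typing import Any


class MissingParameter(ValueError):
    """Raised when a request omits a required parameter."""


def handle_archive_request(params: dict[str, str], archive: dict[str, dict[str, Any]]) -> Any:
    """Sketch of the proposed Search Archive API behaviour (hypothetical flag name)."""
    if params.get("list_search_ids", "false").lower() == "true":
        # Flag set: no search_id is needed; return only the user's IDs.
        return list(archive.keys())
    search_id = params.get("search_id")
    if not search_id:
        raise MissingParameter("search_id is required unless list_search_ids=true")
    # Flag not set: behave like a normal archive lookup.
    return archive[search_id]
```

Returning every ID in one response, as this sketch does, is unbounded; a production version would need some form of pagination.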

aliayar commented 1 year ago

We had such a feature before, but because it was misused by our users and affected our servers, we decided to take it down until we have a better implementation.

hartator commented 1 year ago

Yes, it was mostly a performance issue with pagination on our side.
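
The thread does not say how the old pagination worked. One common way to keep deep pages cheap is keyset (cursor) pagination: each page resumes after the last key seen instead of using OFFSET, which makes the database walk past every skipped row. Below is a minimal sketch of that swapped-in technique on an in-memory SQLite table. The schema, `make_db` and `list_page` are hypothetical.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    """Create an illustrative searches table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE searches (id INTEGER PRIMARY KEY, user_id INTEGER, search_id TEXT)"
    )
    # An index on (user_id, id) lets each page be a short range scan.
    conn.execute("CREATE INDEX idx_user_id ON searches (user_id, id)")
    return conn


def list_page(
    conn: sqlite3.Connection,
    user_id: int,
    after_id: Optional[int] = None,
    page_size: int = 100,
) -> tuple[list[str], Optional[int]]:
    """Return (search_ids, next_cursor) for one page, newest first.

    `after_id` is the cursor from the previous page (None for the first page).
    """
    if after_id is None:
        rows = conn.execute(
            "SELECT id, search_id FROM searches WHERE user_id = ? "
            "ORDER BY id DESC LIMIT ?",
            (user_id, page_size),
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, search_id FROM searches WHERE user_id = ? AND id < ? "
            "ORDER BY id DESC LIMIT ?",
            (user_id, after_id, page_size),
        ).fetchall()
    ids = [search_id for _, search_id in rows]
    # A full page means there may be more; the last row's key becomes the cursor.
    next_cursor = rows[-1][0] if len(rows) == page_size else None
    return ids, next_cursor
```

Each call touches at most `page_size` index entries, no matter how deep into the history the client has paged.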

schaferyan commented 1 year ago

From #2434 in our private repo:

In addition to the Searches List API (#1925), allow bulk export of searches to CSV or JSON.

TODO

  • [ ] Searches can be filtered by date range and search engine.
  • [ ] Results will be processed in the background and uploaded as an archive to S3 with TTL < 30 days.
  • [ ] Display a link to an archive in the Dashboard.
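
The filtering and serialization steps of this TODO list could look roughly like the sketch below. It assumes each stored search is a dict with hypothetical `search_id`, `engine` and `created_at` fields, with `created_at` starting with a `YYYY-MM-DD` date. The background job, the S3 upload with a TTL and the Dashboard link are left out.

```python
import csv
import io
import json
from datetime import date
from typing import Any, Iterable, Optional


def filter_searches(
    searches: Iterable[dict[str, Any]],
    start: date,
    end: date,
    engine: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Keep searches created within [start, end] (inclusive), optionally for one engine."""
    selected = []
    for s in searches:
        created = date.fromisoformat(s["created_at"][:10])  # leading YYYY-MM-DD
        if start <= created <= end and (engine is None or s["engine"] == engine):
            selected.append(s)
    return selected


def export(searches: list[dict[str, Any]], fmt: str = "json") -> str:
    """Serialize the selected searches as a JSON array or as CSV text."""
    if fmt == "json":
        return json.dumps(searches)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=["search_id", "engine", "created_at"], extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(searches)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")
```

In the planned design, the string returned by `export` would be written to an archive object in the background rather than returned inline.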

The download part of the upcoming Bulk API (#976) can reuse the Bulk Export.

aliayar commented 7 months ago

While I was going through a security questionnaire today, I noticed a question:

In case of an API KEY leak, is it possible to view all searches only using API KEY?

At this point, not returning all stored search ids via API requests might be a security feature.

#851 might be a better choice than implementing this one.

richardm commented 6 months ago

This would be a very helpful improvement. I want to be able to fire off async search requests from various jobs without having to wait for a response or store the search_id. I want a separate Lambda that retrieves all of the recent results, either one at a time or as a batch, so I can process them live and/or store them in S3 for offline processing.

Having to store the search_ids in order to retrieve the results couples two parts of my system that should be independent, especially since I don't want the retrieval Lambda to have access to my database and I'd prefer not to generate a bunch of temporary S3 files.

In an ideal world, I'd like an API that provides a list of all of my search results, perhaps with a way to filter for async results or results I have not yet retrieved. There could also be an API where I pass back a list of search_ids to mark them as retrieved, which could alleviate the security concern mentioned above.
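
On the consumer side, the list-then-acknowledge flow described above might look like the sketch below. Neither endpoint exists today: `ArchiveClient`, `list_unretrieved` and `mark_retrieved` are hypothetical names, and `FakeArchive` is an in-memory stand-in used only for local testing.

```python
from typing import Any, Callable, Protocol


class ArchiveClient(Protocol):
    """Hypothetical API surface proposed in this comment (does not exist today)."""

    def list_unretrieved(self, limit: int) -> list[dict[str, Any]]: ...
    def mark_retrieved(self, search_ids: list[str]) -> None: ...


def drain(client: ArchiveClient, handle: Callable[[dict[str, Any]], None], batch: int = 50) -> int:
    """Process every unretrieved result in batches; return how many were handled."""
    total = 0
    while True:
        results = client.list_unretrieved(limit=batch)
        if not results:
            return total
        for result in results:
            handle(result)  # e.g. process live or write to S3
        # Acknowledge only after processing, so a crash leaves results to retry.
        client.mark_retrieved([r["search_metadata"]["id"] for r in results])
        total += len(results)


class FakeArchive:
    """In-memory stand-in for the hypothetical API."""

    def __init__(self, results: list[dict[str, Any]]) -> None:
        self.pending = {r["search_metadata"]["id"]: r for r in results}

    def list_unretrieved(self, limit: int) -> list[dict[str, Any]]:
        return list(self.pending.values())[:limit]

    def mark_retrieved(self, search_ids: list[str]) -> None:
        for sid in search_ids:
            self.pending.pop(sid, None)
```

Acknowledging after processing gives at-least-once delivery. If the Lambda fails mid-batch, the same results come back on the next run, so `handle` should tolerate duplicates.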

Thanks for the consideration.

P.S. It's almost 2024. People should know by now how to secure their API keys. Unless people are using your API in a way that violates your TOS, exposing search history is pretty low on the severity list of dangers from mishandling API keys. Besides, you provide the ability to regenerate API keys (which I certainly hope invalidates the old ones).

alexbarron commented 3 weeks ago

Customer requested this.
