Closed. jasiek-net closed this issue 6 months ago.
Looks like the previous collection might have been deleted manually outside of the scraper...
To fix this, delete the alias and the collection created by the scraper via the API, and then re-run the scraper.
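For anyone wanting to script that cleanup, here is a minimal sketch using only the Python standard library against the Typesense API. The host, API key, and alias name are placeholders; substitute your own values.

```python
import json
import urllib.request

def typesense_request(method, path,
                      host="http://localhost:8108", api_key="xyz"):
    """Send a raw Typesense API call. host and api_key are placeholders."""
    req = urllib.request.Request(
        f"{host}{path}",
        method=method,
        headers={"X-TYPESENSE-API-KEY": api_key},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())

def cleanup(alias_name):
    """Delete the alias and the scraper-created collection it points to."""
    # 1. Look up which collection the alias currently points to.
    alias = typesense_request("GET", f"/aliases/{alias_name}")
    # 2. Delete the alias, then the underlying collection.
    typesense_request("DELETE", f"/aliases/{alias_name}")
    typesense_request("DELETE", f"/collections/{alias['collection_name']}")
```

Deleting the alias first avoids leaving an alias pointing at a collection that no longer exists, which is the broken state the scraper trips over.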
@jasonbosco I experience this issue too.
I am using Typesense Cloud, and the scraping process only completes successfully after I delete all collections and aliases.
Every subsequent scrape attempt then fails (delete collection not found).
@nascode The next time this happens, could you look at the Alias section in your Typesense Cloud dashboard and let me know if the collection that the alias is pointing to exists?
@jasonbosco
Before running the scraper, here is my list of collections and aliases.
Then I ran the scraper and got this error:
typesense.exceptions.ObjectNotFound: [Errno 404] No collection with name `service-bridge-index_1689120603` found.
Could you show me similar screenshots of the alias screen and the collection selector after you run the scraper?
Also could you double-check that there is only one instance of the scraper running?
We had a similar issue with Typesense DocSearch scraper 0.8.0 and our locally hosted Typesense server (0.25.1). Here's a log example matching the one reported by @jasiek-net.
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/seleuser/src/index.py", line 138, in <module>
run_config(environ['CONFIG'])
File "/home/seleuser/src/index.py", line 126, in run_config
typesense_helper.commit_tmp_collection()
File "/home/seleuser/src/typesense_helper.py", line 105, in commit_tmp_collection
self.typesense_client.collections[old_collection_name].delete()
File "/home/seleuser/.local/share/virtualenvs/seleuser-AdYDHarm/lib/python3.10/site-packages/typesense/collection.py", line 22, in delete
return self.api_call.delete(self._endpoint_path())
File "/home/seleuser/.local/share/virtualenvs/seleuser-AdYDHarm/lib/python3.10/site-packages/typesense/api_call.py", line 158, in delete
return self.make_request(requests.delete, endpoint, True,
File "/home/seleuser/.local/share/virtualenvs/seleuser-AdYDHarm/lib/python3.10/site-packages/typesense/api_call.py", line 115, in make_request
raise ApiCall.get_exception(r.status_code)(r.status_code, error_message)
typesense.exceptions.ObjectNotFound: [Errno 404] No collection with name `main_1706279577` found.
The interesting part happened just before the above, when the scraper asked Typesense to delete the previous collection (in this case, `main_1706279577`) for the current alias (`main`):
INFO:scrapy.core.engine:Spider closed (finished)
DEBUG:typesense.api_call:Making get /aliases/main
DEBUG:typesense.api_call:Try 1 to node 192.168.100.115:8108 -- healthy? True
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.100.115:8108
DEBUG:urllib3.connectionpool:http://192.168.100.115:8108/ "GET /aliases/main HTTP/1.1" 200 None
DEBUG:typesense.api_call:192.168.100.115:8108 is healthy. Status code: 200
DEBUG:typesense.api_call:Making put /aliases/main
DEBUG:typesense.api_call:Try 1 to node 192.168.100.115:8108 -- healthy? True
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.100.115:8108
DEBUG:urllib3.connectionpool:http://192.168.100.115:8108/ "PUT /aliases/main HTTP/1.1" 200 None
DEBUG:typesense.api_call:192.168.100.115:8108 is healthy. Status code: 200
DEBUG:typesense.api_call:Making delete /collections/main_1706279577
DEBUG:typesense.api_call:Try 1 to node 192.168.100.115:8108 -- healthy? True
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.100.115:8108
DEBUG:typesense.api_call:Request to 192.168.100.115:8108 failed because of HTTPConnectionPool(host='192.168.100.115', port=8108): Read timed out. (read timeout=3.0)
DEBUG:typesense.api_call:Sleeping for 1.0 and retrying...
DEBUG:typesense.api_call:No healthy nodes were found. Returning the next node.
DEBUG:typesense.api_call:Try 2 to node 192.168.100.115:8108 -- healthy? False
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 192.168.100.115:8108
DEBUG:urllib3.connectionpool:http://192.168.100.115:8108/ "DELETE /collections/main_1706279577 HTTP/1.1" 404 None
DEBUG:typesense.api_call:192.168.100.115:8108 is healthy. Status code: 404
Traceback (most recent call last):
...
typesense.exceptions.ObjectNotFound: [Errno 404] No collection with name `main_1706279577` found.
The previous collection (`main_1706279577`) was actually deleted successfully on try 1 (we verified this in Typesense), but the scraper timed out before the delete request completed. When the scraper then retried the delete, the collection was already gone, so try 2 terminated with the `typesense.exceptions.ObjectNotFound` error.
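The failure mode in those logs can be reproduced in miniature. This is a hedged sketch (not the scraper's actual code): try 1 deletes the collection server-side but times out client-side, so try 2 finds it already gone and gets a 404. Treating a 404 on DELETE as success would make the retry idempotent.

```python
class ObjectNotFound(Exception):
    """Stand-in for typesense.exceptions.ObjectNotFound."""

def delete_collection(send_delete, name):
    """Return True once the collection is gone, whether this attempt
    deleted it or an earlier timed-out attempt already did."""
    try:
        send_delete(name)
    except ObjectNotFound:
        pass  # already deleted by a previous (timed-out) attempt
    return True

# Simulated server: the first DELETE is applied, but the response times out.
collections = {"main_1706279577"}

def flaky_delete(name):
    if name in collections:
        collections.discard(name)               # server applies the delete...
        raise TimeoutError("read timeout=3.0")  # ...but the client gives up
    raise ObjectNotFound(f"No collection with name `{name}` found.")
```

Calling `delete_collection(flaky_delete, "main_1706279577")` raises `TimeoutError` on the first attempt (the delete has already happened server-side), and the retry returns `True` instead of surfacing the 404.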
We have just switched to scraper version 0.9.1, which includes the timeout increase, and everything has been working fine so far.
PS. The scraper only started timing out last week (after our Typesense server had been running for about two months). We have a relatively small database with a few dozen smallish collections, and our server resources are more than sufficient for the workload, so we have no idea why Typesense suddenly started taking so long to delete a single collection.
@timolagus Thank you for those details. I increased the timeout for all write operations in v0.9.1 of the scraper to solve a different use-case, but that should definitely help with deletes timing out as well.
We've also made some improvements to collection delete performance in v0.25.2 of Typesense Server, which should also help prevent this issue.
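If you hit this with your own tooling on an older scraper, you can raise the client-side timeout yourself. Below is a sketch of a configuration dict as accepted by the typesense-python client; the host and API key are placeholders.

```python
# Configuration dict for typesense.Client(...); host and api_key
# are placeholders -- substitute your own values.
client_config = {
    "nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}],
    "api_key": "xyz",
    # The logs above show "read timeout=3.0"; raising this gives slow
    # collection deletes time to complete before the client retries.
    "connection_timeout_seconds": 30,
}
```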
Description
When I try to re-scrape the documentation, I get an error (404) at the end.
Steps to reproduce
Scrape the Docusaurus documentation once, then re-run the scraper.
Expected Behavior
The existing docs are replaced with the new ones.
Actual Behavior
An error occurs when deleting the collection.
Metadata
Typesense Version: 0.24
OS: macOS (MacBook)