Open simmonn opened 3 months ago
Could you make sure the HTML selectors exist on that page?
Also, could you make sure that the base URL of those links is specified in the `start_urls` section?
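One rough way to run the `start_urls` check is to test each link against the configured base URLs. The sketch below is only an illustration: the URLs are made up, `covered_by_start_urls` is a hypothetical helper, and it assumes `start_urls` holds plain strings and that a simple same-host, path-prefix comparison is close enough. The scraper's own matching logic may differ.

```python
from urllib.parse import urlsplit


def covered_by_start_urls(link, start_urls):
    """Return True if `link` has the same scheme and host as a start URL
    and its path begins with that start URL's path."""
    target = urlsplit(link)
    for start in start_urls:
        base = urlsplit(start)
        same_origin = (target.scheme, target.netloc) == (base.scheme, base.netloc)
        if same_origin and target.path.startswith(base.path):
            return True
    return False


# Hypothetical values, for illustration only.
start_urls = ["https://docs.example.com/guide/"]
links = [
    "https://docs.example.com/guide/install",
    "https://docs.example.com/blog/news",
    "https://other.example.com/guide/install",
]

# Links that no start URL covers under this simplified rule.
uncovered = [link for link in links if not covered_by_start_urls(link, start_urls)]
print(uncovered)
```

Any link that ends up in `uncovered` is worth checking against the config first.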
Yes, I had configured it. The selectors match when I test them as XPath expressions in the Chrome console. I also tried using BeautifulSoup to compress the HTML source code, and that solves the problem, but I'm not sure what the root cause is.
Here is the code:
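Independently of that snippet, one rough way to test whether a selector matches the raw HTML source, rather than the DOM that Chrome renders, is to count matching elements in the source text. The sketch below is illustrative only: it uses the standard library's `html.parser` instead of BeautifulSoup, checks just a single CSS class, and the sample markup and the names `ClassCounter` and `count_class` are hypothetical.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags that carry a given CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # `attrs` is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self.count += 1


def count_class(html, wanted_class):
    """Return how many elements in `html` have `wanted_class`."""
    parser = ClassCounter(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup standing in for a page's raw source.
sample_html = '<div class="content main"><h1 class="title">Hi</h1><p>text</p></div>'
print(count_class(sample_html, "title"))
print(count_class(sample_html, "missing"))
```

If the count is zero for the raw source of a failing page but the selector matches in the browser, the page content may be generated or altered after load, which would be one possible lead rather than a confirmed cause.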
Description
Hi, I encountered a problem. After running the scraper, I found that the content of some links is not crawled. The logs show 0 records for those links. I have tried many approaches, but the content still is not crawled.
Here is a snapshot of the logs: ![image](https://github.com/typesense/typesense-docsearch-scraper/assets/37128453/54f44e18-7766-494e-99da-affd933bc602)
Steps to reproduce
Here is part of my config:
Expected Behavior
I expect the content of all the links in the configuration to be crawled into Typesense.
Actual Behavior
The content of those links cannot be searched.
Metadata
Typesense Version: possibly 0.24; I don't know how to check the version
OS: x86_64 GNU/Linux
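On checking the version: as far as I know, the Typesense server reports it through its `/debug` endpoint, authenticated with the `X-TYPESENSE-API-KEY` header. A minimal sketch, assuming that endpoint is available on the server in question; the base URL and API key are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_debug_request(base_url, api_key):
    """Build a GET request for the server's /debug endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/debug",
        headers={"X-TYPESENSE-API-KEY": api_key},
    )


def parse_version(body):
    """Extract the "version" field from a /debug JSON response body."""
    return json.loads(body).get("version")


def fetch_version(base_url, api_key):
    """Query the server and return its reported version string."""
    request = build_debug_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_version(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder host and key; replace with the real server details.
    print(fetch_version("http://localhost:8108", "YOUR_API_KEY"))
```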