RedWilly closed this issue 1 year ago.
@RedWilly I'm making some updates, but from what I found, this code was copied from another repo. Basically, what he did was write a more complete README and change the project's name.
I would really prefer to continue the improvements on the original author's code. Original repo: https://github.com/phucvo0709/Clone-Google-Search-Engine
@pedrolaxe
I initially developed Doogle by following Reece Kenney's course on building a Google search engine clone. I suspect the Clone-Google-Search-Engine repo you linked was also built from his course, since I can see object-oriented PHP and PDO usage similar to Doogle's. No repos were copied.
@RedWilly
I never thought about using sitemap.xml to crawl a website.
Doogle crawls pages and inserts database entries for links using the insertLink($url, $title, $description, $keywords) function and for images using the insertImage($url, $src, $alt, $title) function.
As of v1.1.2-beta, all crawling functionality is contained within crawl.php and classes/DomDocumentParser.php.
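For readers new to the codebase, the two insert functions boil down to one parameterized INSERT per page and one per image. The sketch below is written in Python with sqlite3 rather than Doogle's PHP/PDO, and the table and column names, the snake_case function names, and the skip-duplicates behavior are all assumptions for illustration, not Doogle's actual schema or code.

```python
import sqlite3

# Hypothetical schema: Doogle creates its own tables in PHP, and their
# names and columns may differ from these.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sites (
    id INTEGER PRIMARY KEY,
    url TEXT UNIQUE,
    title TEXT,
    description TEXT,
    keywords TEXT
);
CREATE TABLE IF NOT EXISTS images (
    id INTEGER PRIMARY KEY,
    site_url TEXT,
    image_url TEXT UNIQUE,
    alt TEXT,
    title TEXT
);
"""


def insert_link(conn, url, title, description, keywords):
    """Hypothetical analogue of insertLink($url, $title, $description, $keywords)."""
    # Placeholders keep page text out of the SQL string, like a PDO prepared statement.
    cur = conn.execute(
        "INSERT OR IGNORE INTO sites (url, title, description, keywords) VALUES (?, ?, ?, ?)",
        (url, title, description, keywords),
    )
    return cur.rowcount == 1  # False when the URL is already stored


def insert_image(conn, url, src, alt, title):
    """Hypothetical analogue of insertImage($url, $src, $alt, $title)."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO images (site_url, image_url, alt, title) VALUES (?, ?, ?, ?)",
        (url, src, alt, title),
    )
    return cur.rowcount == 1  # False when the image URL is already stored


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(insert_link(conn, "https://example.com/", "Example", "An example page", "example"))  # True
    print(insert_link(conn, "https://example.com/", "Example", "An example page", "example"))  # False
    print(insert_image(conn, "https://example.com/", "https://example.com/logo.png", "Logo", "Logo"))  # True
```

The UNIQUE constraints plus INSERT OR IGNORE are one way to avoid storing the same page twice when several crawled pages link to it; a separate "does this URL exist" query before inserting would work as well.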
Regarding the link you provided, I am not able to replicate your issue (see image below). I am running Doogle v1.1.2-beta and PHP 8.1.
Hey, I wanted to ask you some questions.
My question is: I want to use your search engine to index a website through its sitemap.xml (that is, index and crawl the whole content of the site). That way it would be easier to point the engine at the pages it needs to search, and much easier to find the content you are looking for.
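A sitemap-driven crawl would start by turning sitemap.xml into a list of page URLs. The sketch below, in Python rather than Doogle's PHP, assumes the standard sitemaps.org format, where a urlset lists pages and a sitemapindex lists further sitemaps; the function name urls_from_sitemap is hypothetical, and fetching the file over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SM}


def urls_from_sitemap(xml_text):
    """Split a sitemap document into (page_urls, child_sitemap_urls).

    A <urlset> lists pages to crawl; a <sitemapindex> lists further
    sitemaps that would need to be fetched and parsed the same way.
    """
    root = ET.fromstring(xml_text)
    locs = []
    for entry in root:  # each <url> or <sitemap> element
        loc = entry.find("sm:loc", NS)
        if loc is not None and loc.text:
            locs.append(loc.text.strip())
    if root.tag == "{%s}sitemapindex" % SM:
        return [], locs
    return locs, []
```

Each returned page URL could then be handed to the existing crawl routine, so the crawler visits exactly the pages the site lists instead of only the ones it reaches by following links.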
I followed your README, but each time Doogle crawls a website, I find that it only saves the page title and the website's main description, e.g. on The Hacker News website. When I index it and search for a keyword, the results all have almost the same description; the URL is present but the title is not.
For example, when I search for "Malware", the result shown is:
title: Malware Strains Targeting Python and JavaScript Developers
description: The Hacker News is the most trusted and popular cybersecurity publication for information security professionals seeking breaking news, actionable insights
URL: https://thehackernews.com/2022/12/malware-strains-targeting-python-and.html
As you can see, the description is the website's main description rather than the blog post's own.
I'm not sure if I am missing something.
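One way to investigate is to look at which description tags an article page actually carries. The Python sketch below (Doogle itself parses pages in PHP in classes/DomDocumentParser.php) reads the title, the standard description meta tag and the Open Graph og:description tag, and skips a value equal to a known site-wide default. MetaExtractor, page_description and the site_default parameter are hypothetical names, and whether The Hacker News article pages expose a per-article og:description is an unverified assumption.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and the description / og:description meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Standard tags use name="...", Open Graph tags use property="...".
            key = (attrs.get("name") or attrs.get("property") or "").lower()
            if key in ("description", "og:description") and key not in self.meta:
                self.meta[key] = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_description(html, site_default=None):
    """Return (title, description), skipping a description equal to site_default."""
    parser = MetaExtractor()
    parser.feed(html)
    for key in ("description", "og:description"):
        value = parser.meta.get(key, "")
        if value and value != site_default:
            return parser.title.strip(), value
    return parser.title.strip(), ""
```

Running this against a saved copy of an article page shows whether a page-specific description exists at all; if neither tag differs from the homepage text, the crawler would need another source, such as the article body, which this sketch does not attempt.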