john-hu / untitled


general web crawler #50

Open john-hu opened 2 years ago

john-hu commented 2 years ago

As a search engine, we should build a general web crawler for the internet. It should:

find undiscovered website URLs
find schema.org Recipe types on the pages at those undiscovered URLs
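
A rough sketch of how a single page could feed both tasks, using only the Python standard library. The names `PageScanner` and `scan_page` are made up for illustration and are not part of the existing code base; the sketch assumes the HTML has already been downloaded.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageScanner(HTMLParser):
    """Collect outgoing links and raw JSON-LD payloads from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()      # absolute http(s) URLs found in <a href>
        self.jsonld = []        # parsed contents of ld+json <script> tags
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            absolute = urljoin(self.base_url, attrs["href"])
            if urlparse(absolute).scheme in ("http", "https"):
                self.links.add(absolute)
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.jsonld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed JSON-LD instead of aborting the crawl


def scan_page(html, base_url, known_urls):
    """Return (new_links, jsonld_payloads) for one downloaded page."""
    scanner = PageScanner(base_url)
    scanner.feed(html)
    return scanner.links - set(known_urls), scanner.jsonld
```

The new links would go back into the crawl queue, and the JSON-LD payloads would be checked for a Recipe type.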

Please note that this kind of web crawler may consume a lot of memory or storage to keep track of the URLs it has already crawled and the ones still waiting to be crawled. Can we find a smarter way to implement it?
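
One possible direction, offered only as an idea and not something decided in this thread: keep the "already seen" set in a Bloom filter, which uses a fixed amount of memory no matter how long the URLs are. The trade-off is a small false-positive rate, so a few URLs would be wrongly treated as seen and skipped. The class below and its default sizes are hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size, probabilistic set of URLs (false positives are possible)."""

    def __init__(self, size_bits=8 * 1024 * 1024, num_hashes=5):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)  # 1 MiB with the defaults

    def _positions(self, url):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.blake2b(url.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1
        return [(h1 + i * h2) % self.size_bits for i in range(self.num_hashes)]

    def add(self, url):
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(url)
        )
```

Usage would look like `if url not in seen: seen.add(url); queue.append(url)`. The queue of URLs still waiting to be crawled would have to live somewhere else (for example on disk), since a Bloom filter cannot list its contents.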

john-hu commented 2 years ago

39 is deployed with the general crawler.

john-hu commented 2 years ago

TODO: skip hrefs that consist only of a hash fragment (for example, href="#top")
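
A minimal sketch of that filter, assuming links are handled as raw `href` strings before being queued. `is_hash_only` and `normalize_link` are illustrative names, not existing functions. Stripping the fragment from other links is an extra step beyond this TODO, included on the assumption that `page` and `page#section` should count as the same URL.

```python
from urllib.parse import urldefrag, urljoin


def is_hash_only(href):
    """True for hrefs such as '#' or '#top' that only point inside the same page."""
    return href.strip().startswith("#")


def normalize_link(base_url, href):
    """Return an absolute URL without its fragment, or None if the link should be skipped."""
    if not href or is_hash_only(href):
        return None
    absolute, _fragment = urldefrag(urljoin(base_url, href))
    return absolute
```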

john-hu commented 2 years ago

TODO: support searching inside @graph in ld+json
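
A sketch of what the lookup could look like, assuming the ld+json script content is available as a string. The function names are hypothetical. It walks top-level objects, arrays, and any `@graph` arrays, and accepts `@type` as a string or a list.

```python
import json


def iter_nodes(data):
    """Yield every JSON-LD object, descending into lists and @graph arrays."""
    if isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)
    elif isinstance(data, dict):
        yield data
        yield from iter_nodes(data.get("@graph", []))


def is_recipe(node):
    """True if the node's @type names Recipe, as a short name or a full IRI."""
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return any(
        isinstance(t, str) and (t == "Recipe" or t.endswith("/Recipe"))
        for t in types
    )


def find_recipes(jsonld_text):
    """Return all Recipe nodes found in one ld+json script body."""
    return [node for node in iter_nodes(json.loads(jsonld_text)) if is_recipe(node)]
```

For example, a payload of the form `{"@context": "https://schema.org", "@graph": [{"@type": "WebSite"}, {"@type": "Recipe", "name": "Pancakes"}]}` would yield the single Pancakes node.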

john-hu commented 2 years ago

Related issues: #95, #96