medialab / hyphe

Website crawler with a built-in exploration and control web interface
http://hyphe.medialab.sciences-po.fr/demo/
GNU Affero General Public License v3.0

Paginate tree page in Web Entity folder view #373

Open: paulgirard opened this issue 4 years ago

paulgirard commented 4 years ago

When dealing with a large number of pages, the web entity folder view can take a very long time to render. In my case, I had 21k web pages after crawling 176 Wikipedia pages (a single Wikipedia article, https://fr.wikipedia.org/wiki/Programmation_informatique, used as the start page and crawled at depth 1). When I open the folder view, the pages are correctly loaded in batches. The first level, which shows the number of pages per prefix, works well. However, when I open the prefix with 21k subpages, the page takes ages to build. I waited, and it did eventually render correctly.

We should look into this code to remove the bottleneck, either by optimizing it or by adding a virtual list there. This page is especially important because it is the only way to create a Web entity creation rule.
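For reference, a virtual list renders only the rows that intersect the viewport (plus a small "overscan" margin) and replaces everything above and below with empty spacers of the right height, so the number of rendered rows stays roughly constant whether the prefix holds 50 or 21k pages. The snippet below is a minimal sketch of that windowing arithmetic, assuming a fixed row height in pixels. The function names `visible_window` and `spacer_heights`, the `overscan` parameter, and the pixel values are hypothetical illustrations, not part of Hyphe's code.

```python
from math import ceil, floor


def visible_window(scroll_top, viewport_height, row_height, total_rows, overscan=5):
    """Return (start, end) row indices to render; `end` is exclusive.

    Hypothetical helper: assumes every row has the same height in pixels.
    """
    if row_height <= 0:
        raise ValueError("row_height must be positive")
    # Index of the first row touching the top of the viewport, clamped to the list.
    first = min(max(0, floor(scroll_top / row_height)), total_rows)
    # Number of rows needed to fill the viewport.
    count = ceil(viewport_height / row_height)
    # Extend the window by `overscan` rows on both sides to avoid blank flashes while scrolling.
    start = max(0, first - overscan)
    end = min(total_rows, first + count + overscan)
    return start, end


def spacer_heights(start, end, row_height, total_rows):
    """Heights of the empty spacers standing in for the unrendered rows above and below."""
    return start * row_height, (total_rows - end) * row_height


if __name__ == "__main__":
    # 21k pages, 24 px rows, 600 px viewport, scrolled 24000 px down.
    start, end = visible_window(24000, 600, 24, 21000)
    print(start, end, spacer_heights(start, end, 24, 21000))
```

With these numbers only 35 rows (995 to 1029) are rendered instead of 21,000, and the two spacers keep the scrollbar consistent with the full list. The fixed-row-height assumption keeps the computation constant-time; rows of varying height would need measured offsets and a binary search over them instead.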