pg-tr / pgconfig-in-blogs

A web service and web crawler that crawls the given blog sites, maps PostgreSQL config parameters to the blog posts that mention them, and saves the blog links in PostgreSQL.
Creative Commons Zero v1.0 Universal

Anchors are treated as different pages #3

Open · edib opened this issue 2 years ago

edib commented 2 years ago

There are multiple records in the database for the same page:

wal_buffers https://postgis.net/workshops/postgis-intro/tuning.html#reload-configuration
wal_buffers https://postgis.net/workshops/postgis-intro/tuning.html#basic-postgresql-tuning
wal_buffers https://postgis.net/workshops/postgis-intro/tuning.html#effective-cache-size
wal_buffers https://postgis.net/workshops/postgis-intro/tuning.html#work-mem
wal_buffers https://postgis.net/workshops/postgis-intro/tuning.html

kaanatesel commented 2 years ago

Because they have different URLs, the application treats them as different web pages. To prevent this, we could distinguish web pages by comparing their content instead of their URLs, but that would be very inefficient. Is there a way to tell web pages apart that is more efficient than comparing content?
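For context, content checking does not have to mean comparing whole pages against each other. A minimal sketch, assuming each page has already been fetched as text: hash every body once and compare fixed-size digests instead. The helper names and the whitespace normalisation are hypothetical choices, not part of the project. Note that this still requires downloading every anchored variant, which is the real cost the comment is pointing at.

```python
import hashlib


def content_fingerprint(html: str) -> str:
    """Return a SHA-256 hex digest of the page body.

    Hypothetical helper: whitespace is collapsed first (an assumption), so
    pages that differ only in formatting produce the same digest.
    """
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe_by_content(pages: dict) -> list:
    """Keep the first URL seen for each distinct content digest.

    `pages` maps URL -> already-fetched HTML (assumed input shape).
    """
    seen = set()
    kept = []
    for url, html in pages.items():
        digest = content_fingerprint(html)
        if digest not in seen:
            seen.add(digest)
            kept.append(url)
    return kept
```

Comparing 64-character digests is cheap, but every URL variant must still be fetched before its digest exists.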

edib commented 2 years ago

Wouldn't the best approach be to strip everything after the # sign before processing the URL?
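A minimal sketch of that suggestion, assuming the crawler is written in (or can call into) Python. The standard library's `urllib.parse.urldefrag` splits off the fragment, and the crawler would key both its visited set and its database rows on the result. `canonical_url` and `dedupe_records` are hypothetical helper names.

```python
from urllib.parse import urldefrag


def canonical_url(url: str) -> str:
    """Drop the #fragment so every anchor on a page maps to one key."""
    return urldefrag(url).url


def dedupe_records(records):
    """Collapse (parameter, url) pairs that differ only by fragment.

    Keeps the first occurrence and preserves input order; the returned
    URLs are the fragment-free canonical form.
    """
    seen = set()
    result = []
    for param, url in records:
        key = (param, canonical_url(url))
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

Applied to the five `wal_buffers` records above, this leaves a single row for `https://postgis.net/workshops/postgis-intro/tuning.html`. Because browsers do not send the fragment to the server, stripping it does not change what the crawler fetches. Sites that use `#` for client-side routing would be an exception.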