searchisko / search.jboss.org-ui

Web UI for search.jboss.org
Apache License 2.0

Double // in sys_url_view #84

Open lukas-vlcek opened 10 years ago

lukas-vlcek commented 10 years ago

Some documents contain a double // "in the middle" of their URL. We should consider removing the extra / when displaying the URL of such a document in search results.

For example, we can get a document with a URL like this:

http://www.jboss.org//archetypes/eap/jboss-html5-mobile-archetype-wfk/index.html

Note the double // in "...jboss.org//archety...". It looks strange on the search results page: [screenshot: 2014-08-07 at 20:11:09]
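The display-side cleanup suggested above could look roughly like the sketch below. It is only an illustration in Python, not code from the UI or the indexer: the function name `collapse_path_slashes` is hypothetical, and it assumes that only the path component should be touched, so the `//` after the scheme and anything in the query string or fragment is left alone.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_path_slashes(url: str) -> str:
    """Return url with runs of '/' in its path collapsed to a single '/'.

    Hypothetical helper for illustration; not part of search.jboss.org-ui.
    """
    parts = urlsplit(url)
    # Only the path is rewritten, so the '//' after the scheme and any
    # slashes inside the query string or fragment are preserved.
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=clean_path))


print(collapse_path_slashes(
    "http://www.jboss.org//archetypes/eap/jboss-html5-mobile-archetype-wfk/index.html"
))
```

Since a server is free to treat `//` and `/` as different paths, a rewrite like this is safer for the displayed text than for the link target itself.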

Google shows only a single / (maybe it is simply given a different list of URLs to crawl?):

[screenshot: 2014-08-07 at 20:47:02]

In any case, the biggest issue may be that the document exists under two different URLs (can this be penalized by search engines?).

Should we instead ask the content provider (@pmuir) to have a look at this and fix it directly in the indexer?

Also see the relevant StackExchange discussion. It might be a bit dated, but some points may still apply.