@philbudne I made a summary histogram of the different formats present in the url_search_string field in the directory:
1562705 rows where url_search_string is NULL
18 rows where url_search_string is empty string
7 rows where url_search_string starts with "http"
62 rows where url_search_string starts with ""
212 rows where url_search_string doesn't start with http or
We should decide on a standard format for these values (probably scheme/do.ma.in[/path], with a wildcard in some non-zero position of path), document it somewhere, and enforce that standard across the directory. This will involve some additional validation in web-search to enforce the format going forward, and a sweep across the ~300 entries to bring them up to date. I now think this is a good 'ticketing' test case.
The current thinking is to store the value as do.ma.in[/path], omitting the scheme in the database, and have web-search interpolate the scheme instead (per #822).
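To make the proposed format concrete, here is a rough sketch of what the validation could look like. None of this is existing web-search code: the function name `validate_url_search_string`, the choice of `*` as the wildcard character, treating NULL as "not set", and reading "non-zero position" as "not the first character after the leading slash" are all assumptions that would need to be settled when the format is documented.

```python
import re

# Assumed format: do.ma.in[/path], with no scheme stored in the database.
# '*' as the wildcard character is an assumption, not a documented rule.
_HOST_LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
_HOST_RE = re.compile(rf"{_HOST_LABEL}(?:\.{_HOST_LABEL})+", re.IGNORECASE)


def validate_url_search_string(value):
    """Return a list of problems; an empty list means the value passes."""
    if value is None:
        return []  # NULL is treated as "not set" in this sketch
    if value == "":
        return ["empty string"]
    if "://" in value:
        return ["scheme present (expected do.ma.in[/path])"]

    # Split into the domain and the optional path after the first slash.
    host, slash, path = value.partition("/")
    problems = []
    if not _HOST_RE.fullmatch(host):
        problems.append(f"invalid domain: {host!r}")
    if slash:
        star = path.find("*")
        if star == -1:
            problems.append("path has no wildcard")
        elif star == 0:
            problems.append("wildcard at position zero of path")
    return problems
```

Returning a list of problems rather than a boolean means the same function could drive both the going-forward validation and a report for the one-time sweep over the existing entries.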