andrewheadricke closed this 3 years ago
The truncation is always done to create unique URLs. Hash fragments are usually only used on the client side and are not passed to the server, so they never appear in a request to the server. All URLs are therefore truncated at the hash, because the fragment is only relevant to client-side (possibly JavaScript) evaluation.
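Here is a minimal Python sketch of the behaviour described above. It uses the standard library's `urllib.parse.urldefrag` as a stand-in for YaCy's own URL normalization; YaCy is written in Java and this is not its code. The toy `index` dict and the `index_key` helper are hypothetical. Once the fragment is stripped, all three URLs reduce to the same key, so an index keyed on that value keeps only the last entry written.

```python
from urllib.parse import urldefrag


def index_key(url: str) -> str:
    """Hypothetical normalization: drop everything from '#' onward."""
    return urldefrag(url).url


# A toy index keyed on the normalized URL (illustrative only).
index: dict[str, str] = {}
for url, title in [
    ("http://mysite.com/", "Home"),
    ("http://mysite.com/#!/blog/1", "Blog 1"),
    ("http://mysite.com/#!/blog/2", "Blog 2"),
]:
    index[index_key(url)] = title  # later entries overwrite earlier ones

print(index)  # {'http://mysite.com/': 'Blog 2'}
```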
Thanks @Orbiter. While an HTTP server may not be able to tell the difference between requests for http://mysite.com/#!/blog/1 and http://mysite.com/#!/blog/2, it seems to me that a search engine would want to index both blog posts 1 and 2 and return both links in the search results.
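One way to get that result would be a normalization rule that keeps hashbang (`#!`) fragments, which SPAs use to address distinct views, while still dropping ordinary in-page anchors. The following is a hypothetical Python sketch of such a policy, assuming the `#!` prefix is the signal to keep the fragment. It is not a YaCy option, and it says nothing about whether other peers would accept such keys.

```python
from urllib.parse import urldefrag


def index_key(url: str) -> str:
    """Hypothetical policy: keep '#!' (hashbang) fragments, drop all others."""
    base, fragment = urldefrag(url)
    if fragment.startswith("!"):
        # Hashbang fragments address distinct SPA views, so keep them.
        return f"{base}#{fragment}"
    # Plain fragments (e.g. '#intro') only point inside one page.
    return base


print(index_key("http://mysite.com/#!/blog/1"))  # http://mysite.com/#!/blog/1
print(index_key("http://mysite.com/page#intro"))  # http://mysite.com/page
```

With this rule the two blog URLs produce different keys, so neither overwrites the other. Plain anchors still collapse onto their page as before.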
The only issue I can think of is crawling: a simple non-rendering crawler would get the same result for both URLs. But I guess you could just disable crawling for URLs with hash parameters?
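The crawling workaround suggested above could look roughly like the following hypothetical frontier filter, sketched in Python. `should_crawl` and `frontier` are illustrative names, not YaCy settings. URLs that carry a fragment are skipped because a non-rendering crawler would only ever request the part before the `#`. Pages for those URLs would then have to reach the index some other way, for example through custom-built WARC files as described in the issue.

```python
from urllib.parse import urldefrag


def should_crawl(url: str) -> bool:
    """Hypothetical frontier filter: skip URLs that carry a fragment.

    A non-rendering crawler fetching 'http://mysite.com/#!/blog/1' really
    requests 'http://mysite.com/', so such links add nothing for it.
    """
    return urldefrag(url).fragment == ""


frontier = [
    "http://mysite.com/",
    "http://mysite.com/#!/blog/1",
    "http://mysite.com/#!/blog/2",
    "http://mysite.com/about",
]
print([u for u in frontier if should_crawl(u)])
# ['http://mysite.com/', 'http://mysite.com/about']
```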
I was able to manually create an index for my SPA using custom-built WARC files, as suggested. However, I now appear to have run into a potentially much bigger issue: YaCy appears to be truncating URLs at the `#`, so http://mysite.com/ and http://mysite.com/#!/blog/1 overwrite each other in the index. Is it possible to change my local YaCy node so that it does not truncate at the hash, and would this break P2P compatibility?