A lot of cdx dedup requests fail. Checking the production logs, we see that
we try to dedup URLs that are certainly volatile and session-specific.
We can skip them to reduce the cdx dedup load. We won't find any matches
for them anyway, since they contain session-specific variables.
We suggest skipping cdx dedup for URLs that include JSESSIONID=,
session= or sess=. These are common session URL parameters, though there
could be many more.
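The proposed check could look roughly like the following minimal Python sketch. It assumes the check runs just before the cdx lookup and returns early for matching URLs. `should_skip_cdx_dedup` and `SESSION_MARKERS` are hypothetical names, not part of any existing dedup module. Case-insensitive substring matching is also an assumption on my part, chosen so that servlet-style `;jsessionid=` path parameters are caught as well as `?JSESSIONID=`.

```python
# Hypothetical helper: decide whether a URL should bypass cdx dedup.
# Marker list taken from the proposal; extend it as more session params turn up.
SESSION_MARKERS = ("jsessionid=", "session=", "sess=")


def should_skip_cdx_dedup(url: str) -> bool:
    """Return True if the URL appears to carry a session-specific parameter.

    Uses a case-insensitive substring check (an assumption), so both
    "?JSESSIONID=..." and servlet-style ";jsessionid=..." match.
    """
    lowered = url.lower()
    return any(marker in lowered for marker in SESSION_MARKERS)


if __name__ == "__main__":
    urls = [
        "http://example.com/page;jsessionid=ABC123",
        "http://example.com/login?session=xyz",
        "http://example.com/cart?sess=42&item=7",
        "http://example.com/about",
    ]
    for u in urls:
        # Only URLs without session markers would be sent to the cdx server.
        print(u, "skip" if should_skip_cdx_dedup(u) else "dedup")
```

A plain substring test is cheap but can over-match. For example, `possession=` contains `session=`. A stricter variant could parse the query with `urllib.parse.parse_qs` and compare parameter names exactly, at the cost of missing `;jsessionid=` path parameters unless those are handled separately.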
Example URLs: