On this website, each page contains many very small text blocks that are identical on every page, surrounding one large text block that is different on each page.
Measured by word or character count, each page contains a good share of unique content, but measured by block count, there is only one unique text block per page.
=> weight each text block by its number of characters when computing the % of unique text blocks
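A minimal sketch of the difference between the two measures, assuming that blocks are plain strings and that a block counts as "common" when it exactly matches a block already seen on other pages. The function name `unique_block_ratios` and the `common_blocks` set are hypothetical and only serve the illustration; the real detection of repeated blocks may work differently.

```python
from __future__ import annotations


def unique_block_ratios(
    page_blocks: list[str], common_blocks: set[str]
) -> tuple[float, float]:
    """Return (ratio by block count, ratio by character count) of unique blocks.

    `common_blocks` is a hypothetical set of blocks known to repeat across pages.
    """
    if not page_blocks:
        return 0.0, 0.0

    # A block is unique when it is not one of the known repeated blocks.
    unique = [block for block in page_blocks if block not in common_blocks]

    # Current measure: every block has the same weight, whatever its size.
    by_count = len(unique) / len(page_blocks)

    # Proposed measure: every block is weighted by its number of characters.
    total_chars = sum(len(block) for block in page_blocks)
    by_chars = sum(len(block) for block in unique) / total_chars if total_chars else 0.0

    return by_count, by_chars


# Four small repeated blocks (21 chars in total) around one 979-char article.
menu = ["Home", "About", "Contact", "Login"]
article = "x" * 979
by_count, by_chars = unique_block_ratios(menu + [article], set(menu))
# by_count is 0.2 (1 unique block out of 5)
# by_chars is 0.979 (979 unique chars out of 1000)
```

With block counting the page looks 20% unique, while with character weighting it looks 97.9% unique, which is closer to what a reader sees.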