Open hinneburg opened 8 years ago
In table DOCUMENT, the field NUMBER_of_TOKENS holds not the number of tokens but the number of characters.
Compare
    select DOCUMENT_ID, NUMBER_OF_TOKENS from DOCUMENT where DOCUMENT_ID=361165;
    +-------------+------------------+
    | DOCUMENT_ID | NUMBER_OF_TOKENS |
    +-------------+------------------+
    |      361165 |             5595 |
    +-------------+------------------+
    1 row in set (0.00 sec)
with
    select DOCUMENT_ID, COUNT(*) from DOCUMENT_TERM where DOCUMENT_ID=361165;
    +-------------+----------+
    | DOCUMENT_ID | COUNT(*) |
    +-------------+----------+
    |      361165 |      743 |
    +-------------+----------+
    1 row in set (0.00 sec)
Fix: change the statement that fills table DOCUMENT.
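The project's actual fill statement is not shown in this issue, so the following is only a minimal sketch of one possible correction. It assumes, as the COUNT(*) comparison does, that DOCUMENT_TERM holds one row per token of a document, and it uses a hypothetical two-column schema with toy data in an in-memory SQLite database. The UPDATE with a correlated subquery is standard SQL and should also run on MySQL, because the subquery reads from DOCUMENT_TERM rather than from the table being updated.

```python
import sqlite3

# Hypothetical minimal schema: only the columns mentioned in this issue.
# Assumption: DOCUMENT_TERM has one row per token of a document.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE DOCUMENT (DOCUMENT_ID INTEGER PRIMARY KEY, NUMBER_OF_TOKENS INTEGER)")
cur.execute("CREATE TABLE DOCUMENT_TERM (DOCUMENT_ID INTEGER, TERM TEXT)")

# Toy data: NUMBER_OF_TOKENS starts out holding character counts, as in the bug report.
cur.executemany("INSERT INTO DOCUMENT VALUES (?, ?)", [(1, 17), (2, 11)])
cur.executemany(
    "INSERT INTO DOCUMENT_TERM VALUES (?, ?)",
    [(1, "the"), (1, "quick"), (1, "fox"), (2, "hello"), (2, "world")],
)

# Candidate fill statement: derive the token count from DOCUMENT_TERM
# instead of from the length of the document text.
FILL_NUMBER_OF_TOKENS = """
UPDATE DOCUMENT
SET NUMBER_OF_TOKENS = (
    SELECT COUNT(*) FROM DOCUMENT_TERM
    WHERE DOCUMENT_TERM.DOCUMENT_ID = DOCUMENT.DOCUMENT_ID
)
"""
cur.execute(FILL_NUMBER_OF_TOKENS)
conn.commit()

rows = cur.execute(
    "SELECT DOCUMENT_ID, NUMBER_OF_TOKENS FROM DOCUMENT ORDER BY DOCUMENT_ID"
).fetchall()
print(rows)  # [(1, 3), (2, 2)]
```

If DOCUMENT_TERM instead stores one row per distinct term with a frequency column, the subquery would need to sum that column rather than count rows.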