mikeizbicki / cmc-csci143

big data course materials

postgres index normalized gin error #521

Closed myngpog closed 7 months ago

myngpog commented 7 months ago

Hello, when I try to run this command in psql for the normalized batch, I get this error/notice:

NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.
CREATE INDEX

Is this normal, or should I be concerned about how my data was loaded? Thanks!

luisgomez214 commented 7 months ago

I am getting the same issue, were you able to find a solution?

myngpog commented 7 months ago

> I am getting the same issue, were you able to find a solution?

Yeah! I just filtered my index for English, but it may not apply to you :))

mikeizbicki commented 7 months ago

This warning won't affect anything. Technically, it says that the to_tsvector function extracted a word that is too long to be indexed. PostgreSQL's full-text search has a hard limit of 2047 bytes per word, and somehow one of the words being extracted is longer than that, so it gets skipped. It's probably a URL, and since it's not something that you'll be searching for, leaving it out of the index will not cause any problems.
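If you want to see which values trigger the notice, a rough check on the raw text can help. The sketch below is only an approximation: it splits on whitespace, which is not how PostgreSQL's text search parser tokenizes, and it measures length in UTF-8 bytes, which is an assumption (the NOTICE text says "characters"). The function name `find_long_words` and the sample string are hypothetical, made up for illustration.

```python
# Limit taken from the NOTICE text: words longer than 2047 are ignored.
# Measuring in UTF-8 bytes is an assumption of this sketch.
MAX_WORD_BYTES = 2047


def find_long_words(text, limit=MAX_WORD_BYTES):
    """Return whitespace-delimited tokens whose UTF-8 encoding exceeds `limit` bytes.

    Whitespace splitting only approximates PostgreSQL's parser, so treat the
    result as a hint about which rows to inspect, not an exact match.
    """
    return [word for word in text.split() if len(word.encode("utf-8")) > limit]


if __name__ == "__main__":
    # Hypothetical sample: ordinary words followed by one very long URL-like token.
    sample = "short words and https://example.com/" + "a" * 3000
    for word in find_long_words(sample):
        print(len(word.encode("utf-8")), word[:40] + "...")
```

Running something like this over the text column before loading would show whether the oversized tokens really are URLs or some other kind of junk.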