haitian-sun / GraftNet

BSD 2-Clause "Simplified" License

document is concatenated after title, and with an extra '|' ? #3

Closed huiwudiyi closed 5 years ago

huiwudiyi commented 5 years ago

First, thanks for sharing your code. While reading it, I found a place that I do not understand. In the file 'utile.py', the function load_documents() builds the passage as: passage['tokens'] = document_token + ['|'] + title_token

However, in the function index_document_entities(), you said that word_ids are off by (title_len + 1) because the document is concatenated after the title, with an extra '|'.

Should I switch the positions of document_token and title_token?

haitian-sun commented 5 years ago

Thanks for pointing this out. And sorry for the late reply.

Yes, you should switch document_token and title_token. We'll revise our code. We did some further experiments after the change. The results are the same.
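
For readers following along, here is a minimal sketch of why the order matters. It is an illustration only, not the repository's actual code: the helper names build_passage_tokens and shift_word_ids and the sample tokens are hypothetical. It assumes word_ids are positions within the document text, so they line up with the combined passage only when the title and the '|' separator come first.

```python
# Hypothetical helpers for illustration; not the actual GraftNet code.

def build_passage_tokens(title_tokens, document_tokens):
    """Concatenate title, a '|' separator, then the document tokens."""
    return title_tokens + ['|'] + document_tokens


def shift_word_ids(word_ids, title_len):
    """Shift document-relative positions into passage-relative positions.

    The offset is title_len + 1: the title tokens plus the '|' separator.
    """
    return [word_id + title_len + 1 for word_id in word_ids]


# Made-up sample data.
title_tokens = ['Barack', 'Obama']
document_tokens = ['He', 'was', 'born', 'in', 'Hawaii']

passage_tokens = build_passage_tokens(title_tokens, document_tokens)

# 'Hawaii' sits at position 4 within the document tokens.
shifted = shift_word_ids([4], len(title_tokens))

# With the title first, the shifted index points at the same token.
assert passage_tokens[shifted[0]] == 'Hawaii'
```

If the passage were built in the opposite order (document, then '|', then title), the document positions would need no offset at all, so adding title_len + 1 would point at the wrong token or run past the end of the list.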