wellcometrust / reach

Wellcome tool to parse references scraped from policy documents using machine learning
MIT License

Publications cited in more than one policy document are not grouped together #596

Closed: dd207 closed this issue 4 years ago

dd207 commented 4 years ago

Publications that appear in more than one policy document are not grouped together.

Feedback has shown that users think the policy citations are duplicates and lose trust in the product.

Screenshot 2020-08-20 at 13.33.27.png

SamDepardieu commented 4 years ago

This behaviour is happening in staging too. I'm investigating it.

SamDepardieu commented 4 years ago

Alright, I had a good look at it. I think this comes from the fuzzy-matching step, where we group references together by their extracted reference id instead of their matched title id. It shouldn't be hard to fix, but we will need to clean and re-populate the database, because the issue occurs before we insert everything into PostgreSQL.
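
To illustrate the difference between the two grouping keys, here is a minimal sketch. It is not the actual Reach code: the field names (`reference_id`, `matched_title_id`, `policy_doc_id`), the sample rows, and the `group_by` helper are all hypothetical and only meant to show why grouping on the extracted reference id leaves the same publication split across several rows.

```python
from collections import defaultdict

# Hypothetical fuzzy-match output: one row per extracted reference.
# "pub-A" is cited by two different policy documents.
matches = [
    {"reference_id": "ref-1", "matched_title_id": "pub-A", "policy_doc_id": "doc-1"},
    {"reference_id": "ref-2", "matched_title_id": "pub-A", "policy_doc_id": "doc-2"},
    {"reference_id": "ref-3", "matched_title_id": "pub-B", "policy_doc_id": "doc-1"},
]


def group_by(rows, key):
    """Collect the policy document ids found under each value of `key`."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[key]].add(row["policy_doc_id"])
    # Sort for a stable, comparable result.
    return {k: sorted(v) for k, v in groups.items()}


# Grouping on the extracted reference id: every reference is its own group,
# so "pub-A" shows up twice and looks like a duplicate.
by_reference = group_by(matches, "reference_id")

# Grouping on the matched title id: each publication appears once,
# with all the policy documents that cite it.
by_title = group_by(matches, "matched_title_id")

print(by_reference)
print(by_title)
```

With the sample rows above, the first grouping yields three separate entries, while the second yields one entry per publication, which is what users expect to see.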