heuristicus / paper-utils

Utilities for document similarity and reference extraction for research papers
MIT License

Reference similarity can be too aggressive in cutting out words before and after title #9

Open heuristicus opened 6 years ago

heuristicus commented 6 years ago
J. Shotton, J. Winn, C. Rother, and A. Criminisi. Textonboost for image understanding: Multi-class object recognition and segmentation by jointly modeling appearance, shape and context. IJCV, 2009. 2

and

J. Shotton, J. M. Winn, C. Rother, and A. Criminisi. Textonboost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. IJCV, 81(1), 2009. 1, 3, 5, 6

gives

Arrays not equal:
['textonboost', 'image', 'understanding', 'multi-class', 'object', 'recognition', 'segmentation', 'jointly', 'modeling', 'texture']
['textonboost', 'image', 'understanding', 'multi-class', 'object', 'recognition', 'segmentation', 'jointly', 'modeling', 'appearance']

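Reporting a mismatch is correct here, because the two titles really do differ. The problem is that the end of each title is being cut off: both token lists stop after 'appearance' / 'texture', so the trailing words ('shape', 'context' and 'layout', 'context') never make it into the comparison.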
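For reference, here is a minimal sketch of what the full token lists should look like once the title is no longer truncated. It is not the project's actual extraction code: `title_tokens` and `STOP_WORDS` are hypothetical names, the stop-word list is an assumption, and the titles are passed in already separated from the authors and venue.

```python
import re

# Hypothetical stop-word list; the list actually used by paper-utils is not shown in this issue.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}


def title_tokens(title):
    """Lower-case a title, split it into words (keeping hyphenated words whole),
    and drop stop words. Nothing is trimmed from either end of the title."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower())
    return [w for w in words if w not in STOP_WORDS]


# Titles copied from the two references above, without authors or venue.
title_a = ("Textonboost for image understanding: Multi-class object recognition "
           "and segmentation by jointly modeling appearance, shape and context")
title_b = ("Textonboost for image understanding: Multi-class object recognition "
           "and segmentation by jointly modeling texture, layout, and context")

tokens_a = title_tokens(title_a)
tokens_b = title_tokens(title_b)

print(tokens_a)
print(tokens_b)
print(tokens_a == tokens_b)
```

With the whole title retained, the two lists still compare as unequal, but each one now ends with the words that are currently being dropped.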