Input data preprocessing to remove noise

I just ran into the following problem. Since the data is extracted from a PDF, I'm not sure this is the right place to fix it.

The following DOI: 10.1063/1.1905789͔ comes out with a nasty 9͔ ...

Although I don't think this is the glutton lookup's responsibility, a small pre-processing step that strips this kind of junk would be nice anyway.

Update: I've checked, and since we look up by DOI directly in LMDB, the matching is rather strict (we already lowercase).
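
To make the idea concrete, here is a rough sketch of what such a pre-processing step could look like. It is written in Python purely for illustration and is not taken from the biblio-glutton code base; the function name `clean_doi` is hypothetical. It assumes the noise consists of combining marks and control/format characters (the stray mark after the final 9 above appears to be a combining character), and it lowercases the result to mirror the lowercasing we already do before the LMDB lookup.

```python
import unicodedata


def clean_doi(raw: str) -> str:
    """Hypothetical helper: strip PDF-extraction noise from a DOI string.

    Drops every character whose Unicode general category starts with
    "M" (combining marks) or "C" (control, format, unassigned, ...),
    then trims surrounding whitespace and lowercases.
    """
    kept = [
        ch
        for ch in raw
        if not unicodedata.category(ch).startswith(("M", "C"))
    ]
    return "".join(kept).strip().lower()


# The DOI from this issue, with the stray combining mark written as an escape.
noisy = "10.1063/1.1905789\u0354"
print(clean_doi(noisy))
```

This only removes characters; it does not try to guess where a DOI ends or repair other extraction errors, so anything more aggressive would need its own rules.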

kermitt2 / biblio-glutton