Closed magnuspalmblad closed 7 months ago
Hi Magnus,
Thanks for bringing this to my attention. I just added a few options to help with data cleanup overall. The 'Split Camel Case' option should address most of the condensed affiliations you are encountering. It's regex-based, so not a perfect solution, but it appears to work well enough.
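For readers curious what a regex-based camel-case split might look like, here is a minimal Python sketch. It is not the tool's actual implementation: the function name `split_camel_case` and the pattern are assumptions chosen for illustration. It inserts a space wherever a lowercase letter is immediately followed by an uppercase letter or a digit.

```python
import re

# Hypothetical helper, not the tool's real code: a zero-width match at every
# position where a lowercase letter is directly followed by an uppercase
# letter or a digit.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z0-9])")


def split_camel_case(text: str) -> str:
    """Insert a space at each lowercase-to-uppercase/digit boundary."""
    return _CAMEL_BOUNDARY.sub(" ", text)


if __name__ == "__main__":
    affiliation = "Institute for Systems Biology SeattleWashington98109 USA."
    print(split_camel_case(affiliation))
    # Names with legitimate internal capitals are split too.
    print(split_camel_case("McGill University"))
```

The second call shows why an approach like this cannot be perfect: names with legitimate internal capitals (such as "McGill") get split as well, so the output still needs a sanity check.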
I'll look into a geoparsing library, but I fear the affiliation strings may just be too inconsistent.
Let me know if you encounter any more issues.
Best, Patrick
It appears the affiliations in some PubMed entries lack spaces between cities, states and postal codes, e.g.
Institute for Systems Biology SeattleWashington98109 USA.
in https://pubmed.ncbi.nlm.nih.gov/37969874/. Perhaps a geoparsing library could be used to clean up these and similar errors in the PubMed affiliations?