Propagate section information from new Reach implementation

This PR adapts to recent changes in Reach for extracting section information when reading nxml files. There was an old implementation of this but Reach stopped producing section names at some point, and the new reinstated implementation is different, so the code on the INDRA side also had to be adapted. I did some empirical statistics on the kinds of (unnormalized) section names that occur and made improvements to their normalization.

Independently, it looks like PubMed changed their search API to return a maximum of 10k instead of 100k IDs for searches, requiring updates to tests. I also improved the way we get MeSH IDs from non-standard MeSH URNs from MedScan.

sorgerlab / indra

Propagate section information from new Reach implementation #1399