INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system that interfaces with NLP systems and databases to collect knowledge and, through a process of assembly, produces causal graphs and dynamical models.
This PR adapts to recent changes in how Reach extracts section information when reading nxml files. There was an old implementation of this, but Reach stopped producing section names at some point, and the newly reinstated implementation works differently, so the code on the INDRA side had to be adapted as well. I collected some empirical statistics on the kinds of (unnormalized) section names that occur and used them to improve their normalization.
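To illustrate what section-name normalization involves, here is a minimal sketch that maps raw section titles to a small set of canonical labels. The function name `normalize_section_name`, the label set, and the patterns are hypothetical and not taken from the INDRA code; the real normalization is driven by the observed section-name statistics and may differ.

```python
import re

# Hypothetical canonical labels and patterns; the actual INDRA set may differ.
# Order matters: the first matching pattern wins, so "Supplementary Materials"
# is checked before the methods pattern that also matches "materials".
_SECTION_PATTERNS = [
    ("abstract", re.compile(r"\babstract\b")),
    ("supplementary", re.compile(r"\bsupplementa(ry|l)\b")),
    ("introduction", re.compile(r"\b(introduction|background)\b")),
    ("methods", re.compile(r"\b(methods?|materials|experimental procedures)\b")),
    ("results", re.compile(r"\bresults?\b")),
    ("discussion", re.compile(r"\bdiscussion\b")),
    ("conclusion", re.compile(r"\bconclusions?\b")),
]


def normalize_section_name(raw):
    """Map a raw section title to a canonical label (sketch)."""
    if not raw:
        return None
    text = raw.lower().strip()
    # Strip leading numbering such as "2.", "3.1" or roman numerals like "II."
    text = re.sub(r"^([0-9]+(\.[0-9]+)*|[ivx]+)\.?\s+", "", text)
    # Collapse internal whitespace
    text = re.sub(r"\s+", " ", text)
    for label, pattern in _SECTION_PATTERNS:
        if pattern.search(text):
            return label
    return "other"
```

For example, `normalize_section_name("2. Materials and Methods")` returns `"methods"`, while a combined title like "Results and Discussion" resolves to whichever label is listed first (`"results"` here).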
Independently, it looks like PubMed changed its search API to return a maximum of 10k instead of 100k IDs per search, which required updating the tests. I also improved the way we extract MeSH IDs from non-standard MeSH URNs produced by MedScan.
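As a rough illustration of extracting MeSH IDs from non-standard URNs, the sketch below assumes a URN shaped like `urn:<namespace>:<value>`, where the value is either a MeSH ID or a percent-encoded MeSH name. The namespace `agi-meshdis`, the function `mesh_id_from_urn`, and the caller-supplied name-to-ID mapping are all hypothetical; the actual MedScan URN formats and the lookup INDRA performs may differ.

```python
import re
from urllib.parse import unquote

# MeSH descriptor (D), supplementary concept (C) and qualifier (Q) IDs
MESH_ID_PATTERN = re.compile(r"^[CDQ][0-9]+$")


def mesh_id_from_urn(urn, name_to_id):
    """Return a MeSH ID from a URN (sketch, assuming 'urn:<ns>:<value>')."""
    parts = urn.split(":", 2)
    if len(parts) < 3 or parts[0].lower() != "urn":
        return None
    value = unquote(parts[2]).strip()
    # Case 1: the URN already carries a MeSH ID
    if MESH_ID_PATTERN.match(value):
        return value
    # Case 2: the URN carries a MeSH name; look it up case-insensitively
    lookup = {name.lower(): mesh_id for name, mesh_id in name_to_id.items()}
    return lookup.get(value.lower())
```

Here the name lookup is passed in explicitly so the sketch stays self-contained; in practice it would come from a MeSH name-to-ID resource.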