sosia-dev / sosia

Sosia: Automatic author matching in Scopus on-line
https://sosia.readthedocs.io/
MIT License

Refactor reading of source information #40

Closed by Michael-E-Rose 1 month ago

Michael-E-Rose commented 1 month ago

The goal is to avoid loading the source definitions with every call of Scientist(). In Scientist(), they are read in order to add the source names when someone provides source IDs on their own. In Original(), they are used to define the search group.

Possible solutions:

  1. Drop adding the source names in Scientist(); load the source information files during .search_group_from_sources()
  2. Load the source information files when calling add_source_names(); load the source information independently during .search_group_from_sources()

Perhaps it is possible to read the source information files once and store them in Scientist() (self) to avoid multiple readings. If this is the case, option 2 is to be preferred.
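
A minimal sketch of the "read once" idea, not sosia's actual code. It caches at module level with `functools.lru_cache` rather than on `self`, so the file is read once per process even when several `Scientist` objects are created. The file path, the column names, `load_source_info()`, and the stripped-down `Scientist` class are all assumptions made for illustration.

```python
import csv
import tempfile
from functools import lru_cache
from pathlib import Path

# Hypothetical file location and columns; sosia's real layout may differ.
SOURCES_FILE = Path(tempfile.mkdtemp()) / "sources.csv"
SOURCES_FILE.write_text("source_id,title\n1,Journal A\n2,Journal B\n")

read_count = 0  # only here to show how often the file is actually read


@lru_cache(maxsize=None)
def load_source_info(path=SOURCES_FILE):
    """Read the source file on the first call; later calls hit the cache."""
    global read_count
    read_count += 1
    with open(path, newline="") as f:
        return {int(row["source_id"]): row["title"] for row in csv.DictReader(f)}


class Scientist:
    """Stand-in for sosia's Scientist, reduced to the source-name lookup."""

    def __init__(self, source_ids):
        names = load_source_info()  # cached after the first Scientist()
        # Unknown IDs get None as their name.
        self.sources = [(s, names.get(s)) for s in source_ids]


a = Scientist([1])
b = Scientist([2, 3])
```

The cached dictionary is shared between callers, so code using it should treat it as read-only. `.search_group_from_sources()` could call the same loader and reuse the cached result.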

In the end, reading the files should call get_field_source_information() automatically when the files aren't there. This removes the need for users to call get_field_source_information() themselves beforehand.
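
One way this could look, as a rough sketch: the reader checks for the file and fetches it only when it is missing. Here `get_field_source_information()` is a local stand-in that writes a tiny file; the real sosia function downloads the data, and its signature may differ. `read_sources()`, the `path` parameter, and the file location are hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical cache directory and file name.
DATA_DIR = Path(tempfile.mkdtemp())
SOURCES_FILE = DATA_DIR / "sources.csv"


def get_field_source_information(path=SOURCES_FILE):
    """Stand-in for sosia's function of the same name (assumed signature)."""
    path.write_text("source_id,title\n1,Journal A\n")


def read_sources(path=SOURCES_FILE):
    """Return the file's lines, creating the file first if it is missing."""
    if not path.exists():
        get_field_source_information(path)
    return path.read_text().splitlines()


lines = read_sources()  # the file does not exist yet, so it is created here
```

With this shape, a first-time user never has to call get_field_source_information() explicitly, while an explicit call remains available for refreshing the files.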

Michael-E-Rose commented 1 month ago

The field & source information is used in more places, notably to determine someone's field. That happens in Scientist().__init__, so we cannot drop this.