Closed tiredpixel closed 7 months ago
The only parts of Register Common used are adapters/s3_adapter
and structs/aws_credentials
. These are used within config/initializers/adapters.rb
for setting up the S3 adapter. I'm not fully convinced of the merits of this approach—however, that's irrelevant, since there are custom methods written for S3 use, and even if not, there is almost certainly merit in having at least one shared library across both Register Web and data pipelines. This is Register Common, so I'm satisfied that this is a valuable internal dependency.
The only parts of these used are are repositories/x
, where x
varies between DK, PSC, and SK datasources because of inconsistent naming. These are used within app/repositories/raw_data_record_repository.rb
to provide functionality for viewing raw data via the original source data link for an entity. Whilst this functionality is obviously important for Register, I'm a little unsure about the way in which it's been implemented: DK, PSC, and SK repositories are all registered within initialize
method, and then searches are made within each repository, and those results mapped back to each repository according to different naming for each repository. This means that adding a repository to Register requires multiple things to be changed:
Gemfile
RawDataRecordRepository
initialize
config/data_sources.yml
(as of https://github.com/openownership/register/issues/212 , not originally)RawDataRecordRepository
RawDataRecordRepository
process_results
to add 2 lines per repositorymap_x_es_record
in RawDataRecordRepository
, where x
is the new datasourceI think this can probably be simplified. However, doing so would require more consistency between Register Sources repositories, in particular in the repositories
directory. This is exactly what is noted in https://github.com/openownership/register/issues/216 , so I'm parking this part until that's done.
It's not immediately clear where this is used. Nothing appears to be using it directly, so either it is unused or a hidden dependency. Main functionality of Register Web, including raw data, appears to work without this dependency.
There appear to be two main uses of this repository:
structs
, enums
, and repositories
.register
.(1) seems fair enough, since Register needs some data definitions such as are shared with data pipelines. I'm not convinced this needs to be in a whole separate repository, rather than, say, in Register Common, but that's a separate topic, related to the overall multi-repository structure in general. So, I think this can be left, within this ticket, at least.
(2) merits more analysis. Specifically, Register Sources BODS contains the following within register
:
entity.rb
: used within Sources BODS register/statements_mapper.rb
entity_query_builder.rb
: used within Sources BODS register/entity_service.rb
entity_service.rb
: subclassed within Register app/repositories/entity_service.rb
paginated_array.rb
: used within Sources BODS register/entity.rb
, register/entity_service.rb
, repositories/bods_statement_repository.rb
; Register app/repositories/raw_data_record_repository.rb
; Register app/controllers/entities_controller.rb
then appears to define a closely-duplicated yet different PaginatedArray
classprovenance.rb
: used within Sources BODS register/relationship.rb
relationship.rb
: used within Sources BODS register/statements_mapper.rb
; Register app/service_objects/relationships_sorter.rb
statement.rb
: not entirely clear, but seemingly unusedstatement_loader.rb
: used within Sources BODS register/entity_service.rb
statements_mapper.rb
: used within Sources BODS register/statement_loader.rb
unknown_entity.rb
: seemingly unusedunknown_person_builder.rb
: used within Sources BODS register/statements_mapper.rb
Thus, it appears that files within Sources BODS register
are used only amongst files in the same directory, or within Register itself. Considering that, I think there's limited merit in having them live in a separate library, rather than simply within Register itself.
Concluding, to-do for this task:
register
and related tests to Register lib
or similar~PaginatedArray
with Sources BODS PaginatedArray
, if possibleSince PaginatedArray
is also used in Sources BODS repositories/bods_statement_repository.rb
, it cannot be moved to Register. However, PaginatedArray
subclasses Array
and does not depend on anything else in Register & friends, meaning it is a standalone class. This is a good reason to extract it to Register Common, after which, both Sources BODS and Register can use it.
PaginatedArray
to Register CommonAfter having spent some time digging through Sources BODS for register
extraction, I've changed my mind. Although some things, such as search, could likely be better placed within Register itself, there is a tight coupling to Sources BODS, not only in usage, but also seemingly in many data definitions being placed there—even if not appararently used within Ingestors or Transformers. Thus, I'm leaving them as-is, but might well extract specific search functionality in the future, now I'm more acquainted with what's going on there.
The remainder of this ticket is blocked by https://github.com/openownership/register/issues/216 .
Given that I wasn't able to deduplicate as much in #216 as I'd hoped, because of the plethora of small differences between the code for each datasource, I don't think it would be straightforward to remove any further dependencies from Register, at present. This is very disappointing, but I'm not sure how much value there is in working on those sections of code further.
Register is dependent on every internal Register-related library, not just Register Common. Perhaps this is necessary, perhaps not. Having it dependent on everything means that despite there being different Transformer and Ingestor projects, Register frequently needs updating as part of that process. Not only that, but it results in more dependencies for Register itself, because of the need to satisfy internal Git libraries in the project itself (in contrast to libraries published to RubyGems).
Investigate whether all these internal dependencies are necessary, or whether anything can be removed (e.g. by moving Register-only structs into that project directly, rather than storing them in a library that part of which doesn't appear to be used anywhere else). However, it might be that these libraries are in fact necessary, because of raw data or similar.
It would be best for this ticket to wait for a broader discussion on the future of cross-repo structure for Register (e.g. Sources SK, Transformer SK, Ingestor SK).
This is a spin-off ticket from https://github.com/openownership/register/issues/188 .