ugent-library / old-people-service

People service
Apache License 2.0
0 stars 0 forks source link

[Student import] Refactor upserting a person in the repository #40

Closed netsensei closed 1 year ago

netsensei commented 1 year ago

Story

A lot of heavy lifting now happens outside of the repository, inside of the cli/students_import.go command. This should be moved inside of a dedicated upsert method in the repository. We can then reuse this method both when fetching data from LDAP as well as GISMO, without repeating ourselves.

Success criteria

Screenshots

n/a

Implementation suggestion

The upsert could be implemented as a set of queries which are executed in a transaction. Or it could be implemented as a single raw query using common table expressions. See OAI service for an example.

Additionally, storing all id's as an array of key/value objects makes it far harder to do lookups. Way easier is a hashmap of arrays. With each array being identified by its type:

{
  "historic_ugent_id": [ 111, 222, 333 ],
  "uzgent_id": [ 444, 555, 666]
}

If we store the id's like this, we can just fetch the corresponding array and use functions like sort.Search or the newer slices.BinarySearch to do lookups; instead of iterating over a range.

Automatic testing scenario

n/a

Additional information

This issue might deprecate #37

Related issues

nicolasfranck commented 1 year ago

By "heavy lifting" you probably mean the assignment of attributes to the person object. That is true, that is repeated across commands. For an "upsert", there is already a method call "SavePerson".

Not sure how to provide clear way to provide new data, at least if the receiving method should only add what is given to it. For example if the receiving method would receive a models.Person, and the first_name is empty, is that because it needs to be empty, or just because it was not filled in (and therefore should not overwrite anything). Such things are clearer within context.

netsensei commented 1 year ago

By "heavy lifting" you probably mean the assignment of attributes to the person object.

All of that should be hidden behind one method repo.UpsertPerson

For an "upsert", there is already a method call "SavePerson".

That's a wrapper around CreatePerson and UpdatePerson. Just a single public repo.UpsertPerson method suffices.

Not sure how to provide clear way to provide new data, at least if the receiving method should only add what is given to it. For example if the receiving method would receive a models.Person, and the first_name is empty, is that because it needs to be empty, or just because it was not filled in (and therefore should not overwrite anything). Such things are clearer within context.

nicolasfranck commented 1 year ago

@netsensei your remark about the other_id is correct. But that requires changing the protobuf definition, which I'm trying to migrate from. Better fixing that afterwards

nicolasfranck commented 1 year ago

I also read that some assignment logic that is now located in "import_students" (and in the nats subscriber for person objects) should be moved inside the repository, so that moving from nats to another system you don't loose that logic.

So in fact this would require adding two methods:

Not sure wether I should embed the LDAP fix inside the repository when updating staff records, as that would mean that the ldap is both inside the repository, and also outside the repository, i.e. when fetching student records. Unless I put the entire ldap and update into a method like SynchronizeStudents(context.Context) like I do in method AutoExpirePeople(context.Context)

nicolasfranck commented 1 year ago

Suggestion: that whole ldap integration should be integrated into the repository: