netsensei commented 1 year ago

Story

A lot of heavy lifting now happens outside of the repository, inside of the cli/students_import.go command. This should be moved inside of a dedicated upsert method in the repository. We can then reuse this method both when fetching data from LDAP as well as GISMO, without repeating ourselves.

Success criteria

[ ] Replace the complex logic to create / update a person with a dedicated upsert method that does all of the heavy lifting.

Screenshots

n/a

Implementation suggestion

The upsert method accepts a set of fields containing the fresh data to be upserted as a person.
- This could be a models.Person.
The upsert checks whether the person already exists in the database and, if so, fetches the record.
- It uses the historic_ugent_id for this lookup.
Relevant fields from the fetched record are copied into the set of fields with the new data:
- id
- date_created
- date_updated
- active
The existing record is then updated with the merged values.
If the record doesn't exist, a new record is created from the fresh data.
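
The steps above can be sketched in Go roughly as follows. This is a minimal in-memory illustration under stated assumptions, not the repository's actual code: Person is a hypothetical, trimmed-down stand-in for models.Person, memRepo is a hypothetical store keyed by a single historic_ugent_id string (the real record keeps that id among its other ids), and the id and timestamps assigned on creation only mimic what the database would do.

```go
package main

import (
	"fmt"
	"time"
)

// Person is a hypothetical, trimmed-down stand-in for models.Person.
type Person struct {
	ID              string
	HistoricUgentID string
	FirstName       string
	DateCreated     time.Time
	DateUpdated     time.Time
	Active          bool
}

// memRepo is a hypothetical in-memory store keyed by historic_ugent_id.
type memRepo struct {
	byHistoricID map[string]*Person
	nextID       int
}

// UpsertPerson merges fresh data with the existing record (if any) and stores it.
func (r *memRepo) UpsertPerson(fresh *Person) *Person {
	p := *fresh // work on a copy so the caller's value is left untouched
	if existing, ok := r.byHistoricID[p.HistoricUgentID]; ok {
		// Fields copied from the existing record, as listed in the issue.
		p.ID = existing.ID
		p.DateCreated = existing.DateCreated
		p.DateUpdated = existing.DateUpdated
		p.Active = existing.Active
	} else {
		// New record: assign an id and timestamps, mimicking database defaults.
		r.nextID++
		p.ID = fmt.Sprintf("person-%d", r.nextID)
		now := time.Now()
		p.DateCreated, p.DateUpdated = now, now
	}
	r.byHistoricID[p.HistoricUgentID] = &p // create, or overwrite as the "update"
	return &p
}

func main() {
	repo := &memRepo{byHistoricID: map[string]*Person{}}
	first := repo.UpsertPerson(&Person{HistoricUgentID: "111", FirstName: "Jan", Active: true})
	// A later import delivers fresh data for the same person, without "active".
	second := repo.UpsertPerson(&Person{HistoricUgentID: "111", FirstName: "Johan"})
	fmt.Println(first.ID == second.ID, second.FirstName, second.Active, len(repo.byHistoricID))
	// prints: true Johan true 1
}
```

Because the caller only ever calls UpsertPerson, the lookup, the copying of id / date_created / date_updated / active and the create-or-update decision stay out of the import commands.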

The upsert could be implemented either as a set of queries executed in a single transaction, or as a single raw query using common table expressions (CTEs). See the OAI service for an example.

Additionally, storing all IDs as an array of key/value objects makes lookups far harder. A hashmap of arrays, with each array keyed by its ID type, is much easier:

{
  "historic_ugent_id": [ 111, 222, 333 ],
  "uzgent_id": [ 444, 555, 666]
}

If we store the IDs like this, we can fetch the corresponding array and use functions like sort.Search or the newer slices.BinarySearch for lookups (as long as each array is kept sorted), instead of iterating over the whole list.
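
A minimal sketch of that lookup, assuming Go 1.21+ for the slices package and numeric ids as in the JSON example above (with string ids, the same code works on map[string][]string). decodeIDs and hasID are hypothetical helpers. Each array is sorted right after decoding, because slices.BinarySearch only gives correct answers on sorted input.

```go
package main

import (
	"encoding/json"
	"fmt"
	"slices"
)

// decodeIDs parses the proposed map-of-arrays layout and sorts every array
// in place, since binary search requires sorted input.
func decodeIDs(raw []byte) (map[string][]int, error) {
	ids := map[string][]int{}
	if err := json.Unmarshal(raw, &ids); err != nil {
		return nil, err
	}
	for _, list := range ids {
		slices.Sort(list)
	}
	return ids, nil
}

// hasID reports whether id is stored under the given id type.
// An unknown type yields a nil slice, for which BinarySearch reports false.
func hasID(ids map[string][]int, idType string, id int) bool {
	_, found := slices.BinarySearch(ids[idType], id)
	return found
}

func main() {
	raw := []byte(`{
  "historic_ugent_id": [ 111, 222, 333 ],
  "uzgent_id": [ 444, 555, 666]
}`)
	ids, err := decodeIDs(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(hasID(ids, "historic_ugent_id", 222)) // true
	fmt.Println(hasID(ids, "historic_ugent_id", 444)) // false
	fmt.Println(hasID(ids, "unknown_type", 111))      // false
}
```

The map lookup selects the right array in constant time, and the binary search then costs O(log n) per lookup instead of a linear scan over every key/value object.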

Automatic testing scenario

n/a

Additional information

This issue might supersede #37.

Related issues

#31

nicolasfranck commented 1 year ago

By "heavy lifting" you probably mean the assignment of attributes to the person object. That is true; it is repeated across commands. For an "upsert", there is already a method called "SavePerson".

Not sure how to provide new data in a clear way, at least if the receiving method should only apply what is given to it. For example, if the receiving method receives a models.Person whose first_name is empty, is that because it needs to be empty, or just because it was not filled in (and therefore should not overwrite anything)? Such things are clearer within context.

netsensei commented 1 year ago

> By "heavy lifting" you probably mean the assignment of attributes to the person object.

- assignment of attributes
- checking if a person object already exists
- creating or updating an object

All of that should be hidden behind a single method: repo.UpsertPerson.

> For an "upsert", there is already a method called "SavePerson".

That's a wrapper around CreatePerson and UpdatePerson. Just a single public repo.UpsertPerson method suffices.

> Not sure how to provide new data in a clear way, at least if the receiving method should only apply what is given to it. For example, if the receiving method receives a models.Person whose first_name is empty, is that because it needs to be empty, or just because it was not filled in (and therefore should not overwrite anything)? Such things are clearer within context.

You could pass a *models.Person to the repo.UpsertPerson method for now.
A struct field that isn't set holds the zero value of its type (e.g. "" for a string), so in Go an empty first_name and a missing one look the same. Use the Nillable option in ent to handle empty / non-existing data at the database level.
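
To illustrate the distinction between "empty" and "not provided" at the Go level: a pointer field can be nil (not provided, keep the stored value) or point to "" (explicitly cleared). As far as we know, ent's Nillable() option generates pointer fields in the entity struct for this same reason. PersonUpdate, Person and apply below are hypothetical names for this sketch, not part of the service.

```go
package main

import "fmt"

// PersonUpdate is a hypothetical input type: nil means "not provided",
// a non-nil pointer (even to "") means "set this value".
type PersonUpdate struct {
	FirstName *string
	LastName  *string
}

// Person is a hypothetical stored record.
type Person struct {
	FirstName string
	LastName  string
}

// apply copies only the fields that were actually provided.
func apply(p Person, u PersonUpdate) Person {
	if u.FirstName != nil {
		p.FirstName = *u.FirstName
	}
	if u.LastName != nil {
		p.LastName = *u.LastName
	}
	return p
}

func main() {
	stored := Person{FirstName: "Jan", LastName: "Peeters"}
	empty := ""
	// FirstName is explicitly cleared; LastName is not provided and stays.
	fmt.Printf("%+v\n", apply(stored, PersonUpdate{FirstName: &empty}))
	// prints: {FirstName: LastName:Peeters}
}
```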

nicolasfranck commented 1 year ago

@netsensei your remark about the other_id is correct. But fixing it requires changing the protobuf definition, which I'm trying to migrate away from. Better to fix that afterwards.

nicolasfranck commented 1 year ago

I also read that some assignment logic that now lives in "import_students" (and in the nats subscriber for person objects) should be moved into the repository, so that the logic isn't lost when moving from nats to another system.

So in fact this would require adding two methods:

UpsertStudent(context.Context, models.Person) (models.Person, error)
UpsertStaff(context.Context, models.Person) (models.Person, error)

Not sure whether I should embed the LDAP fix inside the repository when updating staff records, as that would put LDAP access both inside the repository and outside it (i.e. when fetching student records). The alternative is to put the entire LDAP lookup and update into a method like SynchronizeStudents(context.Context), as I do in the method AutoExpirePeople(context.Context).

nicolasfranck commented 1 year ago

Suggestion: the whole LDAP integration should be moved into the repository:

- every staff member imported from GISMO needs an LDAP fix
- every student returned by the ugentldap.SearchPeople method is now a "dummy person", because it contains attributes that need to be interpreted differently (e.g. organizations[0].id is "CA20" instead of the real id, because ugentldap cannot use the repository to look up the organization identifier).

ugent-library / old-people-service

[Student import] Refactor upserting a person in the repository #40
