noi-techpark / bdp-commons

GNU Affero General Public License v3.0

Event A22 DC has a problem with time series IDs, which should be multiple and hence create a flow of connected events #426

Closed: Piiit closed this issue 2 years ago.

Piiit commented 2 years ago

Related to #263

Original request, copied from a comment by @rcavaliere:

@Piiit I am not sure that we still have the desired mapping as far as the A22 events data is concerned. I checked with A22, and they confirmed that, in case of a queue, we should have something like this:

| ID     | Version date        | Event subtype                | Start km | Start location | End km  | End location |
|--------|---------------------|------------------------------|----------|----------------|---------|--------------|
| 496655 | 01/30/2022 16:15:54 | Traffico rallentato          | 157,900  | ROVERETO NORD  | 206,700 | AFFI         |
| 496655 | 01/30/2022 16:31:58 | Traffico rallentato          | 131,400  | TRENTO NORD    | 206,700 | AFFI         |
| 496655 | 01/30/2022 16:59:34 | Traffico rallentato con code | 131,400  | TRENTO NORD    | 206,700 | AFFI         |
| 496655 | 01/30/2022 19:33:01 | Traffico rallentato con code | 157,900  | ROVERETO NORD  | 206,700 | AFFI         |
| 496655 | 01/30/2022 19:59:05 | Traffico rallentato          | 157,900  | ROVERETO NORD  | 206,700 | AFFI         |
| 496655 | 01/30/2022 20:20:20 | Traffico rallentato          | 179,100  | ALA/AVIO       | 206,700 | AFFI         |
| 496655 | 01/30/2022 20:39:34 | Traffico rallentato          | 179,100  | ALA/AVIO       | 206,700 | AFFI         |

Event subtypes: "Traffico rallentato" means slow traffic; "Traffico rallentato con code" means slow traffic with queues.

The ID field exposed by A22 should be stored 1:1 into the field event_series_uuid (table event). However, I see that at present we apply some kind of hashing, as we do for the uuid field. Can you please check what the Data Collector is doing? If it was implemented differently than originally specified, we can ask Catch&Solve to change it.
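To make the intended behaviour concrete, here is a minimal Java sketch (Java 16+ for records) of how the rows above form one event series: versions sharing the A22 ID are grouped and ordered by version date. The `EventVersion` record, its field names and the helper methods are hypothetical and not taken from the Data Collector; the date pattern assumes the month/day/year format shown in the example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class A22EventSeries {

    // One row of the A22 feed; the field names are hypothetical.
    record EventVersion(String id, LocalDateTime versionDate, String subtype,
                        String startKm, String startLocation,
                        String endKm, String endLocation) {}

    // Version dates in the example are month/day/year.
    static final DateTimeFormatter VERSION_DATE =
            DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss");

    static EventVersion row(String id, String versionDate, String subtype,
                            String startKm, String startLocation,
                            String endKm, String endLocation) {
        return new EventVersion(id, LocalDateTime.parse(versionDate, VERSION_DATE),
                subtype, startKm, startLocation, endKm, endLocation);
    }

    // Groups versions by the A22 ID (one group per event series) and sorts
    // each group by version date, so the versions read as one flow of events.
    static Map<String, List<EventVersion>> groupIntoSeries(List<EventVersion> versions) {
        Map<String, List<EventVersion>> series = new LinkedHashMap<>();
        for (EventVersion v : versions) {
            series.computeIfAbsent(v.id(), key -> new ArrayList<>()).add(v);
        }
        for (List<EventVersion> versionsOfOneEvent : series.values()) {
            versionsOfOneEvent.sort(Comparator.comparing(EventVersion::versionDate));
        }
        return series;
    }

    public static void main(String[] args) {
        // Three rows from the example, deliberately out of order.
        List<EventVersion> rows = List.of(
                row("496655", "01/30/2022 16:59:34", "Traffico rallentato con code",
                        "131,400", "TRENTO NORD", "206,700", "AFFI"),
                row("496655", "01/30/2022 16:15:54", "Traffico rallentato",
                        "157,900", "ROVERETO NORD", "206,700", "AFFI"),
                row("496655", "01/30/2022 16:31:58", "Traffico rallentato",
                        "131,400", "TRENTO NORD", "206,700", "AFFI"));
        for (EventVersion v : groupIntoSeries(rows).get("496655")) {
            System.out.println(v.versionDate() + " " + v.subtype() + " from " + v.startLocation());
        }
    }
}
```

A `LinkedHashMap` keeps the series in the order they first appear in the feed, while sorting inside each group makes the chronological flow independent of the order in which A22 delivers the rows.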

Piiit commented 2 years ago

@rcavaliere It works... please check:

I tested it with:

        select * from event e
        join event f on e.event_series_uuid = f.event_series_uuid
        where e.id <> f.id;

For example, there are multiple entries for the event series with name 499559.

> The ID field exposed by A22 should be stored 1:1 into the field event_series_uuid

The exposed event ID is not a UUID, so I cannot store it there. That field is meant to check whether a certain event series has already been inserted. The event ID itself is stored as the name of the current event. We need to model a generic event table here, so we cannot map the A22 ID 1:1 into the event_series_uuid field. Like event_uuid, that field is meant to check for uniqueness without having to query many records from the DB.
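A sketch of this split, assuming the keys are derived with deterministic name-based UUIDs: the A22 ID stays readable as the event name, event_series_uuid is computed from the ID alone, and event_uuid from the ID plus the version date. The thread does not show which hashing the DC actually uses; the `a22-series:`/`a22-event:` prefixes and the method names are hypothetical, and `UUID.nameUUIDFromBytes` (MD5-based, version 3) is only a stand-in.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class A22EventKeys {

    // Hypothetical scheme: the series key depends only on the A22 ID,
    // so every version of the same A22 event gets the same event_series_uuid.
    static UUID seriesUuid(String a22Id) {
        return nameBased("a22-series:" + a22Id);
    }

    // Hypothetical scheme: the event key also includes the version date, so each
    // version becomes its own row, while re-reading the same version from the
    // feed yields the same UUID and can be skipped as a duplicate.
    static UUID eventUuid(String a22Id, String versionDate) {
        return nameBased("a22-event:" + a22Id + "|" + versionDate);
    }

    // UUID.nameUUIDFromBytes builds a deterministic, MD5-based (version 3) UUID.
    private static UUID nameBased(String name) {
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String a22Id = "496655"; // stored as-is in the event name
        UUID series = seriesUuid(a22Id);
        UUID first = eventUuid(a22Id, "01/30/2022 16:15:54");
        UUID second = eventUuid(a22Id, "01/30/2022 16:31:58");
        System.out.println("name=" + a22Id + " series=" + series);
        System.out.println("versions distinct: " + !first.equals(second));
    }
}
```

Because both keys are deterministic, re-reading the feed reproduces them, so a uniqueness check is a single lookup on an indexed column rather than a query over many records.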

However, we could add another name field for event series, which is domain specific and where we could store whatever we want. I suggest calling it event_series_name.

What do you think?

Piiit commented 2 years ago

@rcavaliere Another thing: we only have coordinates and no località. Do we need that?

rcavaliere commented 2 years ago

@Piiit thanks for the update. That's good: in my opinion, having the event ID stored in the name field is sufficient. We don't need another field event_series_name, and we don't need località either. I would say that we can put the DC into production. What do you think?

Piiit commented 2 years ago

@rcavaliere I was wondering why some events are missing, and found this comment in the code (unfortunately, there is nothing about it in the README):

        // the session will last 24 hours unless de-authenticated before - however, if a user
        // de-authenticates one session, all sessions of the same user will be de-authenticated;
        // this means each running application needs its own username

Since we use the same credentials in the staging environment and for my local testing, it is possible that one DC instance de-authenticated the running sessions of the others, which could lead to data loss. Would it be possible to get another pair of credentials for the testing environments?
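The failure mode described in the code comment can be illustrated with a small in-memory model. This is not the A22 API: the `SessionModel` class and its methods are hypothetical, and `BN1`/`BN2` are placeholder usernames. It only encodes the stated rule that de-authenticating one session drops every session of the same user, which is why a testing installation sharing production's username can silently end the production session.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical in-memory model of the rule in the DC comment, not the A22 API.
final class SessionModel {
    private final Map<String, Set<String>> sessionsByUser = new HashMap<>();

    // Opens a new session for the user and returns its token.
    String authenticate(String username) {
        String token = UUID.randomUUID().toString();
        sessionsByUser.computeIfAbsent(username, u -> new HashSet<>()).add(token);
        return token;
    }

    // De-authenticating any one session drops every session of that user.
    void deauthenticate(String username, String token) {
        Set<String> sessions = sessionsByUser.get(username);
        if (sessions != null && sessions.contains(token)) {
            sessionsByUser.remove(username);
        }
    }

    boolean isValid(String username, String token) {
        return sessionsByUser.getOrDefault(username, Set.of()).contains(token);
    }

    public static void main(String[] args) {
        SessionModel server = new SessionModel();

        // Production and a local test run share one username.
        String prod = server.authenticate("BN1");
        String local = server.authenticate("BN1");
        server.deauthenticate("BN1", local); // the local test run shuts down
        System.out.println("shared username, production valid: " + server.isValid("BN1", prod));

        // With a separate username per running installation, production survives.
        String prod2 = server.authenticate("BN1");
        String test2 = server.authenticate("BN2");
        server.deauthenticate("BN2", test2);
        System.out.println("separate usernames, production valid: " + server.isValid("BN1", prod2));
    }
}
```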

In the meantime, I will deactivate all testing installations and only run the production DC.

rcavaliere commented 2 years ago

@Piiit Yes, I agree. What's important is that the DC now has the expected behavior. How many credentials do we need? For all A22 DCs?

Piiit commented 2 years ago

I would say two for each DC, if possible.

rcavaliere commented 2 years ago

OK, can you please check in Jenkins which set of credentials is currently used for which DC? The username should be something like "BNx", where x is a number. The DCs are the following:

I will then ask A22 for six more sets of credentials, to be used in the testing environment.
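Once the credentials are assigned, a check along these lines could flag usernames reused across running installations, which is exactly the situation the code comment warns about. It is a hypothetical helper, not part of the DC or Jenkins; the DC and environment names are placeholders because the list of DCs is not given here, and the `BN` followed by digits pattern assumes the "BNx" format mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class CredentialCheck {

    // Usernames are assumed to look like "BNx", where x is a number.
    private static final Pattern USERNAME = Pattern.compile("BN\\d+");

    // One running DC installation; the DC and environment names are placeholders.
    record Installation(String dataCollector, String environment, String username) {}

    // Reports malformed usernames and usernames used by more than one running
    // installation, since those installations would de-authenticate each other.
    static List<String> problems(List<Installation> installations) {
        List<String> found = new ArrayList<>();
        Map<String, Installation> firstUse = new HashMap<>();
        for (Installation current : installations) {
            if (!USERNAME.matcher(current.username()).matches()) {
                found.add("malformed username: " + current.username());
            }
            Installation previous = firstUse.putIfAbsent(current.username(), current);
            if (previous != null) {
                found.add(current.username() + " is shared by "
                        + previous.dataCollector() + "/" + previous.environment() + " and "
                        + current.dataCollector() + "/" + current.environment());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Installation> installations = List.of(
                new Installation("dc-one", "production", "BN1"),
                new Installation("dc-one", "testing", "BN1"), // reused, so it is flagged
                new Installation("dc-two", "production", "BN2"));
        problems(installations).forEach(System.out::println);
    }
}
```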