Closed rcavaliere closed 1 year ago
I just saw now that the "messageType" field is different.
One record has the type Trasporti pubblici - Öffentliche Verkehrsmittel
and the other one is Situazione attuale - Aktuelle Lage
If you run this query you can see that the messageType in the metadata of the two similar records is different.
select e.name, m.json->'messageTypeDescIt',m.json->'messageTypeDescDe',m.json->'messageTypeId'
from event e
join metadata m on e.meta_data_id = m.id
where e.origin = 'PROVINCE_BZ'
and e.category = 'evento eccezionale - caso particolare | Sonderfälle'
and e.event_interval = '["2022-12-23 00:00:00",)'
order by e.id desc
Is the current behaviour correct, or should I combine both metadata so that we have only one record? We probably need to discuss how to combine them.
@dulvui ok, at least this explains the situation, and it's good to hear that we don't have issues in the Data Collector. On the other hand, as you say, this might not be the result we want. Let me investigate further, then let's decide how to proceed
@dulvui let's discuss this together. We need to look more closely at the logic the Data Collector uses to decide whether an event is unique. If I remember correctly, the key point is the way the value events_series_uuid is calculated
@rcavaliere Okay, I will check in the data collector, how this uuid is composed before our meeting
@dulvui I think the bug is still present. Check for example through analytics, and set this configuration:
I see for example the event " Bei Gfrill (km 9,950 - km 10,050) " 6 times! Can you find these records in the database and work out why we have 6 records for the same event?
@rcavaliere I checked now and there are some differences between the records.
The fields messageId, acutalMail and publisherDateTime are different for these 3 records.
The field messageId is used in the uuid, so removing that field might solve the problem.
Should I change the data collector so that these events get merged into one, by removing messageId from the uuid?
@dulvui yes, let's try this!
@rcavaliere I checked now again and the last changes I made 3 weeks ago (31.03.2023) by removing messageId from the uuid field worked. Now there are no duplicate events anymore, if you query events starting from the date 01.04.2023.
Now the fields that compose the uuid are beginDate, endDate, longitude and latitude. So a new event is created only if one of these fields changes.
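To illustrate why removing messageId merges the records, here is a minimal sketch of a deterministic uuid built from a set of fields. It is not the Data Collector's actual code: the function `event_uuid`, the namespace value, the separator and the sample values are all assumptions made for the example.

```python
import uuid

# Hypothetical namespace, used only for this sketch.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "example/province-bz-events")

# Fields named in the comment above as composing the uuid after the change.
UUID_FIELDS = ("beginDate", "endDate", "longitude", "latitude")


def event_uuid(record, fields=UUID_FIELDS):
    """Derive a name-based uuid from the selected fields of a record."""
    key = "|".join(str(record.get(field)) for field in fields)
    return uuid.uuid5(NAMESPACE, key)


# Sample values, invented for illustration.
a = {"messageId": 1, "beginDate": "2023-04-02", "endDate": None,
     "longitude": 11.3, "latitude": 46.5}
b = dict(a, messageId=2)    # same event, re-published with a new messageId
c = dict(a, latitude=46.6)  # position changed

print(event_uuid(a) == event_uuid(b))  # messageId is ignored
print(event_uuid(a) == event_uuid(c))  # a changed position gives a new uuid
```

With messageId included in `fields`, `a` and `b` would get different uuids, which matches the duplicates seen before the change.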
Here is a query to verify that there are no duplicates:
select e.created_on, e.description, m.json -> 'placeIt', l.geometry, event_interval , e.uuid from "event" e
join metadata m on e.meta_data_id = m.id
join "location" l on e.location_id = l.id
where e.origin = 'PROVINCE_BZ'
and e.created_on > '2023-04-01 00:00:00.000'
order by m.json -> 'placeIt' desc
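The result of this query still has to be checked by eye. As a rough sketch, the rows could also be checked programmatically by grouping them by place and interval and flagging groups with more than one uuid. The helper `find_duplicates`, the row layout and the sample rows are assumptions for illustration, and treating "same place and same interval" as a duplicate is only one possible definition.

```python
from collections import defaultdict


def find_duplicates(rows):
    """Return {(placeIt, event_interval): uuids} for groups with more than one uuid."""
    groups = defaultdict(set)
    for row in rows:
        groups[(row["placeIt"], row["event_interval"])].add(row["uuid"])
    return {key: uuids for key, uuids in groups.items() if len(uuids) > 1}


# Invented sample rows, shaped like the columns selected by the query.
rows = [
    {"placeIt": "A", "event_interval": "[2023-04-02,)", "uuid": "u1"},
    {"placeIt": "A", "event_interval": "[2023-04-02,)", "uuid": "u2"},
    {"placeIt": "B", "event_interval": "[2023-04-03,)", "uuid": "u3"},
]
duplicates = find_duplicates(rows)
print(duplicates)
```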
On analytics it's a bit difficult to see, because events created before 01.04.2023 are still showing up. So there are still duplicates, but they were created before the change.
I found only one duplicate now, where the position of the event slightly changed. You can find that entry with this query:
select e.created_on, e.description, m.json -> 'placeIt', l.geometry, event_interval , e.uuid from "event" e
join metadata m on e.meta_data_id = m.id
join "location" l on e.location_id = l.id
where e.origin = 'PROVINCE_BZ'
and e.id in (3016591, 3016590)
@dulvui that's very good! If you go on analytics and check the data on the map, you can see that the quality is now much better and more "realistic". I have to check more carefully, but I think that we also have some issues in the categorization of the events, check for example this
@dulvui I think that for this Data Collector we are now fine with the data stream, but we have issues in the visualization of the information on analytics. In addition to the visualization on the map (see comment above), the events in the tab view also need some corrections.
My suggestion is to close this issue and to open a new one for the improvements on analytics
We have duplicate events for the Province BZ Traffic Events Data Collector.
Check for example this:
select * from intimev2.event where origin = 'PROVINCE_BZ' and category = 'evento eccezionale - caso particolare | Sonderfälle' and event_interval = '["2022-12-23 00:00:00",)' order by id desc
We obtain 4 events, but we should actually have just 2.
We need to investigate why we create 4 events instead of 2, and fix the problem so that each event has exactly one entry. If an event is updated as it evolves, we should have multiple metadata records associated with it (not multiple events in the events table). We could rethink how the data is currently stored in the Open Data Hub.
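The intended shape, one event with several metadata records, can be sketched as follows. This is only an illustration of the idea, not how the Open Data Hub stores data: `group_by_event`, the key fields and the sample records are assumptions.

```python
from collections import defaultdict


def group_by_event(records, key_fields):
    """Return {event key: [metadata, ...]} so that each event appears once."""
    events = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in key_fields)
        events[key].append(record["metadata"])
    return dict(events)


# Invented sample: two updates of the same event with different message types.
records = [
    {"beginDate": "2022-12-23", "place": "sample place",
     "metadata": {"messageTypeDescIt": "Trasporti pubblici"}},
    {"beginDate": "2022-12-23", "place": "sample place",
     "metadata": {"messageTypeDescIt": "Situazione attuale"}},
]
events = group_by_event(records, key_fields=("beginDate", "place"))
print(len(events))  # one event, carrying two metadata records
```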