thingsboard / thingsboard-edge

Apache License 2.0

Edge instance not syncing device state correctly #127

Open aistisdev opened 3 days ago

aistisdev commented 3 days ago

Describe the bug

One particular device on the parent thingsboard instance will not sync correctly with edge no matter what we do. We have tried restarting all services and manually syncing after deleting and recreating the device. The device would not be deleted on the edge instance, so we deleted it from the edge database manually. This helped, but now, whenever we create the same device, a suffix is automatically appended:

1) Create device 07332076 on parent thingsboard
2) It turns into 07332076_utQtmcKllgxPXoz on both parent and edge
3) The original 07332076 is not present

This happens no matter how many times we recreated it. For all other device names this does not seem to happen.

Your Server Environment

Deployment: monolith
Deployment type: k8s
ThingsBoard Version: thingsboard/tb-edge-pe:3.6.4EDGEPE
Community or Professional Edition: Professional Edition
OS Name and Version: NAME="AlmaLinux" VERSION="9.3 (Shamrock Pampas Cat)"

Expected behavior

Should create device 07332076 on both thingsboard and edge without any suffixes.

To Reproduce

Don't know how to reproduce it, but it seems this started when one of the users imported the device without attaching it to the edge group, and the edge started posting

Screenshots image

Additional context

At the same time we also had problems with syncing new users, where the edge instance kept throwing this exception:

Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "tb_user_email_key"

After we deleted the user from the edge database manually, only the described problem with device 07332076 persisted.

1) Is there some way to make edge sync this correctly?
2) Is it possible to prevent edge from syncing new devices with the parent if the devices already exist in the parent but are not in the edge group? This has caused us many, many problems: someone forgets to add the devices to the edge group, and then we have a bunch of "new" devices with these random suffixes.

AndriiLandiak commented 2 days ago

Hello, @aistisdev.

Thanks for bringing the problem to our attention. The suffix is added when two devices with the same name are created separately on ThingsBoard and on Edge (for example, during a disconnection, or when a device with that name is not assigned to Edge and a device with the same name is then created there).

If I understood correctly, there was a device with the name 07332076 on TB, but it wasn't assigned to the Edge group, so when you create the same device on Edge - it creates the device in the correct group, but with suffix? I was able to reproduce this, but this logic is by design, we cannot have 2 devices with the same name, but in different groups.
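The by-design behavior described above can be illustrated with a minimal sketch. This is not ThingsBoard's actual implementation; `resolve_device_name`, `existing_names`, and the 15-letter suffix length (inferred only from the single example `07332076_utQtmcKllgxPXoz` in this thread) are assumptions made for illustration.

```python
from __future__ import annotations

import secrets
import string

# Assumption: suffix length inferred from the one example in this thread.
SUFFIX_LENGTH = 15


def resolve_device_name(name: str, existing_names: set[str]) -> str:
    """Return `name` if it is free, otherwise `name` plus a random suffix.

    Hypothetical helper: it only illustrates the rule that two devices
    cannot share a name, even when they sit in different groups.
    """
    if name not in existing_names:
        return name
    while True:
        suffix = "".join(
            secrets.choice(string.ascii_letters) for _ in range(SUFFIX_LENGTH)
        )
        candidate = f"{name}_{suffix}"
        if candidate not in existing_names:
            return candidate
```

Under this sketch, creating 07332076 while a device with that name already exists anywhere (assigned to the Edge group or not) yields a suffixed name, which matches what was reported.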

In my case, I was able to delete that device (07332076) from TB and recreate it (or just delete the device with the suffix and assign the original one, 07332076, to the Edge group), and no suffix was added. So could you provide some additional screenshots, etc., if the problem still exists?

aistisdev commented 2 days ago

Hi, @AndriiLandiak

If I understood correctly, there was a device with the name 07332076 on TB, but it wasn't assigned to the Edge group, so when you create the same device on Edge - it creates the device in the correct group, but with suffix?

It's a bit convoluted, but I will try to sketch out the situation:

1) User imported device 07332076 into thingsboard, did not assign any edge group
2) User configured device 07332076 to publish to thingsboard edge and device started publishing
3) In the morning we had multiple 07332076_... with random suffixes
4) I deleted the devices with suffixes

Now is the part that is convoluted...

5) After deleting the devices with suffixes, the original 07332076 also disappeared (I am not 100% sure that I did not accidentally delete it, but I don't think I did)
6) I tried recreating the device 07332076 and what ended up happening is:

Thingsboard: image

Thingsboard edge: image

The correct original device with 07332076 name was on edge only, and the 07332076_... device with the suffix was both on edge and thingsboard.

7) I could only delete the one with the suffix; I could not delete the one with the original name (no suffix) from the edge via the UI.
8) I then tried restarting both services, manually syncing edge, etc. Nothing helped: when creating 07332076, it automatically appeared as 07332076_... in thingsboard and edge.
9) I manually deleted 07332076 from the edge database
10) After creating 07332076 again, it automatically created 07332076_... in edge and thingsboard, without 07332076 being present anywhere.

This was the case until today. When I started to write this post and take screenshots, the situation fixed itself... I can now recreate that device correctly without any suffixes. It seems to me that something was cached somewhere and this kept on happening until today, when it magically does not happen anymore...

I was able to reproduce this, but this logic is by design, we cannot have 2 devices with the same name, but in different groups.

I understand. Is it possible to configure edge to not publish anything if the device in thingsboard is not included in edge group? Because if we start deploying thousands of devices to edge and those devices are not added to the edge group, there would be an insane amount of trash devices with suffixes, which would then need to be debugged and deleted.
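For the cleanup side of this, a rough sketch of how suffixed duplicates could be grouped by their base name before anyone deletes them. The pattern assumes the suffix is always an underscore followed by 15 ASCII letters, which is inferred only from the single example `07332076_utQtmcKllgxPXoz` in this thread; `group_suffixed_devices` is a hypothetical helper and the list of names would have to come from wherever the device inventory is exported.

```python
import re
from collections import defaultdict

# Assumption: "<base>_<15 ASCII letters>", inferred from one example name.
SUFFIX_PATTERN = re.compile(r"^(?P<base>.+)_(?P<suffix>[A-Za-z]{15})$")


def group_suffixed_devices(names):
    """Map each base name to the list of suffixed device names derived from it."""
    groups = defaultdict(list)
    for name in names:
        match = SUFFIX_PATTERN.match(name)
        if match:
            groups[match.group("base")].append(name)
    return dict(groups)
```

A legitimately named device that happens to end in an underscore plus 15 letters would also match, so the result is better treated as a review list than as a delete list.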

AndriiLandiak commented 2 days ago

User configured device 07332076 to publish to thingsboard edge and device started publishing

I am a bit confused, what does it mean - publish to thingsboard edge? Added to device group, that is assigned to Edge or Edge All group, yeah?

This was the case until today...

There could be different reasons for that. As one example, there may have been a lot of events in the DB with the 07332076 name that apply renaming but had not been processed yet, or something else. If you can reproduce it, contact us again!

I understand. Is it possible to configure edge to not publish anything if the device in thingsboard is not included in edge group? Because if we start deploying thousands of devices to edge and those devices are not added to the edge group, there would be an insane amount of trash devices with suffixes, which would then need to be debugged and deleted.

As for now, there are no such options. We could consider improving this in the next release. For example, add some logic for the user to choose - either he wants to create a device with a suffix or replace an existing one. Or some other approach.

aistisdev commented 1 day ago

I am a bit confused, what does it mean - publish to thingsboard edge? Added to device group, that is assigned to Edge or Edge All group, yeah?

It sends telemetry to our parser, which then sends the telemetry as a standard MQTT Gateway API message to thingsboard edge. So it just means that the device started sending messages to our endpoint (it was idle before that).
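For readers unfamiliar with that message format, here is a minimal sketch of what such a parser might build. The topic `v1/gateway/telemetry` and the payload shape (device name as the key, a list of `ts`/`values` entries) follow the ThingsBoard MQTT Gateway API; `build_gateway_telemetry` and the sample values are hypothetical and not taken from the parser discussed here.

```python
import json
import time

# Telemetry topic of the ThingsBoard MQTT Gateway API.
GATEWAY_TELEMETRY_TOPIC = "v1/gateway/telemetry"


def build_gateway_telemetry(device_name, values, ts_ms=None):
    """Return (topic, JSON payload) for one telemetry sample of one device.

    Hypothetical helper: the device is identified only by the name used
    as the payload key.
    """
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)  # milliseconds since the epoch
    payload = {device_name: [{"ts": ts_ms, "values": values}]}
    return GATEWAY_TELEMETRY_TOPIC, json.dumps(payload)
```

Because the message identifies the device by name alone, it seems plausible that edge creates a device locally when it does not know that name, which would fit the sequence of events described in this thread.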

There could be different reasons for that. As one example, there may have been a lot of events in the DB with the 07332076 name that apply renaming but had not been processed yet, or something else. If you can reproduce it, contact us again!

Yeah, we also had some new user email syncing issues at the same time, which were resolved by manually removing the user from the edge database and recreating the user in thingsboard after that. Maybe that had something to do with this issue. I will keep track of these issues if they appear again.

As for now, there are no such options.

In our case it makes more sense to just leave the device inactive in thingsboard, or in both thingsboard and edge, if the device is not included in the edge group. We can have alarms based on inactivity, and the device administrator can then resolve the issue as needed. It seems like this feature of suffixed devices is targeted more at situations where data loss is critical, but in our field a few lost messages are usually not critical. Also, we don't have any way to deal with the data from suffixed devices, and on top of that, a new suffixed device is created for each new message, which is very confusing for administrators, especially when there can be a large number of such devices.
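Since edge has no such option for now, one possible workaround is to drop messages in the parser before they reach edge. The sketch below assumes the parser can obtain the set of device names that are assigned to the edge group; how that set is obtained is left open, and `split_by_edge_assignment` is a hypothetical name.

```python
def split_by_edge_assignment(messages, edge_assigned_names):
    """Split (device_name, values) pairs into forwarded and dropped lists.

    Hypothetical parser-side guard: only devices known to be assigned to
    the edge group are forwarded; everything else is held back so that
    edge never sees an unknown device name.
    """
    forwarded, dropped = [], []
    for device_name, values in messages:
        if device_name in edge_assigned_names:
            forwarded.append((device_name, values))
        else:
            dropped.append((device_name, values))
    return forwarded, dropped
```

Dropped devices then simply stay inactive, and the inactivity alarms mentioned above would flag them.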

We could consider improving this in the next release. For example, add some logic for the user to choose - either he wants to create a device with a suffix or replace an existing one. Or some other approach.

That would be great!