kerberos-io / agent

An open and scalable video surveillance system for anyone making this world a better and more peaceful place.
https://kerberos.io
MIT License
708 stars 88 forks source link

Weird agent to vault issue. #113

Open Christiaan-Mathu opened 1 year ago

Christiaan-Mathu commented 1 year ago

Hi all, hope you are well.

I have set up our own agents, that looks like its running fine.

We also host our own vault on some of our servers.

I managed to get the agents to upload to the vault and everything looks fine but then after a while when there are a lot of events/videos in a short time the agents starts backing up and then after awhile they completely fail to upload new videos and the logs shows the following

Agent Log: {"level":"info","msg":"UploadKerberosVault: Uploading to Kerberos Vault (https://ourVault.com/api)","time":"2023-08-10T08:20:06+02:00"} {"level":"info","msg":"UploadKerberosVault: Upload started for 1691532014_6-967003_Cam 1_200-200-400-400_276_769.mp4","time":"2023-08-10T08:20:06+02:00"} {"level":"info","msg":"UploadKerberosVault: Upload Failed, file doesn't exists anymore.","time":"2023-08-10T08:20:06+02:00"} {"level":"error","msg":"HandleUpload: UploadKerberosVault: Upload Failed, file doesn't exists anymore.","time":"2023-08-10T08:20:06+02:00"} Please let me know if there is some more info I can give to try and narrow down what it happening 😄

cedricve commented 1 year ago

Thank you for reporting @Christiaan-Mathu. Which Kerberos Agent version are you using?

Christiaan-Mathu commented 1 year ago

No problem @cedricve, I have tried multiple versions and the problem still persists, am currently using the latest dev docker image (kerberos/agent-dev:latest)

image
cedricve commented 1 year ago

Can you run this version 384448d, as this resolved an issue with uploading and cleanup.

Christiaan-Mathu commented 1 year ago

Will do I will let you know if it fixes the problem, thanks in advance 😄

Christiaan-Mathu commented 1 year ago

Oky the majority of the recording are now uploading but there are still some that struggle and saw in the logs that it's getting 500 response from the vault so I checked the vault logs and saw the following error:

2023/08/10 08:04:20 [Recovery] 2023/08/10 - 08:04:20 panic recovered:
[GIN] 2023/08/10 - 08:04:20 | 500 |    2.327857ms |    192.168.10.1 | POST     "/api/storage"
[GIN] 2023/08/10 - 08:04:20 | 500 |    2.360449ms |    192.168.10.1 | POST     "/api/storage"
runtime error: invalid memory address or nil pointer dereference
/usr/local/go/src/runtime/panic.go:260 (0x451515)
/usr/local/go/src/runtime/signal_unix.go:835 (0x4514e5)
/go/pkg/mod/github.com/!eyevinn/mp4ff@v0.35.0/mp4/ftyp.go:85 (0xe98bb0)
/go/src/gitlab.com/kerberos-io/vault/api/mp4/fragmented.go:20 (0xe98ba8)
/go/src/gitlab.com/kerberos-io/vault/api/storage/main.go:279 (0x108aee7)
/go/src/gitlab.com/kerberos-io/vault/api/main.go:235 (0x10cda54)
/go/pkg/mod/github.com/gin-gonic/gin@v1.9.0/context.go:174 (0xb04414)
/go/pkg/mod/github.com/zsais/go-gin-prometheus@v0.1.0/middleware.go:364 (0xb043f5)
/go/pkg/mod/github.com/gin-gonic/gin@v1.9.0/context.go:174 (0x9434c1)
/go/pkg/mod/github.com/gin-gonic/gin@v1.9.0/recovery.go:102 (0x9434ac)
/go/pkg/mod/github.com/gin-gonic/gin@v1.9.0/context.go:174 (0x9425c6)
/go/pkg/mod/github.com/gin-gonic/gin@v1.9.0/logger.go:240 (0x9425a9)
/go/pkg/mod/github.com/gin-gonic/gin@v1.9.0/context.go:174 (0x9434c1)
/go/pkg/mod/github.com/gin-gonic/gin@v1.9.0/recovery.go:102 (0x9434ac)
/go/pkg/mod/github.com/gin-gonic/gin@v1.9.0/context.go:174 (0x9425c6)
/go/pkg/mod/github.com/gin-gonic/gin@v1.9.0/logger.go:240 (0x9425a9)
/go/pkg/mod/github.com/gin-gonic/gin@v1.9.0/context.go:174 (0x941650)
/go/pkg/mod/github.com/gin-gonic/gin@v1.9.0/gin.go:620 (0x9412b8)
/go/pkg/mod/github.com/gin-gonic/gin@v1.9.0/gin.go:576 (0x940dfc)
/usr/local/go/src/net/http/server.go:2947 (0x6eb6eb)
/usr/local/go/src/net/http/server.go:1991 (0x6e6906)
/usr/local/go/src/runtime/asm_amd64.s:1594 (0x46eda0)
cedricve commented 1 year ago

Can you disable mp4 fragmentation? Not sure which cameras your are using but it failed to read. out the fragmented mp4 metadata.

Screenshot 2023-08-10 at 10 40 09

Christiaan-Mathu commented 1 year ago

Hi @cedricve, I have not enabled it its still disabled on the agents?

image
Christiaan-Mathu commented 1 year ago

Here is a sample of my agent with the secrets removed:

kerberos-agent1:
    image: "kerberos/agent:384448d"
    restart: always
    ports:
      - "8081:80"
    environment:
      - AGENT_NAME=Cam 1
      - AGENT_USERNAME=admin
      - AGENT_PASSWORD=pass
      - AGENT_TIMEZONE=Africa/Johannesburg
      - AGENT_CAPTURE_IPCAMERA_RTSP=rtsp://192.168.10.2:554/unicast/c1/s1/live
      - AGENT_CAPTURE_IPCAMERA_SUB_RTSP=rtsp://192.168.10.2:554/unicast/c1/s1/live
      - AGENT_CAPTURE_PRERECORDING=2
      - AGENT_CAPTURE_POSTRECORDING=2
      - AGENT_CAPTURE_MAXLENGTH=5
      - AGENT_CAPTURE_PIXEL_CHANGE=150
      - AGENT_CLOUD=kstorage
      - AGENT_KERBEROSVAULT_URI=https://myvault.com/api
      - AGENT_KERBEROSVAULT_ACCESS_KEY=key
      - AGENT_KERBEROSVAULT_SECRET_KEY=secret
      - AGENT_KERBEROSVAULT_PROVIDER=MyLocalMinio
      - AGENT_KERBEROSVAULT_DIRECTORY=office
      - AGENT_REGION_POLYGON=0,50;330,50;330,0;1280,0;1280,720;0,720
Christiaan-Mathu commented 1 year ago

Hi @cedricve have you had a chance to investigate this problem?

It's quite weird because some of the agents work very well but some just don't work...

I am still getting this error when the failing agents try to upload

Christiaan-Mathu commented 1 year ago

Hey @cedricve, I'll try to break down what I think is happening and please correct me if I'm mistaken.

Here's what I think is going on:

Connection Loss Issue: When the link between the agent and the vault breaks, video backups are queued up and can't be sent for uploading.

Upload Sequence After Reconnection: Once the connection is reestablished, the system starts uploading videos from the backup. It processes the videos in the order they were created, starting with the oldest video. This sequence follows the upload file naming convention.

Auto-Clean Interference: However, there's a problem. The system's auto-clean function sometimes deletes the video files that are being uploaded. This causes the upload to fail.

Retrying After Failure: After a failed upload, the system waits for quite a while before attempting to upload the video again.

A possible solution to this issue could involve modifying a specific loop in the system's code. The current loop structure reads all files in the directory for uploading, which can lead to conflicts with the auto-clean process. Here's a suggested alteration:

Original Loop:

While(True){
    files = readDir()

    if (files != null) {
        for file in files{
            doUpload(file) // The file object might be deleted due to auto cleaning
        }
    } else {
        wait(delay)
    }
}

Proposed Solution:

While(True){
    file = getOneFileInDir()

    if (file != null) {
        doUpload()
    } else {
        wait(delay)
    }
}

In this proposed solution, the system would focus on one file at a time, mitigating the conflict with auto-cleaning. This could help ensure successful video uploads without the interference caused by the auto-clean function.

Please let me know if this explanation aligns with your understanding or if there are any corrections needed.

cedricve commented 1 year ago

Thanks @Christiaan-Mathu for the explanation. So we did fix something in the latest release: https://github.com/kerberos-io/agent/blob/master/machinery/src/cloud/Cloud.go#L114-L141

Going into your previous comment, it actually tells that the video was recorded and recognised as a fragmented mp4.

2023/08/10 08:04:20 [Recovery] 2023/08/10 - 08:04:20 panic recovered:
[GIN] 2023/08/10 - 08:04:20 | 500 |    2.327857ms |    192.168.10.1 | POST     "/api/storage"
[GIN] 2023/08/10 - 08:04:20 | 500 |    2.360449ms |    192.168.10.1 | POST     "/api/storage"
runtime error: invalid memory address or nil pointer dereference
/usr/local/go/src/runtime/panic.go:260 (0x451515)
/usr/local/go/src/runtime/signal_unix.go:835 (0x4514e5)
/go/pkg/mod/github.com/!eyevinn/mp4ff@v0.35.0/mp4/ftyp.go:85 (0xe98bb0)
/go/src/gitlab.com/kerberos-io/vault/api/mp4/fragmented.go:20 (0xe98ba8)
cedricve commented 1 year ago

However I do get your point. At the moment the upload failed due to network connectivity, we will "slow down" for x seconds (https://github.com/kerberos-io/agent/blob/master/machinery/src/cloud/Cloud.go#L136-L139) and try to upload the next one.

On the next run we will check if the video is still available (https://github.com/kerberos-io/agent/blob/master/machinery/src/cloud/KerberosHub.go#L40-L48) and if not it will be marked as to be removedhttps://github.com/kerberos-io/agent/blob/master/machinery/src/cloud/Cloud.go#L131