tgdrive / teldrive

Telegram Drive
MIT License

[QUESTION/BUG] How does duplication checks work with chunks? (SQLSTATE 23505) #231

Closed iwconfig closed 4 months ago

iwconfig commented 5 months ago

Hello again!

I get the SQLSTATE 23505 error again, but only when uploading chunks of big files. It might be a bug, possibly related to #229, but I also have a question:

Is there any hash comparison before uploading? In the database I find id, upload_id and other kinds of ids, but I don't know how valid they are as fingerprints. I would assume they're just UUIDs. Ideally teldrive should compare files before uploading anything, but that's pretty obvious, so maybe I'm missing something, or it is just yet to be implemented?

teldrive  | 26/05/2024 06:53 PM INFO            {"status": 200, "method": "GET", "path": "/api/uploads/e0087fe633c192ce07a18dc3f15dd309", "query": "", "ip": "172.19.0.1", "user-agent": "rclone/v1.66.3", "latency": 0.440182901, "time": "2024-05-26T18:53:39Z"}
teldrive  | 26/05/2024 06:53 PM DEBUG   uploading file  {"fileName": "upload_test", "partName": "933c190954aba8deb5a69c3604cc6c46", "bot": "6493112858", "botNo": 0, "chunkNo": 2, "partSize": 25712000}
teldrive  | 26/05/2024 06:53 PM DEBUG   uploading file  {"fileName": "upload_test", "partName": "599963f1f1d4f80e3dd8d44601d88e8c", "bot": "7192771920", "botNo": 1, "chunkNo": 1, "partSize": 524288000}
teldrive  | 26/05/2024 06:53 PM DEBUG   upload finished {"fileName": "upload_test", "partName": "933c190954aba8deb5a69c3604cc6c46", "chunkNo": 2}
teldrive  | 26/05/2024 06:53 PM INFO            {"status": 201, "method": "POST", "path": "/api/uploads/e0087fe633c192ce07a18dc3f15dd309", "query": "channelId=0&encrypted=false&fileName=upload_test&partName=933c190954aba8deb5a69c3604cc6c46&partNo=2", "ip": "172.19.0.1", "user-agent": "rclone/v1.66.3", "latency": 1.197726102, "time": "2024-05-26T18:53:40Z"}
teldrive  | 26/05/2024 06:54 PM DEBUG   upload finished {"fileName": "upload_test", "partName": "599963f1f1d4f80e3dd8d44601d88e8c", "chunkNo": 1}
teldrive  | 26/05/2024 06:54 PM INFO            {"status": 201, "method": "POST", "path": "/api/uploads/e0087fe633c192ce07a18dc3f15dd309", "query": "channelId=0&encrypted=false&fileName=upload_test&partName=599963f1f1d4f80e3dd8d44601d88e8c&partNo=1", "ip": "172.19.0.1", "user-agent": "rclone/v1.66.3", "latency": 29.539065563, "time": "2024-05-26T18:54:08Z"}
teldrive  | 26/05/2024 06:54 PM ERROR   [DB] github.com/divyam234/teldrive/pkg/services/file.go:101 ERROR: duplicate key value violates unique constraint "unique_file" (SQLSTATE 23505)
teldrive  | [128.714ms] [rows:0] INSERT INTO "teldrive"."files" ("name","type","mime_type","path","size","starred","depth","category","encrypted","user_id","status","parent_id","parts","channel_id") VALUES ('upload_test','file','application/octet-stream','',550000000,false,NULL,'other',false,6907387205,'active','nCkUeR9W6YwAGwVH','[{"id":16355},{"id":16354}]',2052950520) RETURNING "id","created_at","updated_at"
teldrive  | 26/05/2024 06:54 PM ERROR   &{key conflict 409}
teldrive  | 26/05/2024 06:54 PM ERROR   key conflict
teldrive  | 26/05/2024 06:54 PM INFO            {"status": 409, "method": "POST", "path": "/api/files", "query": "", "ip": "172.19.0.1", "user-agent": "rclone/v1.66.3", "latency": 0.524054296, "time": "2024-05-26T18:54:09Z"}
teldrive  | 26/05/2024 06:54 PM INFO            {"status": 200, "method": "GET", "path": "/api/uploads/e0087fe633c192ce07a18dc3f15dd309", "query": "", "ip": "172.19.0.1", "user-agent": "rclone/v1.66.3", "latency": 0.449969356, "time": "2024-05-26T18:54:19Z"}
teldrive  | 26/05/2024 06:54 PM ERROR   [DB] github.com/divyam234/teldrive/pkg/services/file.go:101 ERROR: duplicate key value violates unique constraint "unique_file" (SQLSTATE 23505)
teldrive  | [124.358ms] [rows:0] INSERT INTO "teldrive"."files" ("name","type","mime_type","path","size","starred","depth","category","encrypted","user_id","status","parent_id","parts","channel_id") VALUES ('upload_test','file','application/octet-stream','',550000000,false,NULL,'other',false,6907387205,'active','nCkUeR9W6YwAGwVH','[{"id":16355},{"id":16354}]',2052950520) RETURNING "id","created_at","updated_at"
teldrive  | 26/05/2024 06:54 PM ERROR   &{key conflict 409}
teldrive  | 26/05/2024 06:54 PM ERROR   key conflict
teldrive  | 26/05/2024 06:54 PM INFO            {"status": 409, "method": "POST", "path": "/api/files", "query": "", "ip": "172.19.0.1", "user-agent": "rclone/v1.66.3", "latency": 0.50828225, "time": "2024-05-26T18:54:20Z"}
teldrive  | 26/05/2024 06:54 PM INFO            {"status": 200, "method": "GET", "path": "/api/uploads/e0087fe633c192ce07a18dc3f15dd309", "query": "", "ip": "172.19.0.1", "user-agent": "rclone/v1.66.3", "latency": 0.456655813, "time": "2024-05-26T18:54:40Z"}
teldrive  | 26/05/2024 06:54 PM ERROR   [DB] github.com/divyam234/teldrive/pkg/services/file.go:101 ERROR: duplicate key value violates unique constraint "unique_file" (SQLSTATE 23505)
teldrive  | [126.208ms] [rows:0] INSERT INTO "teldrive"."files" ("name","type","mime_type","path","size","starred","depth","category","encrypted","user_id","status","parent_id","parts","channel_id") VALUES ('upload_test','file','application/octet-stream','',550000000,false,NULL,'other',false,6907387205,'active','nCkUeR9W6YwAGwVH','[{"id":16355},{"id":16354}]',2052950520) RETURNING "id","created_at","updated_at"
teldrive  | 26/05/2024 06:54 PM ERROR   &{key conflict 409}
teldrive  | 26/05/2024 06:54 PM ERROR   key conflict
teldrive  | 26/05/2024 06:54 PM INFO            {"status": 409, "method": "POST", "path": "/api/files", "query": "", "ip": "172.19.0.1", "user-agent": "rclone/v1.66.3", "latency": 0.508920102, "time": "2024-05-26T18:54:41Z"}
teldrive  | 26/05/2024 06:55 PM INFO            {"status": 200, "method": "GET", "path": "/api/uploads/e0087fe633c192ce07a18dc3f15dd309", "query": "", "ip": "172.19.0.1", "user-agent": "rclone/v1.66.3", "latency": 0.446628495, "time": "2024-05-26T18:55:21Z"}
teldrive  | 26/05/2024 06:55 PM ERROR   [DB] github.com/divyam234/teldrive/pkg/services/file.go:101 ERROR: duplicate key value violates unique constraint "unique_file" (SQLSTATE 23505)
teldrive  | [129.782ms] [rows:0] INSERT INTO "teldrive"."files" ("name","type","mime_type","path","size","starred","depth","category","encrypted","user_id","status","parent_id","parts","channel_id") VALUES ('upload_test','file','application/octet-stream','',550000000,false,NULL,'other',false,6907387205,'active','nCkUeR9W6YwAGwVH','[{"id":16355},{"id":16354}]',2052950520) RETURNING "id","created_at","updated_at"
teldrive  | 26/05/2024 06:55 PM ERROR   &{key conflict 409}
teldrive  | 26/05/2024 06:55 PM ERROR   key conflict
teldrive  | 26/05/2024 06:55 PM INFO            {"status": 409, "method": "POST", "path": "/api/files", "query": "", "ip": "172.19.0.1", "user-agent": "rclone/v1.66.3", "latency": 0.514928695, "time": "2024-05-26T18:55:22Z"}
teldrive  | 26/05/2024 06:56 PM INFO            {"status": 200, "method": "GET", "path": "/api/uploads/e0087fe633c192ce07a18dc3f15dd309", "query": "", "ip": "172.19.0.1", "user-agent": "rclone/v1.66.3", "latency": 0.444060396, "time": "2024-05-26T18:56:42Z"}
teldrive  | 26/05/2024 06:56 PM ERROR   [DB] github.com/divyam234/teldrive/pkg/services/file.go:101 ERROR: duplicate key value violates unique constraint "unique_file" (SQLSTATE 23505)
teldrive  | [137.737ms] [rows:0] INSERT INTO "teldrive"."files" ("name","type","mime_type","path","size","starred","depth","category","encrypted","user_id","status","parent_id","parts","channel_id") VALUES ('upload_test','file','application/octet-stream','',550000000,false,NULL,'other',false,6907387205,'active','nCkUeR9W6YwAGwVH','[{"id":16355},{"id":16354}]',2052950520) RETURNING "id","created_at","updated_at"
teldrive  | 26/05/2024 06:56 PM ERROR   &{key conflict 409}
teldrive  | 26/05/2024 06:56 PM ERROR   key conflict
teldrive  | 26/05/2024 06:56 PM INFO            {"status": 409, "method": "POST", "path": "/api/files", "query": "", "ip": "172.19.0.1", "user-agent": "rclone/v1.66.3", "latency": 0.539203507, "time": "2024-05-26T18:56:43Z"}
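
For reference, the two partSize values in the log add up to the size in the failing INSERT. The sketch below reproduces that split. The 500 MiB part size is inferred from the log (524288000 bytes), and split_into_parts is a hypothetical helper for illustration, not teldrive's or rclone's code.

```python
def split_into_parts(total_size: int, part_size: int) -> list[int]:
    """Return the size of each part when a file is cut into fixed-size chunks."""
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    full_parts, remainder = divmod(total_size, part_size)
    sizes = [part_size] * full_parts
    if remainder:
        # The last part carries whatever is left over.
        sizes.append(remainder)
    return sizes


# Assumption: 500 MiB parts, inferred from "partSize": 524288000 in the log.
PART_SIZE = 500 * 1024 * 1024
parts = split_into_parts(550_000_000, PART_SIZE)
print(parts)  # [524288000, 25712000]
```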

divyam234 commented 5 months ago

@iwconfig you can see the unique fingerprint here: https://github.com/divyam234/teldrive/blob/6b51c2343743ddd2e49a9f1d4ed9cc0f28b800aa/internal/database/migrations/20231102165658_tables.sql#L78. Files with the same name in the same directory are not allowed, to keep it consistent with Windows and Linux filesystems.
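
A minimal sketch of how such a constraint turns a second insert into a 409. It uses SQLite from the Python standard library as a stand-in for PostgreSQL, and the column set (name, parent_id, user_id) is an assumption made for illustration; the real definition is in the linked migration. The create_file helper is hypothetical.

```python
import sqlite3


def create_file(conn: sqlite3.Connection, name: str, parent_id: str, user_id: int) -> int:
    """Insert a file row and return an HTTP-like status code."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO files (name, parent_id, user_id) VALUES (?, ?, ?)",
                (name, parent_id, user_id),
            )
        return 201
    except sqlite3.IntegrityError:
        # Simplification: any integrity error is treated as a key conflict.
        # PostgreSQL reports the unique violation as SQLSTATE 23505.
        return 409


conn = sqlite3.connect(":memory:")
# Assumed, simplified schema; only the constraint name is taken from the log.
conn.execute(
    """
    CREATE TABLE files (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        parent_id TEXT NOT NULL,
        user_id INTEGER NOT NULL,
        CONSTRAINT unique_file UNIQUE (name, parent_id, user_id)
    )
    """
)

first = create_file(conn, "upload_test", "nCkUeR9W6YwAGwVH", 6907387205)
second = create_file(conn, "upload_test", "nCkUeR9W6YwAGwVH", 6907387205)
other_dir = create_file(conn, "upload_test", "someOtherParent", 6907387205)
print(first, second, other_dir)  # 201 409 201
```

The same name is accepted in a different parent directory, which matches the filesystem-like behaviour described above.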

iwconfig commented 5 months ago

Oh, I see!

Is it a limitation in rclone then? If I upload the same file again in the UI, it tells me the file already exists without proceeding with the upload, but not if done via rclone.

aniel300 commented 5 months ago

@iwconfig you can see unique fingerprint here

https://github.com/divyam234/teldrive/blob/6b51c2343743ddd2e49a9f1d4ed9cc0f28b800aa/internal/database/migrations/20231102165658_tables.sql#L78

so files with the same name in the same directory are not allowed, to keep it consistent with Windows and Linux filesystems.

Could this break rclone union?

iwconfig commented 5 months ago

so files with the same name in the same directory are not allowed, to keep it consistent with Windows and Linux filesystems.

Could this break rclone union?

I don't think so. teldrive has nothing to do with how rclone handles the union backend, and cannot do anything to it unless there's some sort of config parameter set in the rclone config such as --teldrive-this-is-part-of-a-union=true or something to which teldrive would do some magic tricks. But even that falls short.

To my knowledge, rclone union operates on a FIFO or LIFO basis, in the order the remotes are specified in the union config. Only one file wins the race. I could be wrong about this though.

Why not test it? It doesn't have to be with one or more teldrive remotes. Local filesystem backends work the same in this regard.

aniel300 commented 5 months ago

@iwconfig I'm testing two remotes aside from the teldrive one and it doesn't work (I'm getting an error similar to the one you shared here), but as soon as I take the teldrive remote out of the picture and use the official rclone 1.67 beta build, it works fine.

iwconfig commented 5 months ago

@iwconfig I'm testing two remotes aside from the teldrive one and it doesn't work (I'm getting an error similar to the one you shared here), but as soon as I take the teldrive remote out of the picture and use the official rclone 1.67 beta build, it works fine.

Well, that sounds like it's because divyam234/rclone is based on upstream version 1.66.0, not 1.67.0. While 1.67.0 is still in beta, I'd suggest you wait for that upstream version to be released and later merged into divyam234/rclone.

Version 1.67.0 seems to have fixed the IO issue causing this message (1.66.0):

2024/05/27 13:36:41 ERROR : test: ReadFileHandle.Flush error: corrupted on transfer: md5 hash differ "b1ddf0d5af549664d16d85e70a2a20a4" vs "f872a5c8ee1d788a1ea97c29d51d32c6"
2024/05/27 13:36:41 ERROR : IO error: corrupted on transfer: md5 hash differ "b1ddf0d5af549664d16d85e70a2a20a4" vs "f872a5c8ee1d788a1ea97c29d51d32c6"
2024/05/27 13:36:41 DEBUG : &{test (r)}: >Flush: err=corrupted on transfer: md5 hash differ "b1ddf0d5af549664d16d85e70a2a20a4" vs "f872a5c8ee1d788a1ea97c29d51d32c6"
2024/05/27 13:36:41 DEBUG : &{test (r)}: Release:
2024/05/27 13:36:41 DEBUG : test: ReadFileHandle.Release closing
2024/05/27 13:36:41 ERROR : test: ReadFileHandle.Release error: corrupted on transfer: md5 hash differ "b1ddf0d5af549664d16d85e70a2a20a4" vs "f872a5c8ee1d788a1ea97c29d51d32c6"
2024/05/27 13:36:41 ERROR : IO error: corrupted on transfer: md5 hash differ "b1ddf0d5af549664d16d85e70a2a20a4" vs "f872a5c8ee1d788a1ea97c29d51d32c6"
2024/05/27 13:36:41 DEBUG : &{test (r)}: >Release: err=corrupted on transfer: md5 hash differ "b1ddf0d5af549664d16d85e70a2a20a4" vs "f872a5c8ee1d788a1ea97c29d51d32c6"

So it's no wonder it doesn't work with teldrive remotes using divyam234/rclone. I don't think hash checking is necessary. If you want both of the different files with the same name to show up, though, you're out of luck.
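
For context, the "md5 hash differ" message in the log is the result of comparing a checksum of the source with a checksum of what arrived. A minimal sketch of that kind of check, using only hashlib; verify_transfer is a hypothetical helper and the error text just mimics the log, this is not rclone's code.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def verify_transfer(source: bytes, received: bytes) -> None:
    """Raise OSError if the received bytes do not match the source bytes."""
    src, dst = md5_hex(source), md5_hex(received)
    if src != dst:
        raise OSError(f'corrupted on transfer: md5 hash differ "{src}" vs "{dst}"')


verify_transfer(b"hello1\n", b"hello1\n")  # identical content passes silently

try:
    verify_transfer(b"hello1\n", b"hello2\n")
except OSError as err:
    print(err)
```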

Here's what I meant by having the same filenames with different content in different remotes in a union mount.

iwconfig@rpi:/tmp $ rclone version
rclone v1.67.0-beta.7975.2257c0339
- os/version: raspbian 11.9 (64 bit)
- os/kernel: 6.1.21-v8+ (aarch64)
- os/type: linux
- os/arch: arm64 (ARMv8 compatible)
- go/version: go1.22.3
- go/linking: static
- go/tags: none
iwconfig@rpi:/tmp $ df -h /tmp/rclonetestdir3
Filesystem      Size  Used Avail Use% Mounted on
uniontest:       59G   53G  5.5G  91% /tmp/rclonetestdir3
iwconfig@rpi:/tmp $ ls rclonetestdir{1..3}/
rclonetestdir1/:
test

rclonetestdir2/:
test

rclonetestdir3/:
test
iwconfig@rpi:/tmp $ head rclonetestdir{1..3}/test
==> rclonetestdir1/test <==
hello1

==> rclonetestdir2/test <==
hello2

==> rclonetestdir3/test <==
hello1
iwconfig@rpi:/tmp $ echo hello1 again > /tmp/rclonetestdir1/test
iwconfig@rpi:/tmp $ head rclonetestdir{1..3}/test
==> rclonetestdir1/test <==
hello1 again

==> rclonetestdir2/test <==
hello2

==> rclonetestdir3/test <==
hello1 again
iwconfig@rpi:/tmp $ echo hello2 again > /tmp/rclonetestdir2/test
iwconfig@rpi:/tmp $ head rclonetestdir{1..3}/test
==> rclonetestdir1/test <==
hello1 again

==> rclonetestdir2/test <==
hello2 again

==> rclonetestdir3/test <==
hello1 again

Only the file from rclonetestdir1 shows up in rclonetestdir3, because local1 is the first remote listed in the --union-upstreams option:

[local1]
type = local
[local2]
type = local
[uniontest]
type = union
upstreams = local1:/tmp/rclonetestdir1 local2:/tmp/rclonetestdir2
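
The read behaviour seen in the transcript can be modelled as a "first found" lookup over the upstreams in the order they are listed. This is only a sketch of what the test shows: the upstreams are faked as in-memory dicts, first_found is a hypothetical helper, and rclone's union backend has configurable policies that this does not cover.

```python
from typing import Optional


def first_found(
    upstreams: list[tuple[str, dict[str, str]]], path: str
) -> Optional[tuple[str, str]]:
    """Return (upstream name, content) from the first upstream holding the path."""
    for name, files in upstreams:
        if path in files:
            return name, files[path]
    return None


# Same order as in the config: local1 first, then local2.
upstreams = [
    ("local1", {"test": "hello1 again\n"}),
    ("local2", {"test": "hello2 again\n"}),
]

print(first_found(upstreams, "test"))     # ('local1', 'hello1 again\n')
print(first_found(upstreams, "missing"))  # None
```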

xd003 commented 5 months ago

Is it a limitation in rclone then? If I upload the same file again in the UI it tells me the file already exist without proceeding with the upload, but not if done via rclone.

Can confirm this. @divyam234 it would be great if rclone could check for existing files before starting the upload. I think at the moment it only checks after uploading the chunks. Checking first would save quite a lot of time.
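
A sketch of the order of operations being asked for: look for the file first, and only upload chunks if it is absent. Every name here (upload_file, file_exists, upload_chunk, create_file) is hypothetical and injected as a plain callable; this is not how rclone or teldrive is implemented.

```python
from typing import Callable, Iterable


def upload_file(
    name: str,
    parent_id: str,
    chunks: Iterable[bytes],
    file_exists: Callable[[str, str], bool],
    upload_chunk: Callable[[bytes], int],
    create_file: Callable[[str, str, list[int]], None],
) -> int:
    """Return 409 before sending any data if the file exists, else upload and return 201."""
    if file_exists(name, parent_id):
        return 409  # nothing was uploaded
    part_ids = [upload_chunk(chunk) for chunk in chunks]
    create_file(name, parent_id, part_ids)
    return 201


# In-memory fakes standing in for the server.
existing: set[tuple[str, str]] = {("upload_test", "root")}
uploaded: list[bytes] = []


def fake_exists(name: str, parent_id: str) -> bool:
    return (name, parent_id) in existing


def fake_upload_chunk(chunk: bytes) -> int:
    uploaded.append(chunk)
    return len(uploaded)  # fake part id


def fake_create(name: str, parent_id: str, part_ids: list[int]) -> None:
    existing.add((name, parent_id))


dup_status = upload_file("upload_test", "root", [b"a", b"b"], fake_exists, fake_upload_chunk, fake_create)
new_status = upload_file("other_file", "root", [b"a", b"b"], fake_exists, fake_upload_chunk, fake_create)
print(dup_status, new_status, len(uploaded))  # 409 201 2
```

A check like this only saves bandwidth. Two clients can still pass the check at the same time, so the unique constraint in the database remains the final guard.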

divyam234 commented 5 months ago

Rclone already checks for file duplication. If you have the same filename and a different size, the fingerprint will be different, so change the fingerprint to filename only, as duplicates are not allowed in teldrive.

aniel300 commented 5 months ago

Rclone already checks for file duplication. If you have the same filename and a different size, the fingerprint will be different, so change the fingerprint to filename only, as duplicates are not allowed in teldrive.

What would be the flag for this? I can't seem to find it.

ben-ba commented 5 months ago

No flags are needed.