p2panda / aquadoggo

Node for the p2panda network handling validation, storage, aggregation and replication

Blob created on file system multiple times #574

Closed sandreae closed 11 months ago

sandreae commented 11 months ago

Logs from publishing a multi-piece blob:

[2023-10-05T12:23:57Z INFO  aquadoggo::manager] Start replication service
Node is listening on 0.0.0.0:2022
[2023-10-05T12:24:02Z DEBUG aquadoggo::graphql::queries::next_args] Query to nextArgs received for public key b6110d3028a92b4c16ec98c82b7d3a59e7bfa2ed88caf15e06edbd8230073b0a
[2023-10-05T12:24:03Z DEBUG aquadoggo::graphql::mutations::publish] Query to publish received containing entry with hash 00203b06dfcaa36b5e2b2d53886ced5f837288d58888b8a13ede96a75b27e30addf4
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Sending materializer reduce task with input <TaskInput 00203b06dfcaa36b5e2b2d53886ced5f837288d58888b8a13ede96a75b27e30addf4/-> to the task queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Working on <TaskInput 00203b06dfcaa36b5e2b2d53886ced5f837288d58888b8a13ede96a75b27e30addf4/->
[2023-10-05T12:24:03Z DEBUG aquadoggo::graphql::queries::next_args] Query to nextArgs received for public key b6110d3028a92b4c16ec98c82b7d3a59e7bfa2ed88caf15e06edbd8230073b0a
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Created <Document 0addf4>
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Dispatch dependency task for view with id: 00203b06dfcaa36b5e2b2d53886ced5f837288d58888b8a13ede96a75b27e30addf4
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Sending materializer dependency task with input <TaskInput -/00203b06dfcaa36b5e2b2d53886ced5f837288d58888b8a13ede96a75b27e30addf4> to the task queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::dependency] Working on <TaskInput -/00203b06dfcaa36b5e2b2d53886ced5f837288d58888b8a13ede96a75b27e30addf4>
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::dependency] Scheduling 1 tasks
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Sending materializer blob task with input <TaskInput -/00203b06dfcaa36b5e2b2d53886ced5f837288d58888b8a13ede96a75b27e30addf4> to the task queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::blob] Working on <TaskInput -/00203b06dfcaa36b5e2b2d53886ced5f837288d58888b8a13ede96a75b27e30addf4>
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Worker blob aborted task <QueueItem 0 w. <TaskInput -/00203b06dfcaa36b5e2b2d53886ced5f837288d58888b8a13ede96a75b27e30addf4>>: Related blob does not exist (yet)
[2023-10-05T12:24:03Z DEBUG aquadoggo::graphql::mutations::publish] Query to publish received containing entry with hash 0020fbd49a3e15a18461e33d76ae0da8b1d0e882057b6702da5403a18231060c2c3b
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Sending materializer reduce task with input <TaskInput 0020fbd49a3e15a18461e33d76ae0da8b1d0e882057b6702da5403a18231060c2c3b/-> to the task queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Working on <TaskInput 0020fbd49a3e15a18461e33d76ae0da8b1d0e882057b6702da5403a18231060c2c3b/->
[2023-10-05T12:24:03Z DEBUG aquadoggo::graphql::queries::next_args] Query to nextArgs received for public key b6110d3028a92b4c16ec98c82b7d3a59e7bfa2ed88caf15e06edbd8230073b0a
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Created <Document 0c2c3b>
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Dispatch dependency task for view with id: 0020fbd49a3e15a18461e33d76ae0da8b1d0e882057b6702da5403a18231060c2c3b
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Sending materializer dependency task with input <TaskInput -/0020fbd49a3e15a18461e33d76ae0da8b1d0e882057b6702da5403a18231060c2c3b> to the task queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::dependency] Working on <TaskInput -/0020fbd49a3e15a18461e33d76ae0da8b1d0e882057b6702da5403a18231060c2c3b>
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::dependency] Scheduling 1 tasks
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Sending materializer blob task with input <TaskInput -/0020fbd49a3e15a18461e33d76ae0da8b1d0e882057b6702da5403a18231060c2c3b> to the task queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::blob] Working on <TaskInput -/0020fbd49a3e15a18461e33d76ae0da8b1d0e882057b6702da5403a18231060c2c3b>
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Worker blob aborted task <QueueItem 1 w. <TaskInput -/0020fbd49a3e15a18461e33d76ae0da8b1d0e882057b6702da5403a18231060c2c3b>>: Related blob does not exist (yet)
[2023-10-05T12:24:03Z DEBUG aquadoggo::graphql::mutations::publish] Query to publish received containing entry with hash 002001bd2fc4b710a50b2f837e57a11d1007e527faef741b959ddb1803d5003aaf66
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Sending materializer reduce task with input <TaskInput 002001bd2fc4b710a50b2f837e57a11d1007e527faef741b959ddb1803d5003aaf66/-> to the task queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Working on <TaskInput 002001bd2fc4b710a50b2f837e57a11d1007e527faef741b959ddb1803d5003aaf66/->
[2023-10-05T12:24:03Z DEBUG aquadoggo::graphql::queries::next_args] Query to nextArgs received for public key b6110d3028a92b4c16ec98c82b7d3a59e7bfa2ed88caf15e06edbd8230073b0a
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Created <Document 3aaf66>
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Dispatch dependency task for view with id: 002001bd2fc4b710a50b2f837e57a11d1007e527faef741b959ddb1803d5003aaf66
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Sending materializer dependency task with input <TaskInput -/002001bd2fc4b710a50b2f837e57a11d1007e527faef741b959ddb1803d5003aaf66> to the task queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::dependency] Working on <TaskInput -/002001bd2fc4b710a50b2f837e57a11d1007e527faef741b959ddb1803d5003aaf66>
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::dependency] Scheduling 1 tasks
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Sending materializer blob task with input <TaskInput -/002001bd2fc4b710a50b2f837e57a11d1007e527faef741b959ddb1803d5003aaf66> to the task queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::blob] Working on <TaskInput -/002001bd2fc4b710a50b2f837e57a11d1007e527faef741b959ddb1803d5003aaf66>
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Worker blob aborted task <QueueItem 2 w. <TaskInput -/002001bd2fc4b710a50b2f837e57a11d1007e527faef741b959ddb1803d5003aaf66>>: Related blob does not exist (yet)
[2023-10-05T12:24:03Z DEBUG aquadoggo::graphql::mutations::publish] Query to publish received containing entry with hash 0020fb5efbef270c4524508f3ef15c11deddf84d06b9e73576200cd46fbb2e026aae
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Sending materializer reduce task with input <TaskInput 0020fb5efbef270c4524508f3ef15c11deddf84d06b9e73576200cd46fbb2e026aae/-> to the task queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Working on <TaskInput 0020fb5efbef270c4524508f3ef15c11deddf84d06b9e73576200cd46fbb2e026aae/->
[2023-10-05T12:24:03Z DEBUG aquadoggo::graphql::queries::next_args] Query to nextArgs received for public key b6110d3028a92b4c16ec98c82b7d3a59e7bfa2ed88caf15e06edbd8230073b0a
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Created <Document 026aae>
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Dispatch dependency task for view with id: 0020fb5efbef270c4524508f3ef15c11deddf84d06b9e73576200cd46fbb2e026aae
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Sending materializer dependency task with input <TaskInput -/0020fb5efbef270c4524508f3ef15c11deddf84d06b9e73576200cd46fbb2e026aae> to the task queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::dependency] Working on <TaskInput -/0020fb5efbef270c4524508f3ef15c11deddf84d06b9e73576200cd46fbb2e026aae>
[2023-10-05T12:24:03Z DEBUG aquadoggo::graphql::mutations::publish] Query to publish received containing entry with hash 0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Sending materializer reduce task with input <TaskInput 0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75/-> to the task queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Working on <TaskInput 0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75/->
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Created <Document 7f0e75>
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::reduce] Dispatch dependency task for view with id: 0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Sending materializer dependency task with input <TaskInput -/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75> to the task queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::dependency] Working on <TaskInput -/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75>
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::dependency] Scheduling 2 tasks
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Sending materializer blob task with input <TaskInput -/0020fb5efbef270c4524508f3ef15c11deddf84d06b9e73576200cd46fbb2e026aae> to the task queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::worker] Duplicate materializer dependency task already in progress, setting re-queue flag for task with input <TaskInput -/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75> and not adding this task to the queue.
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::blob] Working on <TaskInput -/0020fb5efbef270c4524508f3ef15c11deddf84d06b9e73576200cd46fbb2e026aae>
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::dependency] Get view for pinned relation with id: 00203b06dfcaa36b5e2b2d53886ced5f837288d58888b8a13ede96a75b27e30addf4
[2023-10-05T12:24:03Z INFO  aquadoggo::materializer::tasks::blob] Creating blob at path /tmp/.tmpPqPZPL/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::dependency] View found for pinned relation: 00203b06dfcaa36b5e2b2d53886ced5f837288d58888b8a13ede96a75b27e30addf4
[2023-10-05T12:24:03Z DEBUG aquadoggo::materializer::tasks::dependency] Get view for pinned relation with id: 0020fbd49a3e15a18461e33d76ae0da8b1d0e882057b6702da5403a18231060c2c3b
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] View found for pinned relation: 0020fbd49a3e15a18461e33d76ae0da8b1d0e882057b6702da5403a18231060c2c3b
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] Get view for pinned relation with id: 002001bd2fc4b710a50b2f837e57a11d1007e527faef741b959ddb1803d5003aaf66
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] View found for pinned relation: 002001bd2fc4b710a50b2f837e57a11d1007e527faef741b959ddb1803d5003aaf66
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] Get view for pinned relation with id: 0020fb5efbef270c4524508f3ef15c11deddf84d06b9e73576200cd46fbb2e026aae
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] View found for pinned relation: 0020fb5efbef270c4524508f3ef15c11deddf84d06b9e73576200cd46fbb2e026aae
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] Scheduling 1 tasks
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::worker] Sending materializer blob task with input <TaskInput -/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75> to the task queue.
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::worker] Sending materializer dependency task with input <TaskInput -/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75> to the task queue.
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::blob] Working on <TaskInput -/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75>
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] Working on <TaskInput -/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75>
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] Get view for pinned relation with id: 00203b06dfcaa36b5e2b2d53886ced5f837288d58888b8a13ede96a75b27e30addf4
[2023-10-05T12:24:04Z INFO  aquadoggo::materializer::tasks::blob] Creating blob at path /tmp/.tmpPqPZPL/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] View found for pinned relation: 00203b06dfcaa36b5e2b2d53886ced5f837288d58888b8a13ede96a75b27e30addf4
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] Get view for pinned relation with id: 0020fbd49a3e15a18461e33d76ae0da8b1d0e882057b6702da5403a18231060c2c3b
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] View found for pinned relation: 0020fbd49a3e15a18461e33d76ae0da8b1d0e882057b6702da5403a18231060c2c3b
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] Get view for pinned relation with id: 002001bd2fc4b710a50b2f837e57a11d1007e527faef741b959ddb1803d5003aaf66
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] View found for pinned relation: 002001bd2fc4b710a50b2f837e57a11d1007e527faef741b959ddb1803d5003aaf66
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] Get view for pinned relation with id: 0020fb5efbef270c4524508f3ef15c11deddf84d06b9e73576200cd46fbb2e026aae
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] View found for pinned relation: 0020fb5efbef270c4524508f3ef15c11deddf84d06b9e73576200cd46fbb2e026aae
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::dependency] Scheduling 1 tasks
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::worker] Sending materializer blob task with input <TaskInput -/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75> to the task queue.
[2023-10-05T12:24:04Z DEBUG aquadoggo::materializer::tasks::blob] Working on <TaskInput -/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75>
[2023-10-05T12:24:04Z INFO  aquadoggo::materializer::tasks::blob] Creating blob at path /tmp/.tmpPqPZPL/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75
sandreae commented 11 months ago

It might only be logged multiple times, haven't actually looked yet.

sandreae commented 11 months ago

It also appears that previously published blobs get re-materialised to the file system on publishing a new one:

[2023-10-05T13:10:52Z INFO  aquadoggo::materializer::tasks::blob] Creating blob at path /tmp/.tmp5gR7YM/00201c05353d50697f25f414399a7d56921f30aa6f03acd3a402d1293ad7cf355c60
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::tasks::dependency] View found for pinned relation: 00200f33683892f5b5baedd8d3195ea9b787c6fdb3cf598619d9a5e7b77f16a99cf8
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::tasks::dependency] Scheduling 1 tasks
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::worker] Sending materializer blob task with input <TaskInput -/0020f8d2d049e14a6babfdf11dd15ce3d81c1a87bcf74f089db6a7f7ebbc4fe511b8> to the task queue.
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::tasks::blob] Working on <TaskInput -/0020f8d2d049e14a6babfdf11dd15ce3d81c1a87bcf74f089db6a7f7ebbc4fe511b8>
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::worker] Sending materializer blob task with input <TaskInput -/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75> to the task queue.
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::tasks::blob] Working on <TaskInput -/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75>
[2023-10-05T13:10:52Z INFO  aquadoggo::materializer::tasks::blob] Creating blob at path /tmp/.tmp5gR7YM/0020f8d2d049e14a6babfdf11dd15ce3d81c1a87bcf74f089db6a7f7ebbc4fe511b8
[2023-10-05T13:10:52Z INFO  aquadoggo::materializer::tasks::blob] Creating blob at path /tmp/.tmp5gR7YM/0020b4c8d8a4e4c66a1fad724fde3318a7e897981335c5a690b8c5c9e52d517f0e75
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::tasks::dependency] View found for pinned relation: 002060ac9914ef72be934ea73f26054ce5d7f584cc9df13b003fa3832b82e5130165
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::tasks::dependency] Get view for pinned relation with id: 0020dd6ba77d7ff5b581a837ed8cfd809db38cd04bbcaf89ca0409a0f094d0fb9bd1
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::tasks::dependency] View found for pinned relation: 0020dd6ba77d7ff5b581a837ed8cfd809db38cd04bbcaf89ca0409a0f094d0fb9bd1
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::tasks::dependency] Get view for pinned relation with id: 002065b45eae4493f68353ee0444b6af659099ee47c5095f14f806b01be77cd0c982
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::tasks::dependency] View found for pinned relation: 002065b45eae4493f68353ee0444b6af659099ee47c5095f14f806b01be77cd0c982
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::tasks::dependency] Get view for pinned relation with id: 00203e2d1eb00a92484e84a66c358984ebf71b5c853b36a482f50c62c0c711b92821
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::tasks::dependency] View found for pinned relation: 00203e2d1eb00a92484e84a66c358984ebf71b5c853b36a482f50c62c0c711b92821
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::tasks::dependency] Scheduling 1 tasks
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::worker] Duplicate materializer blob task already in progress, setting re-queue flag for task with input <TaskInput -/00201c05353d50697f25f414399a7d56921f30aa6f03acd3a402d1293ad7cf355c60> and not adding this task to the queue.
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::worker] Sending materializer blob task with input <TaskInput -/00201c05353d50697f25f414399a7d56921f30aa6f03acd3a402d1293ad7cf355c60> to the task queue.
[2023-10-05T13:10:52Z DEBUG aquadoggo::materializer::tasks::blob] Working on <TaskInput -/00201c05353d50697f25f414399a7d56921f30aa6f03acd3a402d1293ad7cf355c60>
[2023-10-05T13:10:52Z INFO  aquadoggo::materializer::tasks::blob] Creating blob at path /tmp/.tmp5gR7YM/00201c05353d50697f25f414399a7d56921f30aa6f03acd3a402d1293ad7cf355c60
sandreae commented 11 months ago

Seems to actually be creating the blob again:

https://github.com/p2panda/aquadoggo/blob/3ac8b70d230cf44f29e06a686e1251e1a8df9f17/aquadoggo/src/materializer/tasks/blob.rs#L66-L98

sandreae commented 11 months ago

Maybe the issue comes from us creating a blob whenever a blob_piece_v1 or blob_v1 operation is processed in a blob_task. This causes duplicate creations when all blob pieces were inserted into the store before the tasks were started.

If so, a better flow might be:

With this I believe task de-duplication would take care of things for us.
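For reference, a minimal sketch of the kind of de-duplication the "Duplicate materializer ... task already in progress, setting re-queue flag" log lines hint at. This is not aquadoggo's actual queue: `Task`, `TaskQueue`, `Queued` and `State` are hypothetical names, and the real worker logic may differ.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for a materializer task: worker name plus input id.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Task {
    worker: &'static str,
    input: String,
}

/// Result of trying to queue a task.
#[derive(Debug, PartialEq, Eq)]
enum Queued {
    Added,
    /// An identical task is already queued or running, nothing was added.
    Deduplicated,
}

/// Internal state of a task the queue knows about.
enum State {
    Pending,
    Running { requeue: bool },
}

#[derive(Default)]
struct TaskQueue {
    pending: VecDeque<Task>,
    known: HashMap<Task, State>,
}

impl TaskQueue {
    /// Queue a task unless an identical one is already queued or running.
    fn push(&mut self, task: Task) -> Queued {
        match self.known.get_mut(&task) {
            // Already running: remember to run it once more afterwards.
            Some(State::Running { requeue }) => {
                *requeue = true;
                Queued::Deduplicated
            }
            // Already waiting in the queue: nothing to do.
            Some(State::Pending) => Queued::Deduplicated,
            None => {
                self.known.insert(task.clone(), State::Pending);
                self.pending.push_back(task);
                Queued::Added
            }
        }
    }

    /// Take the next task and mark it as running.
    fn pop(&mut self) -> Option<Task> {
        let task = self.pending.pop_front()?;
        self.known
            .insert(task.clone(), State::Running { requeue: false });
        Some(task)
    }

    /// Mark a running task as done. Returns true when it was queued again
    /// because a duplicate arrived while it was running.
    fn complete(&mut self, task: &Task) -> bool {
        match self.known.remove(task) {
            Some(State::Running { requeue: true }) => {
                self.known.insert(task.clone(), State::Pending);
                self.pending.push_back(task.clone());
                true
            }
            _ => false,
        }
    }
}
```

With a queue like this, pushing the same blob task many times while one is running results in at most one extra run, not one run per push.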

adzialocha commented 11 months ago

If so, a better flow might be:

This is very similar to how we handle parent relationships, sounds good!

sandreae commented 11 months ago

Hmmm, I don't think the above diagnosis explains this behaviour actually. I get these logs when publishing a new blob with id 00208084e70d7b975650be74a07c8e9173eec6194b75a7626fc4269ec266614a21e1. As you can see, actually all blobs are repeatedly re-materialized to the file system.

I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00208084e70d7b975650be74a07c8e9173eec6194b75a7626fc4269ec266614a21e1
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020e9e26e248795e9812c6cbc845ebfc684e51b461d5ca6db307478ee20e9ba805d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00208084e70d7b975650be74a07c8e9173eec6194b75a7626fc4269ec266614a21e1
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020e9e26e248795e9812c6cbc845ebfc684e51b461d5ca6db307478ee20e9ba805d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00208084e70d7b975650be74a07c8e9173eec6194b75a7626fc4269ec266614a21e1
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020e9e26e248795e9812c6cbc845ebfc684e51b461d5ca6db307478ee20e9ba805d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00208084e70d7b975650be74a07c8e9173eec6194b75a7626fc4269ec266614a21e1
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020e9e26e248795e9812c6cbc845ebfc684e51b461d5ca6db307478ee20e9ba805d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00208084e70d7b975650be74a07c8e9173eec6194b75a7626fc4269ec266614a21e1
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020e9e26e248795e9812c6cbc845ebfc684e51b461d5ca6db307478ee20e9ba805d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00208084e70d7b975650be74a07c8e9173eec6194b75a7626fc4269ec266614a21e1
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020e9e26e248795e9812c6cbc845ebfc684e51b461d5ca6db307478ee20e9ba805d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00208084e70d7b975650be74a07c8e9173eec6194b75a7626fc4269ec266614a21e1
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020e9e26e248795e9812c6cbc845ebfc684e51b461d5ca6db307478ee20e9ba805d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00208084e70d7b975650be74a07c8e9173eec6194b75a7626fc4269ec266614a21e1
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020e9e26e248795e9812c6cbc845ebfc684e51b461d5ca6db307478ee20e9ba805d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00208084e70d7b975650be74a07c8e9173eec6194b75a7626fc4269ec266614a21e1
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020e9e26e248795e9812c6cbc845ebfc684e51b461d5ca6db307478ee20e9ba805d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020b1fce8916c1a9b5b9c197b39e32705fc4950d5c8c45727c7f82fd782865e5dfa
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00208084e70d7b975650be74a07c8e9173eec6194b75a7626fc4269ec266614a21e1
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00205f8edadea98824587129216785f9fa9b43548d63f8169cf2037836f4f650648d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020f46a3f945167e17cfb6307d5d8e459dbe2d5e8343abb47e320205447d5f4ca68
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/0020e9e26e248795e9812c6cbc845ebfc684e51b461d5ca6db307478ee20e9ba805d
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/002012b7e9b8b80b1cbcf752ab1734fda6436d669becfb6fb9fb8a3f6505b4e667cd
I/aquadoggo::materializ..( 1844): Creating blob at path /data/user/0/com.p2panda.meli/files/00208084e70d7b975650be74a07c8e9173eec6194b75a7626fc4269ec266614a21e1
sandreae commented 11 months ago

Ah, ok, it's this:

https://github.com/p2panda/aquadoggo/blob/3ac8b70d230cf44f29e06a686e1251e1a8df9f17/aquadoggo/src/materializer/tasks/dependency.rs#L219-L237

We're eagerly issuing dependency tasks for all documents of the task input's "parent" schema. For blob pieces this means dependency tasks, and then blob tasks, are issued for every blob every time a piece arrives at the node :sweat_smile:.

sandreae commented 11 months ago

I actually can't remember the reason for doing this "eagerly". Thinking about it now, it seems enough to issue dependency tasks only for documents which actually refer to the task input's document (meaning we look at values in relation fields rather than the schema id when searching for "parent" documents).
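To make the difference concrete, here is a rough sketch of the two ways of selecting "parent" documents. `Document`, `parents_by_schema` and `parents_by_relation` are made up for illustration and are not the store API. The sketch also ignores document views and operations arriving out of order.

```rust
/// Hypothetical, simplified document: its id, its schema id and the ids
/// it points at through relation fields.
#[derive(Debug)]
struct Document {
    id: &'static str,
    schema: &'static str,
    relations: Vec<&'static str>,
}

/// "Eager" selection: every document of the parent schema gets a
/// dependency task, whether or not it refers to the child.
fn parents_by_schema(docs: &[Document], parent_schema: &str) -> Vec<&'static str> {
    docs.iter()
        .filter(|doc| doc.schema == parent_schema)
        .map(|doc| doc.id)
        .collect()
}

/// Narrower selection: only documents whose relation fields contain the
/// child's id get a dependency task.
fn parents_by_relation(docs: &[Document], child_id: &str) -> Vec<&'static str> {
    docs.iter()
        .filter(|doc| doc.relations.iter().any(|related| *related == child_id))
        .map(|doc| doc.id)
        .collect()
}
```

With three blobs in the store and one new piece arriving, the first function selects all three blobs, while the second selects only the blob that lists the piece.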

sandreae commented 11 months ago

Something similar to this:

https://github.com/p2panda/aquadoggo/blob/3ac8b70d230cf44f29e06a686e1251e1a8df9f17/aquadoggo/src/db/stores/blob.rs#L323-L369

adzialocha commented 11 months ago

I actually can't remember the reason for doing this "eagerly".

If I understand your point correctly, I think we need to keep it "eager", as otherwise third-tier, late-arriving operations would not trigger materialization of the documents. https://whimsical.com/materializer-DbV6FUs51pqhouKff7Q546 - It's a complex topic, so it might need to be revisited with some time to be 100% sure.

Something similar to this:

https://github.com/p2panda/aquadoggo/blob/3ac8b70d230cf44f29e06a686e1251e1a8df9f17/aquadoggo/src/db/stores/blob.rs#L323-L369

Maybe the issue is that the blob task doesn't have a way to prevent materialization when it already exists?

See for example: https://github.com/p2panda/aquadoggo/blob/main/aquadoggo/src/materializer/tasks/reduce.rs#L132-L134

It's fine to trigger many tasks in our design, they should just cancel early when they detect that they don't need to do the work (again).
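A rough sketch of what such an early exit could look like for the blob task. This is an assumption-laden illustration, not the code in `blob.rs`: `materialize_blob` and `BlobTaskOutcome` are hypothetical, the expected length is assumed to be known before the pieces are loaded, and comparing file length is only a cheap heuristic (it would not notice content that changed without changing size).

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
enum BlobTaskOutcome {
    /// The blob bytes were loaded and written to the file system.
    Written,
    /// A file of the expected length already exists, no work was done.
    Skipped,
}

/// Write a blob to `path` unless a file of `expected_len` bytes is already
/// there. `load_bytes` stands in for the expensive part (collecting all
/// pieces from the store) and is only called when a write is needed.
fn materialize_blob<F>(
    path: &Path,
    expected_len: u64,
    load_bytes: F,
) -> io::Result<BlobTaskOutcome>
where
    F: FnOnce() -> io::Result<Vec<u8>>,
{
    // Cancel early when the file looks like it was already materialized.
    if let Ok(metadata) = fs::metadata(path) {
        if metadata.is_file() && metadata.len() == expected_len {
            return Ok(BlobTaskOutcome::Skipped);
        }
    }

    let bytes = load_bytes()?;
    fs::write(path, bytes)?;
    Ok(BlobTaskOutcome::Written)
}
```

Running the task twice for the same blob then writes once and skips once, without loading the pieces a second time.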

sandreae commented 11 months ago

In case there's a misunderstanding, I'll explain the issue I think I see again. I don't mean I'm definitely right, this is just to avoid confusion.

It appears that we are issuing dependency tasks for all documents which follow an identified "parent schema". So in the case where we have ParentSchema and ChildSchema with the parent containing a pinned relation to the child, the following flow would occur:

I think we only want to issue dependency tasks for each individual document ~which refers to the current document's view in a pinned relation (list)~ CORRECTION: which contains an operation that refers to any view of the current document in a pinned relation (list). As shown here in the diagram: [image]

sandreae commented 11 months ago

Maybe the issue is that the blob task doesn't have a way to prevent materialization when it already exists?

This would also be a nice improvement :+1:

adzialocha commented 11 months ago

I think we only want to issue dependency tasks for each individual document which refers to the current document's view in a pinned relation (list). As shown here in the diagram:

I think you will end up with pinned views not getting materialized if they get processed out of order.

sandreae commented 11 months ago

It's a complex topic, so might need to be revisited with some time to be 100% sure.

Yep, and we don't need to revisit it now. We can solve the current issue by making the blob task aware of the fact that it is duplicating work.

sandreae commented 11 months ago

(didn't mean to close :sweat_smile: )

sandreae commented 11 months ago

I think we only want to issue dependency tasks for each individual document which refers to the current document's view in a pinned relation (list). As shown here in the diagram:

I think you will end up with pinned views not getting materialized if they get processed out of order.

Hmm, yeh, I see 🤔

I made a correction in my explanation above.

adzialocha commented 11 months ago

We have a test checking against what I'm concerned about: https://github.com/p2panda/aquadoggo/blob/3ac8b70d230cf44f29e06a686e1251e1a8df9f17/aquadoggo/src/materializer/tasks/dependency.rs#L727 - so maybe it's enough we play with your idea and see if the tests still pass? I honestly don't have the focus and brain to think it through right now (just woke up being sick) 😆

As for the solution, I still don't think it would be enough to make the selection "smarter". The blob tasks still need to be able to cancel themselves if they're doing redundant work, as they might still be triggered in other race-condition scenarios. I'd argue that this is the actual "fix" for this problem; the other might be an "optimization" (if it doesn't break our solution to the late-arriving operations problem).

sandreae commented 11 months ago

Yeh cool, I agree, I'll raise a separate issue for this "optimization" then and implement slightly smarter blob tasks now which avoid doing heavy work.