oras-project / oras-go

ORAS Go library
https://oras.land
Apache License 2.0
179 stars 96 forks source link

feat: GC on oci.Store.Delete #653

Closed wangxiaoxuan273 closed 9 months ago

wangxiaoxuan273 commented 10 months ago

Part of #472, this PR implements recursive GC for oci.Store. A field AutoGarbageCollection of oci.Store is added, default value is true.

codecov-commenter commented 10 months ago

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (1d9ad6c) 75.36% compared to head (b454af9) 75.41%.

Files Patch % Lines
content/oci/oci.go 76.74% 7 Missing and 3 partials :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #653 +/- ## ========================================== + Coverage 75.36% 75.41% +0.05% ========================================== Files 59 59 Lines 5570 5606 +36 ========================================== + Hits 4198 4228 +30 - Misses 1011 1015 +4 - Partials 361 363 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

wangxiaoxuan273 commented 10 months ago

I just noticed that this PR does not consider the GC on referrers.

For example, consider the following graph:

graph TD;
  A[Manifest List A]
  B[Manifest B]
  C[Manifest C]
  D[Layer D]
  E[Layer E]
  F[Layer F]
  G[Artifact Manifest G]
  H[Layer H]
  A --> B
  A --> C
  B --> D
  B --> E
  C --> E
  C --> F
  G -- subject --> C
  G --> H

Deleting Manifest C should also delete G, H, F.

@Wwwsylvia What do you think?

Do we need to delete the referrer? According to the distribution spec, the referrer can exist without the existence of the referenced manifest.

Wwwsylvia commented 10 months ago

I just noticed that this PR does not consider the GC on referrers. For example, consider the following graph:

graph TD;
  A[Manifest List A]
  B[Manifest B]
  C[Manifest C]
  D[Layer D]
  E[Layer E]
  F[Layer F]
  G[Artifact Manifest G]
  H[Layer H]
  A --> B
  A --> C
  B --> D
  B --> E
  C --> E
  C --> F
  G -- subject --> C
  G --> H
  graph TD;

A[Manifest List A] B[Manifest B] C[Manifest C] D[Layer D] E[Layer E] F[Layer F] G[Artifact Manifest G] H[Layer H] A --> B A --> C B --> D B --> E C --> E C --> F G -- subject --> C G --> H

Deleting Manifest C should also delete G, H, F. @Wwwsylvia What do you think?

Do we need to delete the referrer? According to the distribution spec, the referrer can exist without the existence of the referenced manifest.

Referrers can exist without its subject. But when a user is deleting a manifest, they probably want to delete the referrers as well. I think we can have an option to control whether to delete referrers or not.

wangxiaoxuan273 commented 9 months ago

Hi!

I took another look at your contribution. While I do not have any comments on the code, as it perfectly does its jobs, I am wondering if I am not missing something.

Indeed, in my use case, I want to delete the whole image, but testing with your code I still have some dangling blobs:

$ sudo ls /var/lib/ig/oci-store/blobs/sha256                   francis/rmi-upstream *%
225a8e697e260684d9451ccf8a622cd5548b0ae5fac6e13c416cd1d85247b7b7  b4be31f07444fb9be6dee8cd1349aab3c780d333b63406be10c5c14c1b39dd37
662ca58536d5a03864781eb42870f6712927981383f1dab1449d9adf80067313  c06a86dad7e582686b53bec9c729b92fc56c838b59cc11d0105c2c20f3158dcd
b0eff7f0ec2a27b8d71fe904cf373fa2002c6d669db54c25a9542cf2cb5f46d1  f61eef5ff800b2e4366c6d2e7311e95ce8dc3cff5c020a3a810429656fd93efb
$ sudo -E ./ig image remove execs                              francis/rmi-upstream *%
INFO[0000] Experimental features enabled                
Successfully removed execs
# Expected ls output is empty.
$ sudo ls /var/lib/ig/oci-store/blobs/sha256                   francis/rmi-upstream *%
225a8e697e260684d9451ccf8a622cd5548b0ae5fac6e13c416cd1d85247b7b7  c06a86dad7e582686b53bec9c729b92fc56c838b59cc11d0105c2c20f3158dcd
662ca58536d5a03864781eb42870f6712927981383f1dab1449d9adf80067313  f61eef5ff800b2e4366c6d2e7311e95ce8dc3cff5c020a3a810429656fd93efb
b4be31f07444fb9be6dee8cd1349aab3c780d333b63406be10c5c14c1b39dd37

I understand this difference as your code use Referrers() and then delete all the referrers to the descriptor being deleted, while the one I used previously deleted the successors. Taking a deeper look at Referrers(), this function acts on predecessors. But, if B is a successor of A, then A is a predecessor of B? I suspect I am missing OCI-related knowledge, but shouldn't a successor also be considered as a referrer?

Best regards.

Hi @eiffel-fl,

Thanks for the message. Referrer is a concept independent from successor. Referrer is related to the subject field of a manifest. To be precise, consider the following manifest A and manifest B: Manifest A { layers: [LayerA], subject: nil} Manifest B { layers: [LayerB], subject: Manifest A}

We say that Manifest B is a referrer of Manifest A. A referrer is considered a predecessor. The referenced manifest is considered a successor (child) of its referrer. In this case, Manifest A is a successor of Manifest B, just like Layer B is a successor of Manifest B.

But, if B is a successor of A, then A is a predecessor of B?

This is correct. In the above case, Layer B and Manifest A are successors of Manifest B, and Manifest B is a predecessor of Layer B and Manifest A.

shizhMSFT commented 9 months ago

I suspect I am missing OCI-related knowledge, but shouldn't a successor also be considered as a referrer?

@eiffel-fl The concept of referrers is from the distribution-spec while the concepts of successors and predecessors are from the graph thoery (precisely, directed acyclic graph or DAG). In other words, referrers of a certain node are predecessors, of which subject is that node.

eiffel-fl commented 9 months ago

Hi!

/ SNIP /

Hi @eiffel-fl,

Thanks for the message. Referrer is a concept independent from successor. Referrer is related to the subject field of a manifest. To be precise, consider the following manifest A and manifest B: Manifest A { layers: [LayerA], subject: nil} Manifest B { layers: [LayerB], subject: Manifest A}

First, thank you for the merge of this cool work :tada:! I think I understand, but if we remove Manifest B, manifest A may be unreachable? So, we should also take a look at successors (this is more to get a a better understanding and if the problem I shared is actually a real one or only mine).

I suspect I am missing OCI-related knowledge, but shouldn't a successor also be considered as a referrer?

@eiffel-fl The concept of referrers is from the distribution-spec while the concepts of successors and predecessors are from the graph thoery (precisely, directed acyclic graph or DAG). In other words, referrers of a certain node are predecessors, of which subject is that node.

It makes sense, hence the fact I am more familiar with predecessors and successors words rather than OCI one. THank you for shedding some light here.

Best regards and have an happy end of the year!

Wwwsylvia commented 9 months ago

I think I understand, but if we remove Manifest B, manifest A may be unreachable? So, we should also take a look at successors (this is more to get a a better understanding and if the problem I shared is actually a real one or only mine).

Hi @eiffel-fl , Happy New Year! 🎆

In this particular case,

So we shouldn't delete A when we delete B. However, we should delete B when we delete A. And this is what we do in PR #656 .

eiffel-fl commented 9 months ago

Hi!

Hi @eiffel-fl , Happy New Year! 🎆

In this particular case,

* `A` is a successor (subject) of `B`. For example, it can be an image manifest

* `B` is a referrer and a predecessor of `A`. For example, it can be the manifest of a signature of the image

* `A` can live without its referrer `B`, while `B` does not make sense without its subject `A`

So we shouldn't delete A when we delete B. However, we should delete B when we delete A. And this is what we do in PR #656 .

OK, thank you for the clarification! I will definitely take a look at #656 as I guess a call to GC() would perfectly suit my use case :D!

Best regards!