oras-project / oras

OCI registry client - managing content like artifacts, images, packages
https://oras.land
Apache License 2.0

oras support for tempfile cleanups and partial artifacts cleanup #1400

Open nileshpatra opened 3 weeks ago

nileshpatra commented 3 weeks ago

What is the version of your ORAS CLI

1.2.0

What would you like to be added?

I could not find anything in the docs about whether oras pull cleans up the output directory or temporary files when a pull fails, for instance because of a poor network connection.

Why is this needed for ORAS?

If this feature is not already present, it would be good for oras to clean up anything left in an intermediate state so that the download can start again, or at least to offer an option that enables this.

Are you willing to submit PRs to contribute to this feature?

nileshpatra commented 3 weeks ago

If I do not set ORAS_CACHE, pull into an output directory, and then cancel the context, oras seems to download a partial artifact and then clean it up. So it seems oras already manages partial downloads, is that right?

I need to use it in a script, so I need to know whether the script has to handle this itself.

/cc: @qweeah

qweeah commented 2 weeks ago

@nileshpatra Thanks for the valuable feedback!

Firstly, the user experience of oras pull is like cp commands: if a copy process is aborted halfway, then copied files in the destination folder will not be cleaned.

Secondly, you can set up the ORAS_CACHE variable to use a folder as a temporary cache for storing the files in a content-addressable way. But by design, ORAS CLI is not responsible for cleaning that folder.
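For a script, the cache setup described above could be wrapped roughly as follows. This is a minimal sketch, not ORAS's own tooling: `build_pull`, `pull_with_cache`, and `clear_cache` are hypothetical helper names, and the reference and paths are placeholders. Only the `ORAS_CACHE` variable and the `oras pull --output` flag come from the CLI itself. Because ORAS does not clean the cache folder, the sketch leaves that to the caller.

```python
import os
import shutil
import subprocess


def build_pull(reference, output_dir, cache_dir):
    """Return (argv, env) for an `oras pull` that uses cache_dir as ORAS_CACHE."""
    # Copy the current environment and add the cache location on top of it.
    env = dict(os.environ, ORAS_CACHE=cache_dir)
    argv = ["oras", "pull", reference, "--output", output_dir]
    return argv, env


def pull_with_cache(reference, output_dir, cache_dir):
    """Run the pull; returns True when the oras process exits with status 0."""
    argv, env = build_pull(reference, output_dir, cache_dir)
    os.makedirs(cache_dir, exist_ok=True)
    return subprocess.run(argv, env=env).returncode == 0


def clear_cache(cache_dir):
    """Remove the cache folder; ORAS leaves this step to the user."""
    shutil.rmtree(cache_dir, ignore_errors=True)
```

A script might call `pull_with_cache("localhost:5000/demo:v1", "out", "/tmp/oras-cache")` and decide separately when, if ever, to call `clear_cache`.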

qweeah commented 2 weeks ago

> it'd be good for oras to cleanup things in an intermediate state in order to start downloading again

IMHO, to speed up a re-pull, ORAS doesn't need to clean anything up. The right thing to do is to check whether a to-be-pulled file already exists and skip it if an identical copy is already there.

nileshpatra commented 2 weeks ago

On Mon, Jun 10, 2024 at 11:03:19PM -0700, Billy Zha wrote:

> @nileshpatra Thanks for the valuable feedback!

Thanks for your response!

> Firstly, the user experience of oras pull is like cp commands: if a copy process is aborted halfway, then copied files in the destination folder will not be cleaned.

If I understand correctly, oras pull will try to download the artifacts and manifests and if aborted, the downloaded stuff will not be cleaned, correct?

> Secondly, you can set up the ORAS_CACHE variable to use a folder as a temporary cache for storing the files in a content-addressable way. But by design, ORAS CLI is not responsible for cleaning that folder.

Right. Does it "copy" things from any place to another place even if there's no cache setup?

nileshpatra commented 2 weeks ago

On Mon, Jun 10, 2024 at 11:16:12PM -0700, Billy Zha wrote:

> it'd be good for oras to cleanup things in an intermediate state in order to start downloading again

> IMHO, to speed up re-pull, ORAS doesn't need to cleanup things. The right thing to do is detect existent files and skip those.

Correct, but what if the files are only partially downloaded, i.e. the entire artifact isn't present?

qweeah commented 2 weeks ago

> Correct, but what if the downloaded files are partially downloaded, i.e. the entire artifact isn't present?

1) One artifact can contain more than one file.
2) If a file is partially downloaded, its checksum won't match the layer digest, so it won't be recognized as an existing file.
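The digest comparison in point 2 can be illustrated with a short sketch. It assumes the layer digest is a string of the form `algorithm:hex`, such as `sha256:...`; `matches_digest` is a hypothetical helper, not part of ORAS or oras-go.

```python
import hashlib


def matches_digest(path, digest, chunk_size=1 << 20):
    """Return True if the file at path hashes to the given 'algo:hex' digest."""
    algorithm, _, expected_hex = digest.partition(":")
    hasher = hashlib.new(algorithm)
    # Read in chunks so large layers do not have to fit in memory.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected_hex
```

A truncated file hashes to a different value than the complete layer, so a check like this rejects it, and the file would be downloaded again rather than skipped.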

qweeah commented 2 weeks ago

> If I understand correctly, oras pull will try to download the artifacts and manifests and if aborted, the downloaded stuff will not be cleaned, correct?

Yes.
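Since an aborted pull leaves its files in place, a script that wants all-or-nothing results could pull into a staging directory and publish it only on success. The following is a sketch under stated assumptions: `pull_atomically` and its injectable `run` parameter are hypothetical, the final directory is assumed not to exist yet, and the staging directory is created next to it so the rename stays on one filesystem.

```python
import os
import shutil
import subprocess
import tempfile


def pull_atomically(reference, final_dir, run=subprocess.run):
    """Pull into a staging directory and rename it to final_dir on success."""
    parent = os.path.dirname(os.path.abspath(final_dir))
    os.makedirs(parent, exist_ok=True)
    staging = tempfile.mkdtemp(prefix=".oras-staging-", dir=parent)
    try:
        result = run(["oras", "pull", reference, "--output", staging])
        if result.returncode != 0:
            return False
        os.rename(staging, final_dir)
        staging = None  # published, nothing left to clean up
        return True
    finally:
        # Runs on failure and on interruption (e.g. Ctrl-C) as well.
        if staging is not None:
            shutil.rmtree(staging, ignore_errors=True)
```

The trade-off is that nothing from a failed attempt is reused; combining this with a cache directory would be one way to avoid re-downloading blobs that were already fetched.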

> Right. Does it "copy" things from any place to another place even if there's no cache setup?

If no cache is set up, no partial file is created in the local file system.

qweeah commented 1 week ago

As mentioned in https://github.com/oras-project/oras-go/issues/777, the performance of oras pull can be improved if oras-go skips copying files that already exist.