python-wheel-build / fromager

Build your own wheels
https://pypi.org/project/fromager/
Apache License 2.0

Allow renaming the unpacked tarballs #184

Closed shubhbapna closed 1 month ago

shubhbapna commented 1 month ago

Just as we allow renaming the downloaded tarballs, we need to allow renaming the unpacked tarball as well.

shubhbapna commented 1 month ago

If the tarball is renamed to something like name-{version}.tar.gz, then it seems the directory into which the tarball is unpacked should be name-{version}: https://github.com/python-wheel-build/fromager/blob/main/src/fromager/sources.py#L232

@dhellmann Is this not the case? Do we need specific renaming of the unpacked tarball?

dhellmann commented 1 month ago

We probably don't need it to be a different value.

I haven't been able to give this my full attention this week. I'd like for us to find a way for the settings-driven download and naming stuff to make some of the plugin methods irrelevant. Maybe you could work through some scenarios based on the packages we have and give some examples of how someone would write the config (assuming we have whatever renaming we need) and then not have to provide functions to give the expected names for directory, tarball, etc.?

shubhbapna commented 1 month ago

I checked whether a tarball named stevedore-5.2.0.tar.gz gets unpacked to stevedore-5.2.0, and I do see a stevedore-5.2.0 directory in the work-dir. Also, looking at the code, we pass the root dir as the directory into which we unpack: https://github.com/python-wheel-build/fromager/blob/main/src/fromager/sources.py#L261 https://github.com/python-wheel-build/fromager/blob/main/src/fromager/sources.py#L263 https://github.com/python-wheel-build/fromager/blob/main/src/fromager/sources.py#L267

And the root dir is constructed by removing the extensions: https://github.com/python-wheel-build/fromager/blob/main/src/fromager/sources.py#L245
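To illustrate what "removing the extensions" means here, this is a minimal sketch and not the actual code in sources.py. The function name `unpack_dir_name` and the list of extensions are assumptions made for the example:

```python
from pathlib import Path

# Hypothetical list of archive suffixes; the real code may handle a different set.
ARCHIVE_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz", ".tgz", ".zip")


def unpack_dir_name(archive: str) -> str:
    """Return the archive's file name with its archive extension stripped."""
    name = Path(archive).name
    for ext in ARCHIVE_EXTENSIONS:
        if name.endswith(ext):
            return name[: -len(ext)]
    # Unknown extension: leave the name unchanged.
    return name
```

Note that `Path(...).stem` alone would not be enough for this, since it only drops the last suffix and would leave `torch-2.3.0.tar`.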

dhellmann commented 1 month ago

stevedore probably isn't a sufficient test case, because its sdist is already structured with the right naming convention.

Maybe try setting up a test job that just runs the step commands to download and prepare the source using torch's github releases? Don't build it, that's too complicated. Downloading and unpacking the source should be enough of a test.

shubhbapna commented 1 month ago

This is the debug statement I got:

DEBUG:fromager.sources:265: unpacking /path/on/local/machine/sdists-repo/downloads/torch-2.3.0.tar.gz to /path/on/local/machine/work-dir/torch-2.3.0

The path to the sources on my local machine is then /path/on/local/machine/work-dir/torch-2.3.0/pytorch-v2.3.0/. Is the torch-2.3.0/pytorch-v2.3.0 part the issue? I think this might just be how the tarball is created, i.e., there is a directory called pytorch-v2.3.0 inside that tarball.
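One way to confirm that the nested directory comes from the tarball itself is to list its top-level entries without unpacking it. This is a rough sketch using only the standard library; `top_level_entries` is a name made up for the example and is not part of fromager:

```python
import tarfile


def top_level_entries(tarball_path):
    """Return the sorted set of first path components found in a tarball."""
    with tarfile.open(tarball_path) as tf:
        names = set()
        for member in tf.getmembers():
            # Drop a leading "./" if present, then keep only the first component.
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            first = name.split("/", 1)[0]
            if first:
                names.add(first)
        return sorted(names)
```

If this returns a single entry such as `pytorch-v2.3.0` for the torch tarball, the extra directory level is part of the archive layout rather than something the unpack step adds.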

shubhbapna commented 1 month ago

@dhellmann Ah I think I finally understood what we want.

If fromager downloads a tarball named pytorch-2.3.1.tar.gz and renames it to torch-2.3.1.tar.gz, then unpacking it creates torch-2.3.1/pytorch-2.3.1, and the path returned by unpack_source is torch-2.3.1/pytorch-2.3.1. Our plugins have been renaming that inner directory so that the source directory ends up as torch-2.3.1/torch-2.3.1.
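A rough sketch of that behavior, to make the scenario concrete. This is not fromager's unpack_source; the function `unpack_and_normalize` and its handling of the single nested directory are assumptions for illustration, and it only handles `.tar.gz` names:

```python
import tarfile
from pathlib import Path


def unpack_and_normalize(tarball: Path, work_dir: Path) -> Path:
    """Unpack a renamed tarball and rename its single top-level directory
    to match the tarball's own base name (hypothetical helper)."""
    suffix = ".tar.gz"
    if not tarball.name.endswith(suffix):
        raise ValueError(f"expected a {suffix} file, got {tarball.name}")
    root_name = tarball.name[: -len(suffix)]  # e.g. "torch-2.3.1"

    unpack_dir = work_dir / root_name
    unpack_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as tf:
        tf.extractall(unpack_dir)

    # The archive may carry its own directory name, e.g. "pytorch-2.3.1".
    entries = list(unpack_dir.iterdir())
    expected = unpack_dir / root_name
    if len(entries) == 1 and entries[0].is_dir() and entries[0] != expected:
        entries[0].rename(expected)

    return expected if expected.is_dir() else unpack_dir
```

With a tarball renamed to torch-2.3.1.tar.gz that contains pytorch-2.3.1/, this returns work_dir/torch-2.3.1/torch-2.3.1, which is the layout the plugins currently produce by hand.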