Open iddoavn opened 1 year ago
Here are a few options that might fulfill the requirement:
aws s3 cp s3://repo/branch-source/table s3://repo/branch-dest/table
This isn't a zero-clone copy, but it doesn't download and re-upload the data either. It performs an object-store-side copy, i.e. the copied objects never go through the client or lakeFS itself. You can use aws s3 mv ... to get the mv functionality, which starts with a similar copy followed by a deletion. The option to zero-copy uncommitted objects through lakeFS (i.e. without using a merge, commit, cherry-pick, etc.) was dropped not long ago. The reasoning was to allow garbage collection to clean up safely without risking data loss.
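For example, copying a whole table prefix server-side through the lakeFS S3 gateway could look roughly like this (a sketch only: https://lakefs.example.com stands in for your lakeFS endpoint, and the trailing table/ prefixes are illustrative):

# Server-side copy of a table prefix from one branch to another via the lakeFS S3 gateway
aws s3 cp --recursive --endpoint-url https://lakefs.example.com \
    s3://repo/branch-source/table/ s3://repo/branch-dest/table/

# Same idea for a rename: copy followed by delete
aws s3 mv --recursive --endpoint-url https://lakefs.example.com \
    s3://repo/branch-source/table/ s3://repo/branch-source/table-renamed/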
I think that makes a lot of sense for uncommitted data. But for committed data, it would be good to have a copy, because a commit may include more changes than the ones you want to copy over.
Nevertheless, I agree cherry-pick is helpful in many cases, especially if you keep good commit hygiene.
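For illustration, that flow could look roughly like this (a sketch only, assuming a lakectl version that ships cherry-pick; <commit-id> is a placeholder and the exact syntax may differ between versions):

# Commit only the table change, as its own commit (good commit hygiene)
lakectl commit lakefs://repo/branch-source -m "update table X"

# Apply that single commit onto the destination branch, metadata-only
lakectl cherry-pick lakefs://repo/<commit-id> lakefs://repo/branch-dest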
@ozkatz please prioritize
This issue is now marked as stale after 90 days of inactivity, and will be closed soon. To keep it, mark it with the "no stale" label.
Closing this issue because it has been stale for 7 days with no activity.
Scenario: One member of the team changes a few tables on their own branch. Then, that member wants to expose one table, and that table only (which may not even be committed), to a different team member working on a different branch.
It would be good if we could use a lakectl cp to copy a dataset from one branch to another. This, of course, should be a zero-clone copy. Maybe even to a different repo?
Another use case would be a lakectl mv that basically renames a dataset. This is also an added capability on top of an object store, where achieving something like this today requires a long and potentially expensive exercise of downloading and re-uploading data.
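Purely as illustration, the proposed commands might look something like this (hypothetical syntax, since neither command exists today; names and arguments are only a strawman):

# Zero-clone copy of a dataset between branches (hypothetical command)
lakectl cp lakefs://repo/branch-source/table/ lakefs://repo/branch-dest/table/

# Zero-clone rename of a dataset within a branch (hypothetical command)
lakectl mv lakefs://repo/branch-source/table/ lakefs://repo/branch-source/table-renamed/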