simonw / datasette

An open source multi-tool for exploring and publishing data
https://datasette.io
Apache License 2.0
9.59k stars 691 forks source link

--cp option for datasette publish and datasette package for shipping additional files and directories #675

Open aviflax opened 4 years ago

aviflax commented 4 years ago

I’m working on integrating Datasette into a documentation-oriented publishing workflow internally in my company, and in order to deploy the Docker image created by datasette package I need to add an additional file to the image — in my case, it’s a sort of a deployment directive. I’ve worked out a way to do this after the image has been created, but it’s convoluted and brittle.

So it’d be excellent if there was an additional option for this command, something like, like, --copy.

I’d envision it looking something like:

$ datasette package --copy /the/source/path:/the/target/path data.db

I’d be happy to help design, specify, implement, and test this feature, if you’d be interested.

Thanks for the fantastic tools!

simonw commented 4 years ago

Interesting feature suggestion.

My initial instinct was that this would be better handled using the layered nature of Docker - so build a Docker image with datasette package and then have a separate custom script which takes that image, copies in the extra data and outputs a new image.

But... datasette package is already meant to be more convenient than messing around with Docker by hand like this - so actually having a --copy option like you describe here feels like it's within scope of what datasette package is meant to do.

So yeah - if you're happy to design this I think it would be worth us adding.

Small design suggestion: allow --copy to be applied multiple times, so you can do something like this:

datasette package \
    --copy ~/project/templates /templates \
    --copy ~/project/README.md /README.md \
    data.db

Also since Click arguments can take multiple options I don't think you need to have the : in there - although if it better matches Docker's own UI it might be more consistent to have it.

aviflax commented 4 years ago

So yeah - if you're happy to design this I think it would be worth us adding.

Great! I’ll give it a go.

Small design suggestion: allow --copy to be applied multiple times…

Makes a ton of sense, will do.

Also since Click arguments can take multiple options I don't think you need to have the : in there - although if it better matches Docker's own UI it might be more consistent to have it.

Great point. I double checked the docs for docker cp and in that context the colon is used to delimit a container and a path, while spaces are used to separate the source and target.

The usage string is:

docker cp [OPTIONS] CONTAINER:SRC_PATH DEST_PATH|-
docker cp [OPTIONS] SRC_PATH|- CONTAINER:DEST_PATH

so in fact it’ll be more consistent to use a space to delimit the source and destination paths, like so:

$ datasette package --copy /the/source/path /the/target/path data.db

and I suppose the short-form version of the option should be cp like so:

$ datasette package -cp /the/source/path /the/target/path data.db
simonw commented 4 years ago

Design looks great to me.

I'm not keen on two letter short versions (-cp) - I'd rather either have a single character or no short form at all.

aviflax commented 4 years ago

Design looks great to me.

Excellent, thanks!

I'm not keen on two letter short versions (-cp) - I'd rather either have a single character or no short form at all.

Hmm, well, anyone running datasette package is probably at least somewhat familiar with UNIX CLIs… so how about --cp as a middle ground?

$ datasette package --cp /the/source/path /the/target/path data.db

I think I like it. Easy to remember!

simonw commented 4 years ago

Sure, --cp looks good to me.

simonw commented 3 years ago

It turns out I need this for a couple of projects:

I want this for datasette publish cloudrun, not just for datasette package.

simonw commented 3 years ago

Note that datasette publish cloudrun uses a working directory of /app - so users will need to copy their files into /app if that's where they need to live.

https://github.com/simonw/datasette/blob/17cbbb1f7f230b39650afac62dd16476626001b5/datasette/utils/__init__.py#L348-L357

simonw commented 3 years ago

But since we're already running COPY . /app anything that's made it into the temporary directory will get copied into /app.

But... I feel the usability of the command will be better if users can use absolute paths on the target side:

datasette publish cloudrun my.db --cp dogsheep-beta.yml /app
simonw commented 3 years ago

I can't just use COPY /path/to/blah.yml /app in the Dockerfile because it runs on the Google Cloud Build servers, not on the user's laptop - so I need to first copy the files they specify to that temporary directory that gets uploaded to the cloud, then rewrite the COPY lines in the Dockerfile to copy from there.

simonw commented 3 years ago

The other way this could work is passing a single argument - the file (or directory) to be copied in - and assuming it should always go in the /app root. Something like:

datasette publish cloudrun my.db --include src/ --include dogsheep-beta.yml

Which would add /app/src/... and /app/dogsheep-beta.yml.

simonw commented 3 years ago

I could make --include work if I also had a mechanism for running some shell commands inside the container at the end of the build - which users could then use to move files into the correct place.

datasette publish cloudrun my.db --include src/ --exec 'mv /app/src/config.yml /etc/conf/config.yml'
simonw commented 3 years ago

That --exec could help solve all sorts of other problems too, like needing to apt-get install extra packages or download files from somewhere using wget.