rivermont / spidy

The simple, easy to use command line web crawler.
GNU General Public License v3.0

Docker is unusable #72

Open Enelar opened 5 years ago

Enelar commented 5 years ago

Expected Behavior

Docker should simplify things, not make them harder.

Actual Behavior

Docker is a struggle: you have to build the image several times before it works. It ignores the configs in the /data directory and uses only the defaults that were in the repo. It creates results as root, etc.

What I've tried so far:

The best workaround is to use -v $PWD:/src/app/spidy/config/, but it's still ugly.
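
For reference, the full command I ended up with looks roughly like this (the spidy tag is just whatever I built locally, and the container-side path is the one from the workaround above):

    # build the image from the repo root (tag name is only an example)
    docker build -t spidy .
    # overlay the host's current directory onto the container's config directory
    docker run --rm -it -v "$PWD":/src/app/spidy/config/ spidy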

vertoforce commented 4 years ago

I had this same issue; it was painful to figure out why it wasn't reading my input files.

rivermont commented 3 years ago

@pbnj I know it's been a few years; is this something you still want to maintain, or should I look at rewriting the Docker setup?

pbnj commented 3 years ago

@rivermont - I haven't used spidy in a while, but I can look at it later today and submit a PR with the necessary changes.

Also, since then, GitHub has come out with its own package/container registry. I would recommend publishing the built images so that users of this project can just docker run ... ghcr.io/rivermont/spidy:latest ... without having to build it first, but that would require implementing some GitHub Actions / CI workflows to build and push the image. I leave this as a separate decision for you to make, since this is just a recommendation.
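
At its core, the workflow would just script something like the following (a sketch; it assumes the image name ghcr.io/rivermont/spidy and a token with the packages write scope exported as GHCR_TOKEN):

    # authenticate to the GitHub Container Registry
    echo "$GHCR_TOKEN" | docker login ghcr.io -u rivermont --password-stdin
    # build and tag the image from the repo root
    docker build -t ghcr.io/rivermont/spidy:latest .
    # push it so users can docker run without building locally
    docker push ghcr.io/rivermont/spidy:latest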

teaforchris commented 1 year ago

@pbnj @rivermont did anything ever come of this?

I'm keen to use spidy as a self-contained image, either in GitHub's container registry as Peter suggested, or on Docker Hub, which would also work but wouldn't be as tightly integrated.

We use Docker heavily for local development, and I wanted to add this as a simple tool for devs to crawl a project as it's being built, to check for 404s on localhost. I could also use it as part of our CI pipeline to ensure there are no broken links or huge images on our test/staging environment before something is deployed.
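
For the local-dev case, roughly what I have in mind (just a sketch; the image name assumes the registry publishing suggested above, and the /data and /config mount points are assumptions borrowed from elsewhere in this thread):

    # --network host lets the container reach a dev server running on localhost
    docker run --rm -it --network host \
        -v "$PWD/crawl-output":/data \
        -v "$PWD/.spidy":/config \
        ghcr.io/rivermont/spidy:latest
    # when prompted, give the container-side config path, e.g. /config/crawl.cfg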

Happy to help out with a PR if we feel this is generally useful - otherwise I'll fork and set something up for my own use.

pbnj commented 11 months ago

@Enelar

It ignores the configs in the /data directory and uses only the defaults that were in the repo.

Fixed in #90.

It creates results as root, etc.

I could not replicate this. On my system (macOS 14.0 + Docker version 24.0.6, build ed223bc820), the files are written as my host machine user, not as root. Please provide your OS info and Docker version.
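
If you do see root-owned files on a Linux host (where containers run as root unless the image specifies otherwise), a generic Docker workaround, not specific to this image, is to run the container as your host UID/GID:

    # write results as the invoking host user instead of root
    # "spidy" here is whatever tag you built the image with
    docker run --rm -it --user "$(id -u):$(id -g)" -v /tmp:/data spidy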

The best workaround is to use -v $PWD:/src/app/spidy/config/, but it's still ugly.

If you are trying to use custom configs, you can mount them from wherever they are on your host machine into any path in the docker container.

For example, if your custom config is located at $HOME/.spidy/test.cfg, you can mount it like this: docker run --rm -it -v /tmp:/data -v $HOME/.spidy:/config spidy. When prompted, provide the config path (from the container's perspective): /config/test.cfg.
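
Breaking that command down:

    # /tmp on the host receives whatever the container writes to /data
    # $HOME/.spidy on the host appears inside the container as /config
    docker run --rm -it -v /tmp:/data -v "$HOME/.spidy":/config spidy
    # at the prompt, enter the container-side path: /config/test.cfg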

pbnj commented 11 months ago

@teaforchris - if you're asking about pre-built spidy images, this is outside of my control as I am not the repo owner.

It would be up to @rivermont to implement this per GitHub's instructions:

@rivermont - if you have any questions or run into any issues with this, I am happy to help.