slovensko-digital / harvester.ecosystem

App for pushing data to ekosystem.slovensko.digital
https://ekosystem.slovensko.digital
European Union Public License 1.1

Dockerize Harvester for development purposes #20

Closed: mkrcah closed this 7 years ago

mkrcah commented 7 years ago

This PR is WIP and is not complete.

The goal is to start the Harvester and all its dependencies with one command: docker-compose up. This command would start Postgres and Redis, initialize the db and start the server.

This approach has the following advantages:

To complete the PR, there are however a few open questions:

Web/REST API?

Required system dependencies

How to run?

Gems location: What is a common location for Ruby apps to install gems locally? When using Docker, it is common practice to install the dependencies locally in order to speed up container startup. Currently, I set the target directory to /.bundle.

jsuchal commented 7 years ago

The Harvester app is just a backend worker / scheduler. The web app process there... hm, not sure why it is there. We are using dokku for deployment, so maybe it needs some web worker. I will check that.

We had an internal discussion about using only plain old Ruby for this, but I am not a fan, for a simple reason: from my experience, we need to enforce strict rules on the project. To simplify onboarding, it's just best to pick a de facto standard for conventions (Rails). Yes, Rails has its quirks, but you have gems, jobs, testing, autoloading and all the nice tools every Ruby/Rails developer knows, all in place. I've seen too many projects that were not using Rails, and every time the onboarding and development were a lot harder.

If someone really wants to start pushing data to ecosystem/datahub from a non-standard environment, it's doable. We just create a schema for you and a user that can write to our master database on that schema. We only need to agree on a table naming standard. You can use any language you want, but we can't guarantee any maintenance outside of our standard Ruby/Rails stack.

deps: Redis/Postgres is right. I would stick to compose.yml only for external dependencies, excluding Ruby.
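
A compose file limited to the external dependencies might look roughly like the sketch below. It is an illustration only and is not taken from the repository: the image tags, the password and the published ports are all assumptions and would need to match the app's actual database and Redis configuration.

```yaml
# docker-compose.yml (sketch): external dependencies only, Ruby runs on the host.
# All values below are assumptions for local development, not project settings.
version: "3"
services:
  postgres:
    image: postgres:9.6             # assumed version
    environment:
      POSTGRES_PASSWORD: postgres   # assumed dev-only credential
    ports:
      - "5432:5432"                 # expose Postgres to the host-side Rails app
  redis:
    image: redis:3.2                # assumed version
    ports:
      - "6379:6379"                 # expose Redis to the host-side worker
```

With a file like this, docker-compose up would start only Postgres and Redis, while the Harvester itself keeps running with the host's Ruby and Bundler.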

If anyone wants to do Ruby/Rails development, I assume they are familiar with rbenv/rvm and have Ruby/Bundler installed. We use .ruby-version and a Gemfile, so the dependency versions are locked anyway. I don't see a point in dockerizing this part, unless someone wants to run this without any Ruby/Rails knowledge. Not sure why anyone would want to do that on this particular project.

mkrcah commented 7 years ago

Thanks for the update, I have updated the code accordingly.

I'm still struggling to get the job running. I submit a new job to the queue with rake itms:all:sync and see the job being queued in log/development.log. However, there is no worker picking up the job. How do I start and monitor the worker?

script/rails: I understand. I saw a similar challenge in Python ETL apps.

pushing from non-ruby: I will try to get my hands dirty with Ruby first, if that's ok :)

ruby-in-docker: One advantage is for polyglot engineers who work on different stacks on one machine. There is no need to "pollute" the host machine with different installations of Python, Ruby, etc.; everything is encapsulated in Docker, including the interpreter. Recent IntelliJ IDEs can even hook into a remote interpreter running in Docker. I'll stick with on-host Ruby for Harvester for now.

jsuchal commented 7 years ago

If you want to start the app/harvester, you need to start the worker process. Look at https://github.com/slovensko-digital/harvester.ecosystem/blob/master/Procfile and https://ddollar.github.io/foreman/; this is a good way to run it. Just run foreman start and you are done.

mkrcah commented 7 years ago

Great, I got the Harvester fully up and running :) I have updated the README according to your remarks.

jsuchal commented 7 years ago

Thanks!