singer-io / getting-started

This repository is a getting started guide to Singer.
https://singer.io

Conflicting requirements #19

Open anthonyp opened 7 years ago

anthonyp commented 7 years ago

It's totally possible I'm just missing something, but I believe different taps and targets may have conflicting requirements. For example, while using the HubSpot tap, I received the errors:

    - pkg_resources.DistributionNotFound: The 'requests==2.12.4' distribution was not found and is required by tap-hubspot
    - pkg_resources.DistributionNotFound: The 'singer-python==1.2.0' distribution was not found and is required by tap-hubspot

yet other taps and targets seemed to require different versions of those same packages. I ultimately got around this by setting up each target and tap in its own Docker image and piping between containers. (This was also a good way to get around Python 2 vs. 3 and pip 2 vs. 3 issues on my Mac, but that's a whole other topic.)

mdelaurentis commented 7 years ago

Hi @anthonyp . You're correct, different Taps and Targets may have different requirements. We've tried to avoid making a framework that taps and targets must fit into, and instead allow taps and targets to be independent programs with possibly different requirements. In fact we expect that some Taps and Targets will be written in languages other than Python, and we definitely don't want to impose a common set of requirements across different languages.

The way we get around that at Stitch is by installing each Tap and Target in its own virtualenv. If you do that, you'll get an executable for each Tap and Target that runs in its own environment. It works out pretty well. For example, here's how I would install and run tap-closeio and target-stitch together:

    # Install tap
    mkvirtualenv -p /usr/bin/python3 tap-closeio
    cd tap-closeio
    python setup.py install
    cd ..

    # Install target
    mkvirtualenv -p /usr/bin/python3 target-stitch
    cd target-stitch
    python setup.py install

    # Run them
    ~/.virtualenvs/tap-closeio/bin/tap-closeio -c tap-config.json -s state.json | ~/.virtualenvs/target-stitch/bin/target-stitch
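
For readers without virtualenvwrapper (which provides `mkvirtualenv`), the same one-environment-per-program idea can be sketched with Python's built-in `venv` module. This is only an illustration, not the workflow used at Stitch: the `VENV_ROOT` default, the `DRY_RUN` switch, and the `install_isolated` helper are names invented for this sketch.

```shell
#!/bin/sh
# Sketch: one virtual environment per tap/target, using the stdlib venv module.
# VENV_ROOT, DRY_RUN, run, and install_isolated are hypothetical names.
VENV_ROOT="${VENV_ROOT:-$HOME/.virtualenvs}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# install_isolated NAME SOURCE_DIR
# Creates $VENV_ROOT/NAME and installs the package in SOURCE_DIR into it,
# so NAME's dependencies never collide with those of another tap or target.
install_isolated() {
    name="$1"
    src="$2"
    run python3 -m venv "$VENV_ROOT/$name"
    run "$VENV_ROOT/$name/bin/pip" install "$src"
}
```

Calling `install_isolated tap-closeio ./tap-closeio` and `install_isolated target-stitch ./target-stitch` should leave one executable per environment under `$VENV_ROOT`, which can then be piped together exactly as in the run command above. Setting `DRY_RUN=1` first prints the commands without running them.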

Would you mind trying out that approach and letting us know if that works for you?

Thanks for pointing out that our documentation doesn't address this. We'll update the docs.

anthonyp commented 7 years ago

Thanks for the explanation @mdelaurentis . Yes, this works. That said, to your point about taps and targets eventually being written in languages other than Python, it might be a good idea to standardize early on an environment isolation technology that's not specific to Python.

I did manage to get this flow working quite well in Docker. I'm wondering if it wouldn't make sense to consider making a Dockerfile - and corresponding image on the DockerHub registry - for each tap and target part of the spec. This would allow each tap and target to be used as an independent "binary" out of the box, but also for its environment to be easily defined and evolved when working from source.

If this path has any appeal, please let me know and I can contribute a few starter files and examples to the project.

mdelaurentis commented 7 years ago

@anthonyp I've updated the README to suggest using virtualenv for installing Taps and Targets in separate environments. See https://github.com/singer-io/getting-started/pull/21.

If you want to contribute an example Dockerfile to one of the Taps and write up a brief doc describing your workflow, that would be awesome. We're totally open to the idea of adding Dockerfiles to projects if people find the Docker-based workflow to be useful. For example, the Outbrain tap has a Dockerfile (https://github.com/singer-io/tap-outbrain/blob/master/Dockerfile). However, I don't think we're going to standardize on something like Docker at this time. We'd rather keep the spec focused on the command-line inputs and outputs, and leave the development and deployment environment out of the standard for now. But if we have a few people using similar Docker-based workflows, it may be worth adding that to the best-practices guide.

tlrobinson commented 7 years ago

I like the idea of supplying Dockerfiles with taps/targets, not in SPEC but in BEST_PRACTICES. It doesn't need to dictate how it's deployed but can serve as sort of "executable documentation" for the requirements.

However, I'm not really sure of the best way to connect taps and targets running in Docker. I think piping docker run commands to each other has issues, like all of stdout being sent to logs. Perhaps using named pipes in a shared volume?
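
To make the named-pipe idea concrete, here is a minimal local sketch using `mkfifo`, with plain shell commands standing in for the tap and target. The `connect_via_fifo` helper and the `singer.fifo` file name are assumptions made for this illustration, and the sketch has not been tried across containers.

```shell
#!/bin/sh
# Sketch: connect a producer and a consumer through a named pipe (FIFO)
# instead of an anonymous shell pipe. connect_via_fifo is a hypothetical helper.

# connect_via_fifo PRODUCER_CMD CONSUMER_CMD
connect_via_fifo() {
    dir=$(mktemp -d)            # directory that could be shared as a volume
    fifo="$dir/singer.fifo"
    mkfifo "$fifo"

    # The producer blocks on opening the FIFO until a reader appears,
    # so it has to run in the background.
    sh -c "$1" > "$fifo" &

    # The consumer reads from the FIFO until the producer closes it.
    sh -c "$2" < "$fifo"
    status=$?

    wait                        # reap the background producer
    rm -rf "$dir"
    return "$status"
}
```

For example, `connect_via_fifo 'printf "a\nb\n"' 'tr a-z A-Z'` passes two lines through the FIFO. In a Docker setup, the directory holding the FIFO would presumably be mounted into both containers, with the tap's stdout redirected to the FIFO and the target's stdin read from it. Whether that works depends on the host, since FIFOs rely on both containers sharing one kernel.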

mdelaurentis commented 7 years ago

@tlrobinson yeah, I haven't tried to pipe docker run commands to each other, but I would not be surprised if there are issues with that. Using named pipes in a shared volume is an interesting idea. We're currently focusing on getting as many taps as possible implemented, but we'll probably turn our attention towards improving tooling once we get several more taps completed.

anthonyp commented 7 years ago

@mdelaurentis @tlrobinson For what it's worth, I was able to pretty easily pipe between docker runs using the --interactive option:

    docker run --interactive my-tap-hubspot-image tap-hubspot | docker run --interactive my-target-gsheet-image target-gsheet

Of course, I also passed in some other options including mapped-in config files. But conceptually, this worked fine. I believe the --attach option might also work for this use case.

Meanwhile, the --log-driver option can be used to control whether and how the output is logged. I think this is actually quite handy; some folks might want to log this output for debugging, some might not. It's all easily controllable.

Reference: https://docs.docker.com/engine/reference/run
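
A fuller version of that pipeline, with config files mapped in and logging turned off, might look like the sketch below. The image names come from the command above. The `/config` mount point, the config file names, and the `DOCKER`, `CONFIG_DIR`, `run_tap`, and `run_target` names are assumptions for illustration, not the options actually used.

```shell
#!/bin/sh
# Sketch: pipe a tap container into a target container.
# The mount point, config file names, and helper names are hypothetical.
DOCKER="${DOCKER:-docker}"
TAP_IMAGE="${TAP_IMAGE:-my-tap-hubspot-image}"
TARGET_IMAGE="${TARGET_IMAGE:-my-target-gsheet-image}"
CONFIG_DIR="${CONFIG_DIR:-$PWD/config}"

# The tap only writes to stdout, which docker run attaches by default.
run_tap() {
    "$DOCKER" run --rm --log-driver none \
        --volume "$CONFIG_DIR:/config:ro" \
        "$TAP_IMAGE" tap-hubspot -c /config/tap-config.json
}

# The target needs --interactive so that its stdin stays open for the pipe.
run_target() {
    "$DOCKER" run --rm --interactive --log-driver none \
        --volume "$CONFIG_DIR:/config:ro" \
        "$TARGET_IMAGE" target-gsheet -c /config/target-config.json
}
```

With both images built, `run_tap | run_target` runs the pipeline. `--log-driver none` stops Docker from also storing the record stream in the container logs. Dropping that flag keeps the logs available for debugging.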

kcolton commented 7 years ago

@mdelaurentis We thankfully have multiple good options for resolving different layers of dependency problems (virtualenvs for Python, Docker for different host dependencies).

Noticed a slight inconsistency in https://github.com/singer-io/getting-started where the Google Sheets example jumps from calls into separate VEs in Step 4 to what at least looks like calls where the tap and target are both in the same VE.

Developing a Python Tap also seems to imply you are working within a single VE.

RFC: An example "project" structure, where a few different taps and targets are housed, in whatever method of isolation they require, with simple shell scripts or symbolic links so that all execution examples can be uniform. Something like:

@anthonyp Going 100% Docker is an interesting idea, but it may be too much for a newcomer to learn, which would make the barrier to entry quite high.

But there is definitely something interesting about having all taps and targets housed in Docker containers defined by a docker-compose file that handles settings and copies in the necessary config and credential files.

I've gotten the getting-started example plus a custom Python tap up and running with both isolated and shared VEs. Maybe I'll give it a go with Docker and see what comes out the other side.

A good goal would be to keep the barrier to entry as low as using a shared VE, while making it easy to isolate all the way down to individual Docker containers.
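
One way the symbolic-link part of the RFC could be sketched, assuming each tap and target already lives in its own virtualenv: link every executable into a single `bin/` directory so that all examples call `bin/<name>` regardless of how the program is isolated. This is a hypothetical layout, not the structure originally proposed here, and `link_executables` is an invented helper name.

```shell
#!/bin/sh
# Sketch: expose isolated taps and targets through one uniform bin/ directory.
# link_executables is a hypothetical helper; the layout is an assumption.

# link_executables VENV_ROOT BIN_DIR NAME...
# Expects each program at VENV_ROOT/NAME/bin/NAME. Pass an absolute
# VENV_ROOT so that the links keep resolving from any working directory.
link_executables() {
    venv_root="$1"
    bin_dir="$2"
    shift 2
    mkdir -p "$bin_dir"
    for name in "$@"; do
        ln -sf "$venv_root/$name/bin/$name" "$bin_dir/$name"
    done
}
```

After `link_executables "$HOME/.virtualenvs" "$PWD/bin" tap-closeio target-stitch`, a pipeline could be written as `bin/tap-closeio -c tap-config.json | bin/target-stitch`. A program isolated some other way, for example in a container, could sit behind a small wrapper script of the same name in `bin/`.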

Subhraj07 commented 2 years ago

Is it possible to create a separate Docker container for each tap and target and then connect those containers, rather than using separate virtual environments?