Closed gvyshnya closed 4 years ago
Hi @gvyshnya !
Thank you for your feedback! Anaconda was actually our first guess when we were developing installers for dvc (you can see traces of it in the git log). But since dvc is currently more of a standalone utility, we opted for pyinstaller to build a standalone binary and distribute it in the usual packages (rpm, deb, exe), and for pip to distribute it as a Python package. That said, we were planning to create an anaconda/miniconda package in the future, once dvc is better suited to being used as a library. We can now see that there is clear demand for it and will try to deliver it in the near future.
looking forward to conda support!
Fixed the issue with the download_url/url fields in our package info that didn't allow me to use conda skeleton pypi dvc on 0.9.5: https://github.com/dataversioncontrol/dvc/commit/79d710010471293a1184d1e66e7619c9bcc00ea0 . This fix will be released in 0.9.6, and I'll get back to creating the conda package right after 0.9.6 is published on PyPI.
Creating a conda package for dvc requires creating packages for all of its dependencies, because meta.yaml doesn't support pip dependencies for conda packages, only for environments. This makes creating a conda package for dvc time-consuming and tedious. If anyone from the community feels like working on it, please feel free to do so. For now, since we provide (among others) a pip package, which can be specified in a conda env as a dependency, I don't see a real need for a conda package right now and might revisit this issue in releases after 0.9.7.
Closing as stale. Please feel free to reopen if you feel like working on this.
Conda seems to have better support for creating identical and consistent environments on different platforms. For example, my development env is OSX (my laptop) but production is Ubuntu linux. I need to make sure that there are no differences in the packages installed on the two environments and that I am able to easily spin up a new machine with the same packages...
I agree with @yfarjoun. There are a few reasons why it would be really nice to have a recipe for dvc in one of the main conda channels:
- Installing with pip into an environment created by conda is both significantly less convenient (and more awkward to automate) and makes it much harder to generate reproducible environments.
- pip dvc[s3] in a bare environment installs 38 packages, that's quite challenging.
@yfarjoun @tfenne Thank you guys for all the feedback! We really appreciate it! Reopening this issue :slightly_smiling_face:
Guys, btw, could you elaborate on why using
dependencies:
  - pip:
    - dvc==0.32.1
in your conda env is not reproducible?
Thanks @efiop. This is essentially the strategy I'm using, but it's a bit more complicated than that. What that section actually ends up looking like is more like this:
  - pip:
    - appdirs==1.4.3
    - asciimatics==1.10.0
    - boto3==1.7.4
    - botocore==1.10.84
    - chardet==3.0.4
    - colorama==0.4.1
    - configobj==5.0.6
    - configparser==3.7.3
    - contextlib2==0.5.5
    - decorator==4.4.0
    - distro==1.4.0
    - docutils==0.14
    - dvc==0.32.1
    - future==0.17.1
    - gitdb2==2.0.5
    - gitpython==2.1.11
    - grandalf==0.6
    - idna==2.8
    - jmespath==0.9.4
    - jsonpath-rw==1.4.0
    - msgpack==0.6.0
    - nanotime==0.5.2
    - networkx==2.2
    - ply==3.11
    - pyasn1==0.4.5
    - pyfiglet==0.8.post1
    - requests==2.21.0
    - s3transfer==0.1.13
    - schema==0.7.0
    - smmap2==2.0.5
    - urllib3==1.24.1
    - wcwidth==0.1.7
    - zc.lockfile==1.4
... because without pinning the versions of all the dependencies, it's hard to guarantee reproducibility. Currently this works because wherever dvc requires a package that was previously installed by conda (in my env), the installed version satisfies the requirement. But if dvc required an earlier or later version, that would start to be difficult to manage.
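One way to produce a fully pinned pip section like the one above is to convert `pip freeze` output instead of typing it by hand. A minimal sketch, assuming input lines in `name==version` form; the function name `render_pip_section` and the two-space indentation are illustrative choices, not part of dvc or conda:

```python
def render_pip_section(freeze_lines, indent="  "):
    """Render `pip freeze`-style lines as the pip section of a conda env file."""
    pins = []
    for raw in freeze_lines:
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        # Skip editable installs and direct references; they are not pins.
        if line.startswith("-e ") or " @ " in line:
            continue
        pins.append(line)
    # Sort case-insensitively so the output is stable across machines.
    pins.sort(key=str.lower)
    out = [f"{indent}- pip:"]
    out.extend(f"{indent * 2}- {pin}" for pin in pins)
    return "\n".join(out)


if __name__ == "__main__":
    sample = ["dvc==0.32.1", "boto3==1.7.4", "", "# comment", "appdirs==1.4.3"]
    print(render_pip_section(sample))
```

This only automates the bookkeeping. It does not resolve conflicts between versions installed by conda and versions required by pip packages, which is the harder problem described above.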
@efiop just curious, is anyone actively working on this issue? If not, it seems like something I wouldn't mind working on over the next week.
@J0 That would be amazing! :slightly_smiling_face: No, no one is working on it right now. Thank you so much for looking into this!
FYI, the outstanding DVC dependencies that do not have a conda build are:
For DVC to provide a conda build, I believe the above packages will also need conda builds. See the guidelines for contributing packages on conda-forge. The process for porting a PyPI package to conda-forge is becoming increasingly streamlined, but it is still not a trivial task.
I would like to see DVC on conda but currently do not have the time to assist on this issue.
Started to work on this. A basic meta.yaml for dvc is here - https://github.com/ei-grad/staged-recipes/blob/dvc/recipes/dvc/meta.yaml.
About dependencies:
@ei-grad: It is a bit unclear: if I want to add a package with dependencies that are not already on conda-forge, should I put these dependencies in the same pull request as the package I want to add? Or should there be a separate PR for each dependency?
@chrisburr: Both will work, but you should consider:
- If the recipes are complex, a separate PR will be easier to review
- If you do it in one PR, the first feedstock build will fail due to missing dependencies, so you'll have to restart it ~an hour later
- Multiple PRs can take longer to get reviewed
I guess it is better to put them in the same PR as DVC.
Btw, @brbarkley could you please share how you got the list of outstanding dependencies?
@ei-grad I manually went through DVC’s dependency list and searched for them on conda-forge.
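The manual check described above can be partly scripted once a list of package names available on conda-forge is at hand. A rough sketch, assuming that list is obtained separately (nothing here queries conda-forge); `missing_on_conda` is a hypothetical helper, and names are compared after PEP 503-style normalization:

```python
import re


def normalize(name):
    """Normalize a package name the way PEP 503 does for PyPI names."""
    return re.sub(r"[-_.]+", "-", name).lower()


def missing_on_conda(dependencies, conda_names):
    """Return dependencies (in input order) with no same-named conda package."""
    available = {normalize(n) for n in conda_names}
    return [dep for dep in dependencies if normalize(dep) not in available]


if __name__ == "__main__":
    deps = ["boto3", "zc.lockfile", "grandalf", "nanotime"]
    # Hypothetical availability list, for illustration only; it is not a
    # statement about what conda-forge actually contains.
    known = ["boto3", "zc-lockfile"]
    print(missing_on_conda(deps, known))
```

Conda package names do not always match PyPI names, so anything this reports as missing still needs a manual look.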
looks like there's already a version on conda cloud: https://anaconda.org/derickl/dvc :eyes:
that guy seems to have packaged everything that is needed. Including grandalf https://anaconda.org/derickl/grandalf .
@efiop , those are outside conda-forge (don't know if this is like the official distribution or something)
Any update on it? @J0
Hi @derickl ! We've found your conda package for dvc and we were wondering if you would be willing to contribute your scripts to create an official dvc repo that we could help maintain and keep up-to-date?
Thanks to all of you working on this. It would be awesome to have a conda dvc package, as I mainly use conda as my package manager. However, I would prefer, if possible, to have the dvc package in the main or conda-forge channel.
Help is needed on this, right? Whom can I discuss that with?
I'm happy to talk as a user.
@GildedHonour we actually have a guy who is looking into this right now. Are you interested in helping us for this specific task or just want to be involved and help DVC in general? Would be happy to discuss and find more stuff where we need more hands :)
@shcheklein in general too. Yes, let's discuss.
@GildedHonour Alex, can you find me and/or Ruslan on dvc.org/chat (ivan and ruslan)? would be happy to chat.
@shcheklein just done
https://github.com/conda-forge/staged-recipes/pull/8963 was merged. Dvc should be available through conda-forge now https://github.com/conda-forge/dvc-feedstock , unless I'm missing something. Big thanks to @MaxRis :tada:
@efiop unfortunately, the dvc package will be uploaded to the conda-forge channel only once we have the first successful CI build on the feedstock repo's master https://github.com/conda-forge/dvc-feedstock/commits/master (so far the build has failed because other dependencies have not been uploaded yet).
Another important thing is that only Python 2.7 and 3.6 builds are enabled for the dvc feedstock. To enable Python 3.7 builds, the restriction needs to be removed here https://github.com/conda-forge/dvc-feedstock/blob/master/recipe/meta.yaml#L14 , but before we can do that, Python 3.7 builds are required for all of DVC's dependencies.
@MaxRis Thanks for the clarification! Let's keep this open for now then.
DVC 0.53.2 for Python 2.7 and 3.6 is available through conda-forge now!
conda install -c conda-forge dvc
Python 3.7 build of dvc is available now!
Odd thing is that on Windows 10 I'm receiving the following error when trying to run dvc installed from conda-forge:
Fatal error in launcher: Unable to create process using '"c:\bld\dvc_1564563047081\_h_env\python.exe" "C:\Users\max\Miniconda3\Scripts\dvc.exe" '
Will try to investigate this more.
Finally, dvc 0.54.1 build 1 with all extra deps is available in conda-forge
@MaxRis awesome stuff! Thanks. The only thing left before we close this ticket (finally) is a doc on how we support/update it in the future.
k, thanks, @MaxRis, we have all the docs ready now - https://github.com/iterative/dvc/wiki/Maintenance-of-Anaconda-package-in-conda-forge-channel
@efiop please take a look, and let's update our release checklist to include a step to upgrade requirements if necessary.
I think we are ready to close this issue at last 🎉
@shcheklein Added a quick one https://github.com/iterative/dvc/wiki/Release-checklist
thanks @efiop 🙏 :)
Anaconda (https://www.continuum.io/what-is-anaconda) is the leading Python distribution for data science today. It has its own package manager, conda (https://conda.io/docs/index.html), which is a rival to the well-known pip.
Since Anaconda, as well as its lightweight Python-only version Miniconda (https://conda.io/miniconda.html), is gaining more and more traction within the data science community these days, porting the DVC installer to conda could be a good step to streamline DVC usage across industrial analytics circles.