benoitbryon commented 13 years ago

Hi,

First of all, thanks for this software! It looks really great!

As a python developer, I felt confused when I tried fileconveyor:

I couldn't figure out how to install it quickly with Python tools like pip and buildout
I saw some sys.path hacks in arbitrator.py

So, here is a first pull request with a packaging proposal...

Changes description

renamed code to fileconveyor. Yes it's a big change, maybe backward incompatible, but it is required if we want to be able to write something like "from fileconveyor.config import Config": directory name is the module name. Another option would have been to move from "code" to "src/fileconveyor", but the result is quite the same.
added an __init__.py in the fileconveyor code directory. It makes the directory an importable Python package.
added a setup.py: it makes it an egg
started to remove "sys.path" hacks. Now fileconveyor's python scripts can be imported with the fileconveyor namespace.
in config.xml, transporter names and processor names are now full paths to Python modules. This allows one to specify a third-party transporter or processor (not bundled with fileconveyor). Maybe it opens the door to reusing django-storages rather than rewriting it.
moved some files (readme, license, ...) to project's root. It is a common practice in python packages. This is not a mandatory change.
I set a "0.1-dev" version number and "beta" status... but I am not convinced at all! Feel free to change it in setup.py.
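For illustration, here is a rough sketch of the metadata such a setup.py could declare. It is not the file from the branch: the dependency list is borrowed from the buildout example in this description, and the package list and the console-script entry point (fileconveyor.arbitrator:main) are assumptions. The dict is built in a function so it can be inspected without invoking setuptools.

```python
def package_metadata():
    """Keyword arguments a setup.py could pass to setuptools.setup()."""
    return {
        "name": "fileconveyor",
        "version": "0.1-dev",  # version proposed in this pull request
        # Assumed package layout, based on the imports used by arbitrator.py.
        "packages": [
            "fileconveyor",
            "fileconveyor.processors",
            "fileconveyor.transporters",
        ],
        # Assumed dependencies, taken from the buildout eggs list.
        "install_requires": ["cssutils", "pyinotify", "paramiko"],
        "classifiers": [
            "Development Status :: 4 - Beta",
            "Programming Language :: Python",
        ],
        # Hypothetical entry point: assumes arbitrator.py exposes a main().
        "entry_points": {
            "console_scripts": ["fileconveyor = fileconveyor.arbitrator:main"],
        },
    }
```

A setup.py would then import setup from setuptools and call setup(**package_metadata()).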
Expected results

With these changes, I could:

install fileconveyor with pip:

pip install -e git+https://github.com/benoitbryon/fileconveyor@packaging-egg#egg=fileconveyor

install fileconveyor with buildout:

[buildout]
find-links = https://github.com/benoitbryon/fileconveyor/tarball/packaging-egg#egg=fileconveyor-0.1-dev
parts =
    python
eggs =
    cssutils
    pyinotify
    paramiko
    fileconveyor
[python]
recipe = zc.recipe.egg
interpreter = python
eggs = ${buildout:eggs}

With these changes, you should be able to release your work on PyPI! That is a good start for promoting your project in the Python community. Maybe you will get additional contributors ;)

Tests

I successfully used the modified version of fileconveyor on a Linux system, but I did not try every processor or transporter. So I cannot say "it works!" but only "it worked for me".

Do you have a test procedure that I should follow the next time I want to do a pull request?

wimleers commented 13 years ago

First of all: WOW. THIS IS AWESOME!

(And sorry for shouting — I'm just very excited!)

wimleers commented 13 years ago

Okay, some background info.

I knew what I wanted to write for my bachelor thesis. I knew what it had to be capable of. But I didn't want to write it in a particular language. Some of my classmates were extremely excited about Python back then (and still are, AFAIK) — they claimed there was "a Python module for everything". So I chose Python, despite never having used Python before (hence some hacks and unpythonesque things, the lack of proper Python packaging, etc.).

Only to be disappointed to find out that for all the crucial things I needed to do, there were no Python modules. That's why I wrote FSMonitor. But it was actually extremely hard to find some Python module that provided file system storage abstraction, protocol abstraction, or whatever you call it. Eventually, I stumbled upon django-storages and started using that. I contributed several patches to it, making it more stable.

It's always been my intention to contribute to django-storages from the File Conveyor project (i.e. upstream), but also to get new functionality from the developers of django-storages (downstream). It'd be a win-win situation. But, unfortunately, virtually nobody used File Conveyor for quite some time. It seems it's gaining some traction at last! (See the exciting announcement at http://fileconveyor.org/, with more details to come later!)

Your changes.

Code renamed to fileconveyor. Awesome. I've been wanting to do this anyway :)
__init__.py + setup.py + removal of sys.path hacks: superb!
config.xml transporter + processor names as full Python module paths. I like the possibly opened door you're mentioning. However, it makes the config file so much more verbose. Can't we allow for the behavior you're suggesting and at the same time keep it the way it is? I.e., if no full Python module path is specified, assume it is a transporter/processor that ships with File Conveyor, and default to File Conveyor's paths?
0.1-dev + beta. Strange. Coming from Drupal, dev means a development snapshot and beta means a tagged release. But you probably meant that File Conveyor is currently in "beta" quality. Update: oh, it reflects the code quality/status shown on PyPI. Makes sense :)

Expected results: installing with pip and possibly getting additional contributors: HURRAY!

Test procedure: not really. I do have unit tests, but that's it. These unit tests are currently written per Python module (since I wrote this module per module, until each module was suitably tested and pretty much bug-free) and don't come with a project-wide test runner yet that runs all individual unit tests. That's another thing that needs to be added :)

Conclusion:

You rock! Thank you for your contribution, sir.
You're clearly a Pythonista and work with it far more regularly. Which makes this contribution all the more valuable!
Would you like to get commit rights? :)
I definitely want to commit this, but could you first fix my only objection (support full Python module paths, but also support just module names if they ship with File Conveyor).
Documentation would need to be updated, but I'll take that on me. If you could just fix point 4, that'd be great :)

wimleers commented 13 years ago

An issue has been created for the test runner thingie: #83.

benoitbryon commented 13 years ago

Would you like to get commit rights?

I suggest that I do additional pull requests before being granted commit rights. So that we discuss, then share some vision on the project. I feel that my own vision of the project is too restricted right now.

If you think it's better to give me commit rights now, do it. I won't be using them on master at first (I will work on branches)... Maybe later with your approval.

benoitbryon commented 13 years ago

could you first fix my only objection (support full Python module paths, but also support just module names if they ship with File Conveyor)

Agreed. I guess I can implement it with some "try-except ImportError" block.

benoitbryon commented 13 years ago

The commits above are about your remark: support full Python module paths, but also support just module names if they ship with File Conveyor:

first try to import module
if it doesn't work, try to import it with an adequate fileconveyor prefix

Right now, the error message for external processors or transporters is not perfect. The scenario is:

let's consider a typo in the configuration file, for example "mypackage.typo.module" instead of "mypackage.transporters.module".
try to import the external module (full module path). It fails because of the typo. No error message.
try to import the internal module (prefixed module path). The error message says "Cannot load transporter fileconveyor.transporters.transporter_mypackage.typo.module" (with prefix).
something like "Cannot load transporter mypackage.typo.module. Also tried fileconveyor.transporters.transporter_mypackage.typo.module" would be more meaningful.
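The two-step lookup with the more meaningful message could be sketched roughly as follows. This is not the code from the commits: load_module is a hypothetical helper, the sketch targets Python 3's importlib, and the prefix value is the one that appears in the error message quoted above.

```python
import importlib

# Prefix for bundled transporters, as it appears in the error message above.
DEFAULT_PREFIX = "fileconveyor.transporters.transporter_"


def load_module(name, default_prefix=DEFAULT_PREFIX):
    """Import `name` as a full module path, then fall back to the bundled prefix."""
    candidates = [name, default_prefix + name]
    for candidate in candidates:
        try:
            return importlib.import_module(candidate)
        except ImportError:
            continue  # try the next candidate
    # Both attempts failed: report every path that was tried.
    raise ImportError(
        "Cannot load transporter %s. Also tried %s" % (candidates[0], candidates[1])
    )
```

With this shape, a typo such as "mypackage.typo.module" produces a message that names the path the user actually wrote as well as the prefixed fallback.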

wimleers commented 13 years ago

I'm leaving on a vacation tomorrow morning, so it may be a while until I get back to you on this (I saw that you're French; we're going on vacation to Nice!). I'll do it ASAP, I promise. Considering the preparations I still have to make and the arrangements for my housing in Palo Alto (I'm interning at Facebook), I really can't spend any of my time reviewing your changes right now.

Thanks for your patience and understanding!

benoitbryon commented 12 years ago

Warning: currently, catching import errors while loading processors and transporters can hide import errors raised by external libraries.

Synopsis:

let's say that django-storages is not installed
use the symlink_or_copy transporter
arbitrator.py tries to load symlink_or_copy transporter...
... which tries to load django-storages, and raises an ImportError!
the error log says that symlink_or_copy transporter couldn't be found. Wrong!

It may not be a blocker issue.
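One possible way to avoid masking such errors, sketched here under the assumption of Python 3.6+ (where ModuleNotFoundError carries the name of the module that was not found): treat the failure as "not found" only when the missing module is the requested one or one of its parent packages, and re-raise otherwise. import_if_present is a hypothetical helper, not part of fileconveyor.

```python
import importlib


def import_if_present(module_path):
    """Return the module, or None only when module_path itself is missing."""
    try:
        return importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        # exc.name is the module that could not be found. If it is the
        # requested module or one of its parent packages, the candidate is
        # simply absent and the caller may try another path.
        requested = module_path.split(".")
        missing = (exc.name or "").split(".")
        if exc.name and requested[:len(missing)] == missing:
            return None
        # Otherwise a dependency imported by the module is missing
        # (e.g. django-storages): surface the real error.
        raise
```

Other import failures, such as a name that cannot be imported from an installed library, are not ModuleNotFoundError and propagate unchanged.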

wimleers commented 12 years ago

I'm afraid I can't get it to work:

--( ~/Work/fc-benoit/fileconveyor (packaging-egg) )-- python arbitrator.py 
Traceback (most recent call last):
  File "arbitrator.py", line 23, in <module>
    from fileconveyor.settings import *
ImportError: No module named fileconveyor.settings

or

--( ~/Work/fc-benoit (packaging-egg) )-- python fileconveyor/arbitrator.py 
Traceback (most recent call last):
  File "fileconveyor/arbitrator.py", line 23, in <module>
    from fileconveyor.settings import *
ImportError: No module named fileconveyor.settings

Basically, it seems like you can't refer to a Python package itself by its name from within the package. I.e. the following code:

from fileconveyor.settings import *
from fileconveyor.config import *
from fileconveyor.persistent_queue import *
from fileconveyor.persistent_list import *
from fileconveyor.fsmonitor import *
from fileconveyor.filter import *
from fileconveyor.processors.processor import *
from fileconveyor.transporters.transporter import *
from fileconveyor.daemon_thread_runner import *

should be changed to

from settings import *
from config import *
from persistent_queue import *
from persistent_list import *
from fsmonitor import *
from filter import *
from processors.processor import *
from transporters.transporter import *
from daemon_thread_runner import *

A similar change is needed at the importing of Transporter classes:

default_prefix = 'fileconveyor.transporters.transporter_'

->

default_prefix = 'transporters.transporter_'

Maybe this is just my system, maybe something is wrong with my Python installation (which I doubt), or maybe I'm just doing something stupid (if I ever was something close to a Python expert, then that's definitely no longer the case).

Please tell me what a fool I'm being and where I'm making a super obvious, moronic mistake…

benoitbryon commented 12 years ago

It looks like "fileconveyor" module is not in your sys.path when you launch arbitrator.py. How did you install the fileconveyor package?

With buildout: did you use the local bin/python instead of the system python?
With virtualenv+pip: did you activate the virtualenv? (note: you could use the virtualenv's local bin/python too)
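To illustrate the sys.path point: when a file is launched by path, Python puts that file's directory first on sys.path, so the enclosing package is not importable by name. Launching it with -m from the parent directory puts the current directory there instead. A small self-contained sketch (Python 3.7+; demo_pkg is a hypothetical stand-in for the fileconveyor directory, not the real project):

```python
import os
import subprocess
import sys
import tempfile


def run_both_ways():
    """Create a throwaway package and launch its script by path and with -m."""
    root = tempfile.mkdtemp()
    pkg = os.path.join(root, "demo_pkg")  # hypothetical stand-in for fileconveyor/
    os.mkdir(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w"):
        pass  # an empty __init__.py marks the directory as a package
    with open(os.path.join(pkg, "settings.py"), "w") as f:
        f.write("VALUE = 42\n")
    with open(os.path.join(pkg, "arbitrator.py"), "w") as f:
        f.write("from demo_pkg.settings import VALUE\nprint(VALUE)\n")

    # By path: sys.path[0] is demo_pkg/ itself, so "demo_pkg" cannot be imported.
    as_script = subprocess.run(
        [sys.executable, os.path.join("demo_pkg", "arbitrator.py")],
        cwd=root, capture_output=True, text=True)
    # With -m: sys.path[0] is the current directory (root), so the package resolves.
    as_module = subprocess.run(
        [sys.executable, "-m", "demo_pkg.arbitrator"],
        cwd=root, capture_output=True, text=True)
    return as_script, as_module
```

Installing the package (for example in a virtualenv) makes the absolute import work regardless of the current directory, which is the case the questions above are probing.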

wimleers commented 12 years ago

I didn't install the package. I simply worked with my existing FileConveyor git code. Can't we make sure that continues to work as well? If I want to work on code, I want to check out code and start working. This is definitely something we'd need to document better.

How do you handle this? You're working on many different branches, meaning that you obviously need to switch from one instance of FileConveyor to another, meaning that you also have to switch to a different egg?

benoitbryon commented 12 years ago

You're right: we should be able to load fileconveyor's modules without the "fileconveyor" prefix, in a "relative" way: the import statement first checks in the current package, then looks in sys.path.

benoitbryon commented 12 years ago

About installation... several recipes exist. The ones I know are:

copy (or checkout) files, then "python something.py". One drawback is that you have to handle dependencies manually.
system-wide installation with easy_install or pip. Pip and easy_install know how to get the package (they look on PyPI by default). They manage dependencies for you. One drawback is that the installation is system-wide.
copy (or checkout) files, then "python setup.py install". I believe it is equivalent to running easy_install on a local project.
isolated installation, via virtualenv+pip, buildout or virtualenv+buildout.

The first solution appears to be the simplest... but only at first. I began Python with a PHP background and that is what I used to do; it seemed natural. But with a little experience (or relevant documentation), you find the alternatives very convenient. I invite you to try virtualenv+pip. It is simpler to understand than buildout, it is a good start, and it is enough for most projects.

Yes, creating a Python package requires some additional effort. Here are some advantages:

you can push to PyPI. Then users can install your package with one command like "easy_install fileconveyor" or "pip install fileconveyor". They no longer need to download the source.
you create releases/versions. Then users can install a specific version with one command like "pip install fileconveyor==1.2".
even if you don't push to PyPI, users can install your package. Recipes exist to install a package from an archive (github provides archives) or from source ("pip install -e git+https://...").
installers care about dependencies. Users don't have to install dependencies manually.
you can provide scripts. As an example, arbitrator.py may become a script. In a system-wide installation, you will get a system command. In a virtualenv, you will get a local command in virtualenv's bin/ directory.

So, I recommend using fileconveyor as a Python package. On the other hand, I agree it is a good point if it remains simple enough to be used via the old-way "download and run". If we can't do both, I suggest giving a higher value/priority to the package.

benoitbryon commented 12 years ago

About development workflow with a package. Several recipes exist too! Here are some guidelines:

use virtualenv to create an isolated environment. Install your package in the virtualenv (see the installation recipes above). The main advantage of virtualenv for development is that you don't affect your system with unstable/experimental/unknown software.
if you checked out the package manually, you can use "python setup.py develop" to install it as an "in development egg". Then work as usual.
you can use pip's -e option like "pip install -e git+https://github.com/benoitbryon/fileconveyor@packaging-egg#egg=fileconveyor". It checks out the git repository for you in the (virtualenv's) src/ directory. Then work as usual.
if you use buildout, have a look at the "develop" option or see mr.developer. Then work as usual.

As a conclusion: developing a project is not more difficult when the project is a Python package. In fact, you get additional tools to develop it.

wimleers / fileconveyor

Packaging File Conveyor as a Python Egg #81
