ros-infrastructure / bloom

A release automation tool which makes releasing catkin (http://ros.org/wiki/catkin) packages easier.
Other
58 stars 93 forks source link

Is Packages Dictionary Really Necessary for rosdistro #64

Closed eitanme closed 11 years ago

eitanme commented 11 years ago

I'm not totally sure whether this is the right place for this, but here goes. In looking at the new rosdistro file format, I've noticed that the packages list is actually a dictionary as shown below:

ros_comm:
    url: git://github.com/ros-gbp/ros_comm-release.git
    version: 1.9.29-0
    packages:
      roscpp: clients/roscpp
      rospy: clients/rospy
      rosgraph_msgs: messages/rosgraph_msgs
      std_srvs: messages/std_srvs
      ros_comm: ros_comm
      rosbag: tools/rosbag
      rosconsole: tools/rosconsole
      rosgraph: tools/rosgraph
      roslaunch: tools/roslaunch
      rosmaster: tools/rosmaster
      rosmsg: tools/rosmsg
      rosnode: tools/rosnode
      ...

This seems to be unnecessary to me since catkin_pkg allows you to discover where the packages are in a given repository, along with their path information. Encoding it manually like this just leaves room for confusion and error. I'll make the suggestion that the packages list should move to being a white-list of package names that should be released from a given repository. I'll also include a code snippet to find all packages in a directory using catkin_pkg and throw them into a dict with the package name as the key:

import os
from catkin_pkg import packages as catkin_packages

def get_repo_packages(repo_folder):
    paths = []
    #find wet packages
    paths.extend([os.path.realpath(os.path.join(repo_folder, pkg_path)) \
                     for pkg_path in catkin_packages.find_package_paths(repo_folder)])

    #Remove any duplicates
    paths = list(set(paths))

    packages = {}
    for path in paths:
        pkg_info = catkin_packages.parse_package(path)
        packages[pkg_info.name] = path

    return packages

And the distro would become:

ros_comm:
    url: git://github.com/ros-gbp/ros_comm-release.git
    version: 1.9.29-0
    packages: [roscpp, rospy, rosgraph, std_srvs, ...]

Also, why are there some repositories that don't list packages at all? It seems kind of brittle to make the assumption that if the packages field is missing, that correlates to the repository name matching the package name that you would like to include in your white-list. It would be much more explicit to always require the white list or to just do away with the list altogether and require all packages in a repo to be released together... sort of like stacks used to enforce.

Example:

actionlib:
    url: git://github.com/ros-gbp/actionlib-release.git
    version: 1.9.8-0

Should probably be:

actionlib:
    url: git://github.com/ros-gbp/actionlib-release.git
    version: 1.9.8-0
    packages: [actionlib]

Thoughts?

dirk-thomas commented 11 years ago

The problem with the described approach is that it requires to fetch the repo content to answer any further queries. But it is desirable that all package.xml files can be fetched individually in order to use them for whatever task (i.e. building a dependency graph).

Some packages are explicitly not listed since we don't want to build debian packages for them (ussually packages containing tests only).

In order to keep the information the user must put under as small as possible there are some "shortcuts":

eitanme commented 11 years ago

This is fine, but an automated tool could be used to generate the meta-data needed to point other tools at the package.xml files. This is how all the documentation stuff works to track metapackages and cross-references. One tool checks out a repository to document and, along the way, produces meta data to be consumed by others that don't require checking out the full repo. In fact, bloom is kind of the perfect tool to do this since it has to check out the repositories anyways in order to do a release. This architecture guarantees that things stay up to date and removes a source of potential human error. Whatcha think?

On another note, I think that the "shortcuts" are likely to waste more time than they save in that they make it harder to understand the file format. To me, keeping the format as simple as possible, without adding unnecessary edge cases has much more value than saving 15 characters from time to time for users. I'd much rather tell people "list your white-listed packages to be released in the packages field" and leave it at that rather than adding "unless the repo name matches the package name or you have a nested directory structure. In the repo name matching case, just don't fill in the packages field. In the complex directory structure case, you need to list the relative path to your package from the main folder of the repo." The explanation just gets complicated for no reason.

dirk-thomas commented 11 years ago

I think that no automated system can make the decision which packages should be packaged and which should not. That must be decided manually.

Regarding the format: nothing prevents you from always using the verbose format. The shortcut is optional but other people might find that useful and prefer writing less. Imho the shortcuts make these cases much more readable - but this is likely subjective.

One issue currently might also be that the format is not described well in the wiki. But that is something we can easily fix.

eitanme commented 11 years ago

I never meant to imply that the white-list would go away, only that it would change from a dictionary with hard-coded paths to a list of package names. The automated tool (say bloom) would create the mapping between package name and relative path within the repository automatically. That way, users won't have to worry about specifying the relative path to a package manually, and there will be less for them to screw up. See my ros_comm example rosdistro format in my first post for an example of what I'm proposing. (It's the third code block)

The format discussion is a bit subjective, you're right. However, I think that the old rosdistro format provides an example of what can happen when we build extremely flexible specifications. In some ways, it was super useful to be able to specify repository groups, rules, etc. and re-use them throughout the file, but it was a nightmare to make sense of and parse as a human. These two shortcuts aren't as bad as all that, but there's an opportunity to keep this file extremely simple and I'd love to take it.

Whatever you guys decide to go with, documentation is a must. Still, keeping things simple will also help in producing a clear, concise description of the format and how it works. In the end, these things are totally your decisions to make. I'm just hoping to provide some feedback as an early user/tester of the tool-chain about the things I find to be confusing or difficult to use.

tfoote commented 11 years ago

I agree that there is a little bit of redundant information. When doing the release it is easy to verify and warn or error is there is a missmatch. However the availability of this information to other tools is important. And although it would be possible to automate keeping this data in another location. Debugging issues in synchronizing that external data storage could become a big problem, especially if it's never seen by the user/maintainer.

So on balance I think that we should stick with the current structure and probably add the script/feature to bloom which will confirm/warn if there's a mismatch at release time.

dirk-thomas commented 11 years ago

The problem is that bloom can by definition not warn reliably about a mismatch. Any time a new package is added/removed/moved the distro file still contains outdated information since they are only updated after the bloom release. So making bloom warn about that mismatch will be misleading in lots of cases.

eitanme commented 11 years ago

Didn't mean to close the issue, just meant to comment. I get that, perhaps, modifying bloom to generate this information automatically doesn't make sense right now. However, I don't think that there would be much potential for conflict unless a maintainer is releasing his stack twice at the same time. You could just have a paths file per repository stored in a git repo somewhere. The docs stuff works this way and I'm not sure I've seen a merge conflict since I moved to the file per repo system.

wjwwood commented 11 years ago

In the future bloom should open a pull request to the rosdistro repository for you each time you release your package.

So, a compromise might be to have bloom maintain a white-list, which is provided by the user the first release and whenever the package layout changes (new packages, deleted packages, moved packages). Then when the pull request is generated by bloom the map between packages and their sub directories can double as the white list. So in this way the file format remains unchanged and explicit, but it removes some burden from the user, since they don't need to manually maintain that information. So, the user is still specifying the white list but bloom is automatting the generation of the sub-directory meta information.

With regard to the "shortcuts", I think the existing ones are fine, but I agree with @eitanme that the old rosdistro format is an example of how shortcuts can go bad. Many of the entries in that file make literally no sense unless you know all of the tricks. However, if bloom is generating these entries for the users then the shortcuts can likely go away anyways.

eitanme commented 11 years ago

I'm totally on board with @wjwwood. I don't mind the white list at all, but having to manually specify paths is a pain.

I'm also on the same page with your opinion of the file format.

wjwwood commented 11 years ago

There is now diff generation for the rosdistro file as of c36b2d094d7c2c455235e2abafc0d23f880f5e12.

I have ticketed the automated pull request feature here:

https://github.com/ros-infrastructure/bloom/issues/103

I'll close this ticket as it was mostly discussion.