pyOpenSci / software-submission

Submit your package for review by pyOpenSci here! If you have questions please post them here: https://pyopensci.discourse.group/

FawltyDeps: a dependency checker for Python projects #94

Closed mknorps closed 1 year ago

mknorps commented 1 year ago

Submitting Author: Maria Knorps (@mknorps)
All current maintainers: (@mknorps, @Nour-Mws, @jherland)
Package Name: FawltyDeps
One-Line Description of Package: Dependency checker for Python that finds undeclared and/or unused 3rd-party dependencies in your Python project.
Repository Link: https://github.com/tweag/FawltyDeps
Version submitted: 0.8.0
Editor: TBD
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD
Date accepted (month/day/year): TBD


Code of Conduct & Commitment to Maintain Package

Description

FawltyDeps is a command-line tool that gives insights into your Python project's imports, its declared dependencies, and how these match up. Its main purpose is to report 3rd-party imports that you have forgotten to declare (undeclared dependencies) and packages that you declare but do not import (unused dependencies). To generate this comparison, FawltyDeps first reads your code and Jupyter notebooks and walks their abstract syntax trees to collect imports that come from external sources. The second component extracts the dependencies declared in the project's requirements. Those declarations may come in various forms (requirements.txt, setup.py, pyproject.toml), all of which FawltyDeps can parse to extract the required package names. The third and most valuable component compares the imports and the declared dependencies found in the project. For this, various techniques for matching dependency names to import names are applied, including checking the virtual environment.
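The three components described above can be illustrated with a minimal Python sketch. This is not FawltyDeps' actual implementation: the function names `collect_imports` and `compare` are hypothetical, the standard-library set is a tiny hand-written stand-in, and the sketch assumes that an import name equals its package name (which is not true in general). Notebook parsing and requirements-file parsing are left out.

```python
from __future__ import annotations

import ast


def collect_imports(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" -> top-level name "a"
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x"
            names.add(node.module.split(".")[0])
    return names


def compare(imports: set[str], declared: set[str], stdlib: set[str]):
    """Return (undeclared, unused), assuming import name == package name."""
    third_party = imports - stdlib
    return third_party - declared, declared - third_party


# Hypothetical inputs for illustration only.
source = "import os\nimport numpy as np\nfrom pandas.io import sql\nfrom . import local\n"
declared = {"numpy", "requests"}  # e.g. as read from requirements.txt
stdlib = {"os", "sys"}            # stand-in for the real standard-library list

undeclared, unused = compare(collect_imports(source), declared, stdlib)
print(sorted(undeclared), sorted(unused))  # ['pandas'] ['requests']
```

Here `pandas` is imported but not declared, and `requests` is declared but never imported.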

To check the project in the current directory run:

fawltydeps

This will find imports in all the Python code under the current directory, extract dependencies declared by your project, and then report undeclared and unused dependencies.

FawltyDeps comes with various ways of customizing input, output and the settings of the execution. You may for example only want to list used imports or include only part of the project to be checked. All details are available in fawltydeps --help.

FawltyDeps may be used for Python 3.7+ and is available via PyPI.

You may read more details in the Tweag blog post.

Scope

Domain Specific & Community Partnerships

- [ ] Geospatial
- [ ] Education
- [ ] Pangeo

Community Partnerships

If your package is associated with an existing community please check below:

[^1]: Please fill out a pre-submission inquiry before submitting a data visualization package.

FawltyDeps supports reproducible workflows by informing of potential problems with undeclared and unused dependencies. Those problems may manifest as a user learning that they cannot run a notebook due to the lack of some package only after running a long experiment setup.

The target audience is Python users who work on libraries/packages that have a chance of being reused later. The scientific application of FawltyDeps is keeping experiments reproducible from a Python packaging perspective. Another application is for scientists who want to rerun old experiments and first check that no dependencies are missing.

Yes, similar but not the same. Some packages in this area are: pipreqs, pigar, pants, creosote and deptry. FawltyDeps differs from them because it does both: it checks used packages (imports in the code) and it checks declared packages. The comparison of collected imports and dependencies relies on a mapping gathered from various sources, by default the user's virtual environment rather than a static file. Another difference is the range of supported requirements formats (pyproject.toml, setup.py, setup.cfg, requirements.txt) and Python inputs (code, notebooks).
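To illustrate why a mapping from the environment matters, here is a hedged sketch of matching import names against declared package names. It is not how FawltyDeps itself is written: `normalize`, `load_env_mapping` and `find_undeclared` are hypothetical names, and the example mapping is hand-written. The only real API used is `importlib.metadata.packages_distributions()`, which requires Python 3.10+ and returns a mapping from import names to the distributions that provide them in the current environment.

```python
from __future__ import annotations

import re
from typing import Iterable, Mapping, Sequence


def normalize(name: str) -> str:
    """Normalize a package name as in PEP 503 (case and -, _, . are equivalent)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def load_env_mapping() -> Mapping[str, Sequence[str]]:
    """Import-name -> distribution-names mapping for the current environment."""
    from importlib.metadata import packages_distributions  # Python 3.10+

    return packages_distributions()


def find_undeclared(
    imports: Iterable[str],
    declared: Iterable[str],
    mapping: Mapping[str, Sequence[str]],
) -> set[str]:
    """Return imports that no declared package provides."""
    declared_norm = {normalize(d) for d in declared}
    undeclared = set()
    for name in imports:
        # Assumption: fall back to "import name == package name" when unmapped.
        providers = mapping.get(name, [name])
        if not any(normalize(p) in declared_norm for p in providers):
            undeclared.add(name)
    return undeclared


# Hand-written mapping for illustration; load_env_mapping() would supply a real one.
mapping = {"yaml": ["PyYAML"], "sklearn": ["scikit-learn"]}
result = find_undeclared({"yaml", "sklearn", "requests"}, {"pyyaml", "requests"}, mapping)
print(sorted(result))  # ['sklearn']
```

Without the mapping, `yaml` would be wrongly reported as undeclared, because the package that provides it is named `PyYAML`.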

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication Options

JOSS Checks

- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a more dense text-based review. It will also allow you to demonstrate addressing the issue via PR links.

Confirm each of the following by checking the box.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

NickleDave commented 1 year ago

Hi @mknorps @Nour-Mws and @jherland, welcome to pyOpenSci.

Thank you for your submission. Apologies for the slow reply. Part of the reason for that is we needed to discuss whether it was in scope (just for future reference please know you can open a presubmission inquiry to ask about this sort of thing).

At this time we cannot consider FawltyDeps in scope for our review process.

The main reason is that it is a general development tool, rather than a tool focused on open science. I.e., above you checked "workflow automation" as one of the categories, but by that we mean a tool that focuses on research workflows, like snakemake. This sense of "workflow" might not be clear to everyone, although there is a significant community around it. Happy to hear if this could be better explained in our guide. A secondary reason is that some of the functionality overlaps with existing tools, e.g. flake8 and isort.

You are of course right that "there is more than one obvious way to declare your dependencies in Python" as you say in your blog post. So we understand where you are coming from with developing FawltyDeps. One way we might address this through pyOpenSci would be to help educate scientific developers about dependencies, so they understand the difference between declaring a dependency in a pyproject.toml file vs. adding a dependency in a requirements.txt file. I didn't find anything on this in the FawltyDeps documentation--if it's not there already, maybe you could consider adding some information on it? I like the way it's discussed in this Donald Stufft post: https://caremad.io/posts/2013/07/setup-vs-requirement/.

Generally speaking, though, we cannot consider development tools for review, at least at this time. That's because one of our goals is to help connect scientific Python developers with the broader Python community. For example, we want to increase awareness of all the incredible work being done by software engineers that are creating packaging workflow tools as presented in our guide. Those engineers are working incredibly hard to eliminate the kinds of pain points that FawltyDeps tries to address with an automated tool. I hope you can understand why we would not want to claim that we can review the tools they--and you--are developing, especially after we just went through a very lengthy review process asking for a ton of input from those developers and maintainers of core scientific Python packages.

So, we get where you are coming from, but we need to consider this out of scope for pyOpenSci. Please let me know if this is clear.

NickleDave commented 1 year ago

Hi again @mknorps @Nour-Mws and @jherland, I will go ahead and close now since this is considered out of scope, but we're happy to continue the discussion here if needed.

mknorps commented 1 year ago

Thank you very much for the time and thought you spent reviewing our submission!

We understand why this project is considered out of scope; you presented it clearly. We hope at the same time that the awareness of issues with governing dependency will grow in the scientific community and that scientists will also find it beneficial to follow general software design principles.

NickleDave commented 1 year ago

> We hope at the same time that the awareness of issues with governing dependency will grow in the scientific community and that scientists will also find it beneficial to follow general software design principles.

Couldn't agree more!

Thank you @mknorps @Nour-Mws @jherland for understanding and thank you for the work you are doing on a very hard problem!