pyOpenSci / software-submission

Submit your package for review by pyOpenSci here! If you have questions please post them here: https://pyopensci.discourse.group/
94 stars 36 forks source link

Phenopype: a phenotyping pipeline for Python #24

Closed mluerig closed 3 years ago

mluerig commented 4 years ago

Submitting Author: Moritz Lürig (@mluerig)
All current maintainers: Moritz Lürig (@mluerig)
Package Name: phenopype One-Line Description of Package: a phenotyping pipeline for Python Repository Link: https://github.com/phenopype/phenopype Version submitted: 1.0.5 Editor: @jbencook
Reviewer 1: @agporto
Reviewer 2: @sdonoughe
Archive: DOI JOSS DOI: N/A Version accepted: 2.0.1
Date accepted (month/day/year): 05/13/2021

Edit: Bumped from 1.0.4 to 1.0.5 since submission.


Description

Phenopype is a high throughput phenotyping pipeline for Python to support biologists in extracting high dimensional phenotypic data from digital images. The program provides intuitive, high level computer vision functions for image preprocessing, segmentation, and feature extraction. Users can assemble their own function-stacks that can be stored in the human-readable yaml-format along with raw data and results, facilitating high throughput and full data reproducibility. Phenopype can be run from Python or from a Python Integrated Development Environment (IDE), like Spyder. Phenopype is designed to provide robust image analysis workflows that can be implemented with little or no Python experience.

Scope

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.

Phenopype is designed to extract phenotypic data (https://en.wikipedia.org/wiki/Phenotype) of plants, animals, and other organisms from images and videos. By processing images in Python, through reproducible code and human readable configuration files, phenotyping becomes higher throughput and more reproducible than established GUI programs like "ImageJ".

Phenopype is intended for ecologists and evolutionary biologists that work with phenotypic data. Phenotypic data are an essential component of ecological and evolutionary research (https://www.nature.com/articles/nrg2897).

Only low level computer vision packages like OpenCV or scikit-image are out there that require a lot of configuring and a relatively deep understanding of computer vision and Python in general. Phenopype offers high level functions so that users can focus on the relevant analytic parts of image analysis.

Publication options

JOSS Checks - [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements): "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Code of conduct

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

note: original repo url: https://github.com/mleurig/phenopype

mluerig commented 4 years ago

@lwasser could you let me know whether I should contact some potential reviewers, so whenever you are ready to initiate the review process they are ready as well? thanks

lwasser commented 4 years ago

@mluerig my apologies - in my mind i was waiting for work on this package. let me see if i can find a guest editor for this one so I don't hold things up. i'll get back to you this week so we can kick off the review process!!

UPDATE: i just tweeted to see if we can find a guest editor to move this along!! hoping to hear back!!

mluerig commented 4 years ago

Ok great! In any case, Seth Donoughe (@sdonoughe) and Arthur Porto (@agporto) could be potential reviewers without conflict of interest. I have not yet contacted them.

lwasser commented 4 years ago

awesome. thank you @mluerig !!! if i don't hear back on twitter i'll reach out to some folks directly.

lwasser commented 4 years ago

ok @mluerig i found another person who might be interested as well. @sdonoughe and @agporto are you still interested in reviewing this package? please let me know... i'm still looking for someone to serve as a guest editor. I also have a ping from Ben Cook who i don't have a github handle for yet who may be available for review or editor role!! All -- please let me know if you are available!! i will watch this thread for responses. And i'll keep looking for a guest editor in case everyone wants to review! thank you!!

jbencook commented 4 years ago

Hey All! I'm new here, but I'm available to review if it's helpful. I lead the Applied Machine Learning team at Hudl where we build computer vision services for sports video. I have some background in scientific computing and I'd love to help if I can.

jbencook commented 4 years ago

I'm also happy to serve as editor instead if that's where the need is

lwasser commented 4 years ago

@jbencook ok awesome!! Let's see if our other two colleagues respond!! I can absolutely walk you through serving as an editor. thank you again for being so responsive and willing to help! and welcome to pyopensci!!

agporto commented 4 years ago

Hi everyone. I am happy to review the package. Count me in

lwasser commented 4 years ago

perfect. thank you @agporto ! i'll followup with instructions once.i see if we have a third person or not!! :) so appreciative of you all being willing to work with pyopensci!!

sdonoughe commented 4 years ago

I don't have experience reviewing packages, but I fall squarely into the intended audience/user-set for the package, so I am happy to participate.

On Tue, May 19, 2020 at 2:16 PM Leah Wasser notifications@github.com wrote:

perfect. thank you @agporto https://github.com/agporto ! i'll followup with instructions once.i see if we have a third person or not!! :) so appreciative of you all being willing to work with pyopensci!!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pyOpenSci/software-review/issues/24#issuecomment-630993673, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADK6FA6DU3HWATVS6VTOGNTRSLEJLANCNFSM4MY5HIMQ .

lwasser commented 4 years ago

👋 there @sdonoughe sure... tell me a bit more. do you have enough programming background that you feel comfortable reviewing with another person as a team? You could also test out the functionality and documentation whereas @agporto could focus on the code / back end. please just let me know what works best for you and if you have questions you can also email me directly at leah.wasser@colorado.edu !! thank you for being willing to participate.

lwasser commented 4 years ago

ok everyone... i think we are in good shape here. @jbencook if you are still open to serving as a guest editor, that would be great. I will be out next week. While i am out, @jlpalomino has agreed to answer any questions that you have about the review process. We just updated our contributing guide so i'm hoping better instructions will be available to you ONCE we figure out why github actions is acting funny when trying to deploy the updates!

@agporto your review is much appreciated! if you can focus more on the technical end of things - code, tests etc that would be great! @sdonoughe if you can please focus on the usability and docs + applications of the package for your needs / and general use cases in the community that would be excellent. I am working on also finding another person to support this review so you can ask questions as you go!! that can be me once i'm back on june 1.

@mluerig i just also want to confirm that you are NOT interested in submitting this to JOSS. i noticed that was not checked. This process requires that you write a short paper. but JOSS accepts our review once your package is approved so it is not a challenging additional step.

@jbencook the next steps are for you to do the following following our guidelines

All please look at our documentation here: https://www.pyopensci.org/contributing-guide/ we are working on improving it so any feedback is much appreciated!!

Please all get in touch with any questions or concerns (knowing i'll just be offline for one week if you need feedback from me!). Thank you ALL for supporting pyopensci!!

jlpalomino commented 4 years ago

Thanks @lwasser for providing @jbencook with the next steps. Happy to answer any questions about the editor process, so feel free to reach out!

While we wait for the updates to the guideline docs to render, we can see the latest version here.

jbencook commented 4 years ago

Editor checks:


Editor comments


Reviewers: @agporto, @sdonoughe Due date: 2020-06-19

jbencook commented 4 years ago

Thanks @lwasser and good to meet you @jlpalomino!

I just added the editor checks. @agporto and @sdonoughe does 19 June work for the review deadline? I'll start watching the repo and work on my checks.

Also, @mluerig I noticed the repo has an LGPL license, but the README says MIT. Looks like both are approved OSI licenses, but they should probably match.

Looking forward to working with you all!

mluerig commented 4 years ago

@lwasser @jbencook @jlpalomino yupp correct I am NOT planning to submit to JOSS, but to Methods in Ecology and Evolution. The manuscript is already written and can be made available to anyone involved in the review process, if that is needed and helpful

mluerig commented 4 years ago

@jbencook I changed it to LGPL in the readme

lwasser commented 4 years ago

hi all!! @jbencook just a note that i have slowed the process down here as we are looking for someone to support the second review. With pyopensci we want to mentor folks through reviews if it is their first. i however was out last week and i hadn't found someone prior to being out. So i'm on the hunt again and we will get back to you with a timeline for the second review! if you know of any people who have experience with code reviews that could support a second review please do say the word.

lwasser commented 4 years ago

Update: I have found a person to support the second review and it looks like the second review can be done next week! we are in good shape still - cheers all! :)

jbencook commented 4 years ago

I've completed the editor checks - I think now we're just waiting on the reviews. How are things coming @agporto and @sdonoughe? Any questions so far?

agporto commented 4 years ago

@jbencook I have started working on it but still have a lot of ground to cover.

agporto commented 4 years ago

Hi everyone. I am going to need a few more days to finish the review (I expect Tuesday next week). I apologize for the delay. The source code of the package is more extensive than I initially thought.

sdonoughe commented 4 years ago

Hi folks, I've also been delayed, but aiming to conclude by the end of next week.

On Fri, Jun 19, 2020 at 4:24 PM Arthur Porto notifications@github.com wrote:

Hi everyone. I am going to need a few more days to finish the review (I expect Tuesday next week). I apologize for the delay. The source code of the package is more extensive than I initially thought.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pyOpenSci/software-review/issues/24#issuecomment-646865599, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADK6FA2TCP5RJPWZ2ZMI6PLRXPJQJANCNFSM4MY5HIMQ .

agporto commented 4 years ago

Package Review

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

Estimated hours spent reviewing: 12 hours


Review Comments

I have now reviewed the package entitled “Phenopype: a phenotyping pipeline for Python”. This package represents a tremendous amount of work, especially considering that it has a single author/contributor.
I strongly support its publication at pyOpenSci. The python code is clearly written and easy to follow. I believe it will be a critically important package for biologists interested in computer vision, because it is not only comprehensive but also offers a lot of flexibility in terms of usage. I have only a few comments with regards to the usage and code.

General suggestions

The documentation is quite extensive and well organized, but I think one thing that would help new users is moving the Tutorial on GUI interactions (currently, Tutorial 5) to the beginning. This tutorial explains in more detail how to interact with OpenCV's High GUI module. In my experience with the current package, this module was by far the hardest one to interact with. It is somewhat unstable, crashing the jupyter kernel with some frequency, and its keystrokes are not intuitive. Given that this module is important in mediating user interactions with Phenopype, I think it would be useful to have a tutorial on it front and center. It might prevent new users from getting overly frusfrated and ending up abandoning the package. In the long run (beyond this review), it might be worth considering other options for GUI interaction.

Another aspect that might need some attention is the standardization of the root directory. When going through the tutorials, perhaps it would be useful to explicitely set the root directory, since the root directory jumps around a little across the different tutorials (sometimes it starts on one place, sometimes on another)

Tutorial and Examples - Code Errors

When running Tutorial 2 and going through the 'Low throughput workflow' I get the following error:

~/phenopype/utils_lowlevel.py in _auto_line_width(image, **kwargs)
    642 def _auto_line_width(image, **kwargs):
    643     factor = kwargs.get("factor", 0.001)
--> 644     image_height, image_width = image.shape[0:2]
    645     image_diagonal = (image_height + image_width) / 2
    646     line_width = max(int(factor * image_diagonal), 1)

AttributeError: 'NoneType' object has no attribute 'shape'

Based on an inspection of the source code, it seems that, when first initializing a container and trying to draw a contour, no image is getting assigned to container.canvas. As a consequence, any attempt to extract the image parameters (height, width) from an empty object leads to the error above. It could be an easy fix, but my attempts at tracing it back have not yielded anything other than what is described above.

Similarly, when running Tutorial 5 and going through the polylines example, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-a3de9e316ea9> in <module>
      2                                  line_width=2,
      3                                  line_colour="green") 
----> 4 canvas = pp.visualization.draw_polylines(canvas, df_polylines=df_lines,
      5                                          line_width=2,
      6                                          line_colour="green")    

TypeError: draw_polylines() got an unexpected keyword argument 'df_polylines'

This one seems more straightforward, since the draw_polylines function definition does not contain the keyword in question. So, my assumption is that some recent changes to the code have removed the need to explicitly define df_polylines and the tutorial has not been updated to reflect those changes.

Code architecture - Suggestions

I have a suggestion in terms of code architecture that I do not expect the author to address in this review (but the author might find it useful in the long run). As of now, phenopype makes an explicit choice with regards to how to store data (both raw and processed data). Whether working with masks, polylines, landmarks, etc, phenopype will store that information (most of the time) in human-readable csv files or similar. It will also store different types of information (e.g., landmarks or masks) in different files using different functions. While this makes accessing the data for each specimen quite easy and in human-readable format, it also creates a lot of code duplication due to the need of having one function for each type of annotation. Given that this package will likely become a reference for computer vision in ecology and evolution, it will also set data standards for future work. Having followed the success of the COCO (https://cocodataset.org/#home) data format, I wonder if phenopype would benefit from having a similar architecture for data storage. By storing different types of annotations in a single JSON file, COCO has created an almost universal file format that makes it remarkably easy to train different machine learning algorithms using the same dataset (whether trying to predict landmarks or masks, for example). Again, I do not expect this change to occur, but I think it might be worth considering in the future.

Final remarks

In general, this is a well written and user friendly python package. My comments here are purely to provide the author with some (hopefully) useful feedback.

Other minor comments
sdonoughe commented 4 years ago

Hi again, the document and tutorial materials are extensive and impressive! As such it's taken me longer than anticipated to review all of the components. Will conclude soon.

mluerig commented 4 years ago

Hi all. I just realized that I probably should have provided this information at the beginning to make your lives a little easier: I wont be working for the next couple of weeks because of a long planned vacation. it's not yet clear for how long, but probably until early August. Sorry about not communicating my timeline earlier, I hope this is ok for everyone.

mluerig commented 4 years ago

ok I'm back and ready to go - hope you all are having a good summer :-)

lwasser commented 4 years ago

hey @mluerig i'm just checking in here. what is the status of this review. are you still working on edits or where are things here?

mluerig commented 4 years ago

hey @lwasser - I have been in touch with @sdonoughe because he had some problems getting the program to work on a mac (I programmed and tested it only on windows and ubuntu). I think it's working now, and he is back to the actual review of documentation and tutorials. I am not sure how far along he is though.

In general, how are code-reviews typically addressed? I already fixed two bugs that @agporto spotted, but I will also streamline the tutorials according to his suggestions. Should I let everyone know when a new version is ready to be reviewed?

In that context a potentially complicating issue is that since my submission to pyOpenSci, I continued development and released new versions that have some new features. This was for ongoing projects that needed these features. It's nothing fundamentally new or complex, I just extended the palette of functions that phenopype has, the basic principle and functionality is still the same. Should a second round of review just take a quick pass at the new features?

lwasser commented 4 years ago

hey there @mluerig this is a good question. it sounds like it might make sense for @sdonoughe to do another pass at using the package (if there is time - please let us know). if you have released new versions that have both fixes and enhancements i think for us to approve the package we will need to ensure that version has been reviewed. so let's see if @agporto and @sdonoughe have time to do another review on the current version. i think a quick pass to ensure things work as intended is sufficient. thank you for the speedy reply!

mluerig commented 4 years ago

ok great, but I would suggest @sdonoughe finishes the review on the current version. then I can address both his and @agporto's remaining comments together, and release another, post-review version

a second round of reviews on this version could then focus on the parts that received comments in the first round, and that are new as per the continued development. together with completed checks and CI on all new features, would you then be able to approve the package, given that there are no more comments?

hope that makes sense...

lwasser commented 4 years ago

@mluerig absolutely. whatever process that is the most efficient is the one we should proceed with! Our main concern at pyopensci is that whatever package version we stamp as approved does need to be reviewed in its entirety. The workflow that you propose above sounds reasonable if both reviewers have bandwidth!

agporto commented 4 years ago

@mluerig @lwasser that plan works for me!

lwasser commented 4 years ago

perfect!

sdonoughe commented 4 years ago

This sounds great to me. A quick update on my end. As @mluerig mentioned, I've been trying to get the package working on MacOS. As best I can tell it is an issue with system-specific differences with how OpenCV takes user input, which breaks parts of phenopype's workflow. @mluerig has submitted a bug report about this, I believe, and so far there hasn't been a fix. I am now seeing if there is a trivial fix available by tweaking the user input steps in the phenopype code itself. I will work on this for a little longer and if I can't get it working, I will just finish up my review on the rest of the package and on the tutorials.

sdonoughe commented 3 years ago

Package Review

Relevant disclosure: My written review of this package has taken a very long time for me to submit (~8 months from the time I was invited). Towards the end of that period, I contributed as a minor author to an as-yet-unpublished literature review manuscript on the topic of computer vision for biology research. This author of phenopype is one of several other co-authors on that manuscript. Since I began this review without being acquainted the author, and since I did the bulk of the code review before we began to interact as co-authors, I do not consider this to be a conflict of interest. But in the interest of full disclosure, I mention it here.

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Reviewer notes on README:

Brief demonstration usage: There currently isn't a demonstration usage directly in the README, but there are, of course, lots of links to demonstrations. If a demo is needed here, perhaps the best would be have code that opens an image, takes user mouse clicks to select a region, and then does, say, a threshold operation on the contents of that region, then displays and saves the results.

How the package compares to similar packagese: I think it would be useful to have an explanation for how this package relates to the most widely used Python image manipulation packages, and why a user might want to use phenopype in combination with one or more of those packages. As for the graphical interface for taking user input interactively, the closest Python tool that I'm aware of is napari, even though it is still in alpha. For users that do not have much coding experience, another option for them is to use Fiji/ImageJ, and then record a macro of their image processing steps. Here, phenopype offers functionality that is not present in any of these alternatives, so it would be useful to point that out too.

Citation information: If/when this package is described in a journal article, then that citation can be added, but even in the meantime I can see biologists wanting to try phenopype out and potentially even using it in publications. So I'd definitely recommend having a "how to cite phenopype" section.

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Reviewer notes on Usability: All the functions are well-documented in the code base itself, which is great. The API reference page is also quite helpful.

Functionality

Reviewer notes on Functionality: I was able to assess most of the functionality of package, but there were a few minor things that I was unable to check. The provided examples and the other most common use-cases I can imagine are not particularly computationally intensive; so it makes sense that there aren't any specific performance claims. I did not assess any automated tests, nor continuous integration, nor compliance with packaging guidelines (also, a sidenote to the Editor: that link in the Review template to packaging guidelines is broken)

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 22


Review Comments

Note: My aim in this review was to focus on the scope, usability, documentation, tutorials, and user-facing functionality. I have not attempted to assess the clarity, organization, and performance of the code itself.

Aim/scope of the package

In many areas of biology search, a common challange is to take an image of one or more organisms, then measure some traits of those organisms. Such traits could include: the number of individual organisms; or a given organism's length, color spectrum, orientation, area of its spots, etc. After aquiring the raw images, this process entails many substeps:

phenopype is intended to make all of these steps easier.

How does this package fit into an image analysis workflow?

One might wonder: why wouldn't a biologist just use opencv by itself? Or scikit-image, or any other Python library of image processing tools, for that matter? Answer: Because phenopype is aiming to augment, rather than replace, the utility of existing CV libraries for scientists measuring phenotypes. Put differently, phenopype is not a library of granular image processing functions. Instead, it is a set of wrappers around some of opencv's most broadly used functions, combined with a simplified GUI for taking user input; these collectively act as a scientist's bookkeeper during an image processing project. In doing so, it fills a major void in the scientific Python toolkit. As far as I know there is no other Python tool that enables the user to create a pipeline of image-processing steps that incorporates manual input along the way. phenopype is also (to my knowledge) unique in defining a terse, yet legible format for recording and replicating the parameters of such a pipeline.

One side effect of the decision to rely on opencv, unfortunately, is that its image viewer functionality is currently buggy for me on MacOS. I include a truncated account of my efforts to get that part of the package to function correctly below under the Installation section.

There are some tradeoffs involved in structuring the package this way. For one, the set of image-manipulation functions is impressive, but it understandably covers only a small subset of the image-parsing tasks that are regularly encountered in image analysis work for biological data. This is not a knock against phenotype in the slightest. But a user that wanted to integrate additional image-processing functions into the middle of a workflow would need to invest some time to work out how phenopype works under the hood if they wanted to take advantage of the replicability tools that are the core of the package.

The upshot of this is that a user without much coding experience would probably need to restrict themselves to analyses that can be composed entirely of the functions that are folded into phenopype itself already, but as the tutorials and examples show, this includes a fairly wide range of potential analyses.

Installation

Notes on the Installation page itself:

On MacOS, I came across several issues with the opencv image viewer windows that are used as a part of the phenopype workflow:

  1. When opening an opencv image viewer window in general, there is an error and warning:

    Class RunLoopModeTracker is implemented in both /Users/Seth/anaconda3/envs/pp/lib/python3.7/site-packages/cv2/.dylibs/QtCore (0x11500e7f0) and /Users/Seth/anaconda3/envs/pp/lib/libQt5Core.5.9.7.dylib (0x1282d1a80). One of the two will be used. Which one is undefined. QObject::moveToThread: Current thread (0x7fdc9d636fd0) is not the object's thread (0x7fdca22351f0). Cannot move to target thread (0x7fdc9d636fd0)

    It continues:

    You might be loading two sets of Qt binaries into the same process. Check that all plugins are compiled against the right Qt binaries. Export DYLD_PRINT_LIBRARIES=1 and check that only one set of binaries are being loaded. QObject::moveToThread: Current thread (0x7fdc9d636fd0) is not the object's thread (0x7fdca22351f0). Cannot move to target thread (0x7fdc9d636fd0)

  2. Then, once Issue (1) was solved: When using a jupyter notebook, when I attempt to view an image in an opencv window, the window opens, but it accepts no mouse input to zoom or scroll. As soon as I press any key, I get a spinning ball and a frozen window, and then I have to force quit and the kernel crashes.

  3. Also, once Issue (1) was solved: In typical usage, phenopype image viewer windows are supposed to allow the user to zoom and scroll. The zooming and scrolling functionality is not present for me, regardless of how I run the code. (Actually, at the very beginning of my code review I was able to briefly open zoomable/scrollable windows, but now I cannot replicate that condition, despite trying a fairly wide range of approaches)

In the end, I was able to solve issue (1), more or less, but the other issues remained. It was possible for me to run the tutorial and example code from the command line or from standalone scripts. This was fine for the purposes of doing the review. The rest of my review continues below after this section. In case it would be of use for someone better-equipped than myself to solve this issue, here some things I learned about these issues:

Overall documentation and user experience

In general, I found phenopype to be very well-documented, and I was impressed to see author has put together such a detailed set of tutorials and examples to illustrate how the package can be used. The package could stand by itself without so many detailed walkthroughs, however it is a big service to the community that there are walkthroughs pitched at novices. I recommend "publication" of the package regardless of how the tutorials are updated, but if the author wants to polish the tutorials further, I have suggestions about their content and clarity. In the interest of keeping this post from becoming even more unwieldy in length, I will post those suggestions separately later today.

sdonoughe commented 3 years ago

Notes and suggestions for phenopype tutorials

For the rest of my review, I've worked through the tutorials. Here are a set of suggestions, from minor typos to ideas about how to update the organization.

Tutorial 1

Tutorial 2

Tutorial 3

Tutorial 4

Tutorial 5

Tutorial 6

Final thoughts

On the whole, these tutorials are comprehensive and careful, with a clear focus on making the tool accessible to people who are relatively new to coding in Python. The images and GIFs to illustrate the GUI are especially helpful.

mluerig commented 3 years ago

@sdonoughe and @agporto, thank you so much for those extensive and careful reviews - this is extremely helpful and I am eager to implement your suggestions! Here are some general responses to the issues and a roadmap for how I am planning to address them - @jbencook, @lwasser and @jlpalomino would be great to get your opinion on those, too:

GUI

As I feared, the opencv GUI seems to be an issue for mac users, but can also potentially be unstable for windows/unix users. Since phenopype is built around opencv and its GUI, I probably can't do much more at this point other than trying to reduce internal sources of instability, better documentation of some of the potentially surprising behavior, and make it clear that phenopype currently doesn't stably run on macOS. Especially the latter is unfortunate, but currently the only solution I can think of, given that phenopype runs mostly stable on windows/unix.

Tutorials

I looks like the tutorials need a bit of reorganizing, and both of you have given a lot of specific suggestions on how to do that. In addition to those specific changes, do you think it would make sense to move the tutorials (not the documentation) do a separate repo (e.g. "phenopype_tutorials") that is more self-contained and that people can just fork by itself? I am not sure what the best practices are, but as I was writing the tutorials and examples I felt that I am cluttering up the main program repo, so maybe a separation is useful

Readme

I will add some citation information, and a comparison to other packages that @sdonoughe pointed at in his review. A statement that, at its core, phenopype is mostly a wrapper for other, more low-level CV libraries might also be helpful to users. Additionally, if people feel that this is needed, it I could add a little demonstration in the readme before pointing to external documentation and the tutorials.

So, I will work on this over the next few weeks and try to have a new version ready by the end of march or sometime in April.

@agporto and @sdonoughe, would you be available for a second round of reviews?

@jbencook, @lwasser and @jlpalomino, does all of this make sense or have I forgotten something?

sdonoughe commented 3 years ago

Yes, I am available for a second round of review, and without the big time delay.

agporto commented 3 years ago

I am available as well

jbencook commented 3 years ago

I think that plan makes sense @mluerig. Re: MacOS, I think it's very reasonable to not support GUI functionality as long as it's clear in the documentation.

mluerig commented 3 years ago

Hi again!

So, below are the responses to your comments, I have updated phenopype accordingly, it is now version 2.0.0. Since v1.0.5, which was the one I originally submitted, I have changed a few things that how users the yaml configuration files (I moved to list structures for greater flexibility when configuring the high throughput workflow). Therefore, it is not backwards compatible and thus got a major version update. I also added some minor code changes in the background that improve code readability, but should not affect user experience. All changes are documented in the changelog.

I modified what I could, but some things (failing on macOS, HighGUI instability, changing how annotations are stored) were beyond the scope of these revisions (as both of you already acknowledged in your reviews). However I'd be very happy to have a chat with you about what can be done in the future.

Below I reply to your comments - hope they make sense, otherwise let me know.

Arthur

The documentation is quite extensive and well organized, but I think one thing that would help new users is moving the Tutorial on GUI interactions (currently, Tutorial 5) to the beginning. This tutorial explains in more detail how to interact with OpenCV's High GUI module. In my experience with the current package, this module was by far the hardest one to interact with. It is somewhat unstable, crashing the jupyter kernel with some frequency, and its keystrokes are not intuitive. Given that this module is important in mediating user interactions with Phenopype, I think it would be useful to have a tutorial on it front and center. It might prevent new users from getting overly frusfrated and ending up abandoning the package.

I followed your (and Seth's suggestion) and rearranged the tutorials: tutorial 2 now covers GUI interactions. I completely overhauled this tutorial and I now also discuss at the most common HighGUI issues in OpenCV, so that users are prepared and can try the workarounds I suggested. Indeed, in the long term I will have to consider alternatives to the OpenCV GUIs.

Another aspect that might need some attention is the standardization of the root directory. When going through the tutorials, perhaps it would be useful to explicitely set the root directory, since the root directory jumps around a little across the different tutorials (sometimes it starts on one place, sometimes on another)

This should be fixed: every tutorial now assumes the folder tutorials in the downloaded phenopype-master folder as root.

When running Tutorial 2 and going through the 'Low throughput workflow' I get the following error:

fixed: https://github.com/mluerig/phenopype/issues/7

Similarly, when running Tutorial 5 and going through the polylines example, I get the following error:

fixed: https://github.com/mluerig/phenopype/issues/8

Having followed the success of the COCO (https://cocodataset.org/#home) data format, I wonder if phenopype would benefit from having a similar architecture for data storage. By storing different types of annotations in a single JSON file, COCO has created an almost universal file format that makes it remarkably easy to train different machine learning algorithms using the same dataset (whether trying to predict landmarks or masks, for example). Again, I do not expect this change to occur, but I think it might be worth considering in the future.

This is something I would love to chat about. I had a look at the COCO json files, and I think I would like to follow a similar, albeit not identical structure (I think phenotypic data would need to have somewhat different structure than pure segmentation labels).

Seth

Brief demonstration usage: There currently isn't a demonstration usage directly in the README, but there are, of course, lots of links to demonstrations. If a demo is needed here, perhaps the best would be have code that opens an image, takes user mouse clicks to select a region, and then does, say, a threshold operation on the contents of that region, then displays and saves the results.

I followed your suggestion and added a gif to the readme that shows some basic image processing operations and the results in real time.

How the package compares to similar packagese: I think it would be useful to have an explanation for how this package relates to the most widely used Python image manipulation packages, and why a user might want to use phenopype in combination with one or more of those packages. As for the graphical interface for taking user input interactively, the closest Python tool that I'm aware of is napari, even though it is still in alpha. For users that do not have much coding experience, another option for them is to use Fiji/ImageJ, and then record a macro of their image processing steps. Here, phenopype offers functionality that is not present in any of these alternatives, so it would be useful to point that out too.

I used some of what you wrote below (actually, almost one-to-one, so you should get credit) in the review to point out that phenopype is a wrapper for OpenCV in combination with a project management ecosystem.

Citation information: If/when this package is described in a journal article, then that citation can be added, but even in the meantime I can see biologists wanting to try phenopype out and potentially even using it in publications. So I'd definitely recommend having a "how to cite phenopype" section.

I added citation instructions

One might wonder: why wouldn't a biologist just use opencv by itself? Or scikit-image, or any other Python library of image processing tools, for that matter? Answer: Because phenopype is aiming to augment, rather than replace, the utility of existing CV libraries for scientists measuring phenotypes. Put differently, phenopype is not a library of granular image processing functions. Instead, it is a set of wrappers around some of opencv's most broadly used functions, combined with a simplified GUI for taking user input; these collectively act as a scientist's bookkeeper during an image processing project. In doing so, it fills a major void in the scientific Python toolkit. As far as I know there is no other Python tool that enables the user to create a pipeline of image-processing steps that incorporates manual input along the way. phenopype is also (to my knowledge) unique in defining a terse, yet legible format for recording and replicating the parameters of such a pipeline.

I actually think that this text gives a nice summary of what phenopype is providing, and how it is distinct from other packages. I took the liberty of using some of your formulations, and incorporating them into the readme.

There are some tradeoffs involved in structuring the package this way. For one, the set of image-manipulation functions is impressive, but it understandably covers only a small subset of the image-parsing tasks that are regularly encountered in image analysis work for biological data. This is not a knock against phenotype in the slightest. But a user that wanted to integrate additional image-processing functions into the middle of a workflow would need to invest some time to work out how phenopype works under the hood if they wanted to take advantage of the replicability tools that are the core of the package.

So, I wonder if, in the long term, there might be a better way of wrapping OpenCV (and potentially scikit-image) functions, maybe even through means of a toolchain that does this in an automatic fashion. Would be great to hear your and @agporto's thoughts on this.

Notes on the Installation page itself:

I decluttered and streamlined the installation instructions:

On MacOS, I came across several issues with the opencv image viewer windows that are used as a part of the phenopype workflow:

I am currently not able to solve this, but in the long term I am planning to make phenopype usable on all platforms (including macOS) by either fixing the OpenCV GUI issues OR by finding other means of visualization and user interactions (napari for instance looks very promising).

Until then, to make it clear that phenopype doesn't run on macOS, I added a little section to the readme and documented your efforts by transferring them to a phenopype issue


Tutorials

Note that tutorial 5 got moved to position 2 - I commenting here in the new order

Tutorial 1

It is very thoughtful that this tutorial is here to introduce Python, and that there are also pointers to other introductions. My sense is that this intro might feel somewhat bewildering to a beginner. That is potentially okay, of course, since new things are almost always confusing. But I suspect the best way to make this tutorial maximally accessible would be to recruit an actual beginner and have them spend an hour or so trying to work through this section and then have them tell you how comfortable they feel about moving on to the next one. Since I'm not a beginner, I can't precisely point out where the hurdles are.

I added a little balloon to ask for feedback.

There are several references to the "Resources" page. This is great! But there are a so many links there and it is not clear where someone should start. I suggest that, in general, the author take a slightly more authoritative tone in introducing material. "If you are new to Python, do [THIS] beginner overview first, then come back." Then next: "If have not done much image processing with code, read through [THIS] walkthrough second." And finally: "Okay now you are ready to begin with the phenopype tutorials."

Fully agree - I added a Further Reading balloon to this and all other tutorials (and will do so for the examples).

I was somewhat puzzled as to the usefulness of opening lots of images at once. That is, since I couldn't think of why a user would want to do that, it might be better to demo a different functionlity as a way to illustrate the principles that are being taught for that section.

I sometimes like to look at several images to look for differences in traits, or to compare different thresholding outcomes - I thought it was useful, but if you really think it's confusing I can remove it.

I expected a link at the bottom of Tutorial 1 that takes me to Tutorial 2, but ultimately found one in the top nav bar.

fixed - now all tutorials have links to the next one

Tutorial 2

Consider putting this tutorial earlier in the series. Say, right after tutorial 1? I would have found it helpful to get these GUI basics before coming across image processing steps that involve the GUI in the other tutorials.

done - Tutorial 5 is now the second one

Also, my personal opinion is that the GUI---and the capability to do semi-automated analysis that includes human interaction at more or more steps---is probably going to be the part of phenopype that will be most exciting and potentially useful to the research community, since it is a lot harder to build one's own GUI than it is to record a sequence of image transformations. I could be wrong about this, but if I'm right, maybe consider that a bit when describing the package on its landing page.

good point - I mentioned it now in the reworked readme

Tutorial 3

I cleaned this tutorial up a bit, i.e. I cut back on intro-text, removed the "morphology" operations, I added instructions for what the expected interactions are, and streamlined the end about yaml and pype (pype-behavor section got moved to API)

"Fig. 2: Schematic of Phenopype’s prototyping workflow.": This figure only includes some of the functions that are in the code block immediately beneath it. I wondered: "what happened to the morphology operation?"

I removed morphology operation to keep it simple

Under Figure 2, in cell 10 there is an example sequence of image manipulation operations. As the user, I naturally ran the code, but there weren't any prompts in the tutorial or in the terminal telling me what to do. I think the reader needs something like:

I added some instructions at the very beginning to show users what they have to do

Note to the author: I had to press CTRL twice to end a selection, but this does not appear to be the intended behavior from the docstring of the create_mask function. In general, in my hands, the opencv keystroke codes occasionally vary across operating systems.

Since for this example only one polygon-mask is needed, I changed the text to "finish and close with Enter" to keep complexity low. Multi-polygon masks are also covered in Example 5

For the Low-Throughput section, the output from one code block for me didn't match what was shown in the tutorial. This was the one:

should be fixed

The contours.csv file was present in the _temp directory, but it looks like it isn't being found?

should be fixed

In the High-Throughput section, tell the user explicitly to press 'y' to make a new YAML file (I can see it in the terminal input/output lines, but I'm just trying to make it more approachable to a newcomer).

this has changed

The section on the pype is nice and detailed, but it feels like a lot to append to the end of this tutorial. I think the reader would be better served by splitting this tutorial into two, perhaps? It looks like this pype section sort of serves as an overview/preview of some of the material that also gets covered in the subsequent tutorial. That is fine! It just seemed like sort of a lot to digest in one go.

agreed - I sourced it out to the API. I first thought to make another tutorial or section only for the pype, but I think it's better to keep all relevant information in one place, so the API made more sense to me. let me know if this helps or whether it complicates things.

This behavior did not occur for me: "Editing and saving the opened configuration file will close the image window, run the functions in the control file, and show the updated results." I think this is possibly due to the ongoing misbehavior of the opencv image viewer window.

I think this is another macOS problem - sorry :-)

Tutorial 4

I imagine most readers would run the notebook from within the tutorials folder, which sets that as the working directory. It might confuse the reader that the working directory gets changed, which then breaks the relative path to the images dir in the tutorials directory. Is it necessary for the _temp folder to be where it is? Could it be in the tutorials folder itself, so that the reader doesn't have to update the path, and you could just go with the same relative paths for everyone?

fixed - now all tutorials and examples start from phenopype/tutorials, and save output under phenopype/tutorials/_temp

The block of text that begins "The remaining settings..." is pretty dense. I recommend using bullets or a table to organize it.

fixed

All the embedded images that illustrate how phenopype works are much appreciated. It is nice touch. The black windows in "Step 2", however, are hard to read. That is, it's not clear what those represent.

I added some text to show that they are the YAML config files

This cell runs indefinitely, even after I close the text file that gets opened automatically:

Those functions got reworked ... hope it works now

In the section on loading an existing project, again I had to update the path given in the tutorial, because it seems like the tutorial expects me to have the tutorial notebook to be running in a different directory? Or at least expects the working directory to be a different one. Here I had to update to:

should be fixed

Tutorial 5

I recommend omitting the option for users to load the previous project or make a new one, since it is a bit confusing with the commented out new directory line. Better, I think, to just have a minimal code block that generates an example project for this notebook

agreed, and done!

Again, the tutorial expects that we are working in a different directory than the one that actually has the ipynb files, I think? In the cell that follows this line: "We will use the image stickleback_side.jpg from the image folder in tutorials"

no, the images are just a subfolder - the working directory shouldn't move anymore

Specifically: when I run the function, I can click the two points for setting a reference length. Then I can enter a number (by the way, I would update the prompt to include the requested measurement units). But then it is not clear what I am I supposed to do next. The tutorial says to draw a rectangle around the scale image, but when I try to click on the image again the terminal prints: "already two points selected", and I don't know what to do next. Then pressing a key brings up red text on the image, and the terminal prints "- add column length". It is not clear what this means. I couldn't tell, in any case from the tutorial or the add_scale docstring. Bottom line: I think it would help to hit this function with a coat of polish, so to speak, and put a little more explanation in the tutorial and in its docstring.

I think this may have been a bug I fixed in the meantime - should be working now

Tutorial 6

This is super cool. My impression is that there are several other published tools out there for specifically tracking animals in arenas (used for fish, flies, ants, birds, migrating animals, etc). This review is seven years old, but it suggests that a lot options were available. Maybe all the alternatives are too hard to use, or not open, or whatever, but still I think it would be helpful to explain how this tracking tool fits into the existing ecosystem.

I agree that the tracking module at this point seems more of an appendix. My goal is that, at some point, all the core processing functions can be used on objects found in every single frame.

lwasser commented 3 years ago

@mluerig thank you for this thoughtful response. I will come back to it shortly so i can really read it in more detail.!! I just wanted you to know that it's on my radar. More later.

lwasser commented 3 years ago

@agporto @sdonoughe can you please have a look at @mluerig responses to your review above? are you satisfied with the responses? the one issue i see is the MAC compatibility issue. i am going to see how ropensci handles these types of issues in a review and will respond accordingly. That aside are you both content with t he review r esponse above OR are there additional items that you'd like to see addressed? my apologies for the huge delay - a lot has been going on here locally (not work related) and it's really thrown me off.

sdonoughe commented 3 years ago

Three main notes

  1. The reorganization and revision of the tutorials is very impressive. Even before my review they were already quite clear, and now they're even better.
  2. As before, I couldn't get the image windows to behave as intended on MacOS, so I wasn't able to actually test all the functionality.
  3. That said, I read through the tutorials, and I found them very thoughtfully structured and explained. All I found were a few miscellaneous typos:

Tutorial 1

I think this is a perfectly pitched intro.

Tutorial 2

Overall, I find the updated tutorials to be extremely thorough and helpful, and this one in particular.

A small typo: "Currently Phenopype uses the standard HighGUI libraries that ship with the most recent precompiled opencv-contrib-python package that is listed on the, which is Qt for Linux and macOS, and Win32 UI on Windows."

Something is missing after "listed on the"

Tutorial 3

Small typo: "To learn how to analyze entir data sets with the high-throughput method, move on to the next Tutorial 4."

Tutorial 4

Small typo: "initiatlized"

Tutorial 5

I found this very useful! I think it would be helpful to add a small summary bit at the end, like you've done for the previous tutorials.

Tutorial 6

Small typo: "exlcude" Small typo: "applied matters The following"

Bottom line

@lwasser I'm totally content with the review response.

Congratulations on a comprehensive and useful tool, @mluerig!

agporto commented 3 years ago

@lwasser I found a few issues with the examples, but @mluerig has already fixed them. I am completely satisfied with the responses provided by the author. This is an impressive tool and I hope biologists will rejoice.

mluerig commented 3 years ago

thanks @sdonoughe and @agporto - I have released a new version (2.0.1) that addresses the typos and issues you mentioned.

mluerig commented 3 years ago

pinging @jbencook and @lwasser. both reviewers are happy - so, what comes next? I would be very grateful if we could conclude the review process soonish, given that it started almost a year ago!

lwasser commented 3 years ago

hi @mluerig To begin - congratulations - 🎉 phenopype is now a part of the pyopensci ecosystem!

Next steps:


🎉 Thank you @jbencook @sdonoughe @agporto for supporting this review process! 😸

There are a few things left to do to wrap up this submission:

All -- if you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and our documentation in the contributing-guide. We have also been updating our documentation to improve the process so all feedback is appreciated!


@mluerig i'm so sorry that this has been a painfully slow process. I took some time to review the package, feedback and documentation given the tool doesn't work on MAC Os currently (for understandable reasons!). I wanted to be careful about setting precedent in the future. However, you have that clearly stated in your README file and also requested help / are interested in building out that functionality so we can consider that if a tool comes through with similar functionality for MAC.

The past year has been extremely challenging but starting in August / possibly July this project will have my full attention and future reviews will NOT be so slow. Thank you again for your patience. Please let me know if you have any questions / concerns! A review should not take a year so I have that commented noted already :)