Closed whedon closed 2 years ago
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @dbuscombe-usgs, @chrisleaman it looks like you're currently assigned to review this paper :tada:.
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1111/0031-868x.00152 is OK
- 10.1371/journal.pone.0118571 is OK
- 10.1890/10-1510.1 is OK
- 10.1038/s41598-021-83477-6 is OK
- 10.1016/j.earscirev.2017.04.011 is OK
- 10.1016/j.ecolecon.2006.10.022 is OK
- 10.1016/j.ocecoaman.2018.01.001 is OK
- 10.1016/j.isprsjprs.2015.02.009 is OK
- 10.21105/joss.01890 is OK
MISSING DOIs
- None
INVALID DOIs
- None
Wordcount for paper.md
is 1349
Software report (experimental):
github.com/AlDanial/cloc v 1.88 T=0.95 s (41.0 files/s, 54866.5 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
Python 6 1709 1175 4856
Markdown 17 285 0 655
Jupyter Notebook 7 0 42186 625
TeX 1 9 0 404
YAML 6 25 10 191
JavaScript 1 1 0 19
HTML 1 2 0 9
-------------------------------------------------------------------------------
SUM: 39 2031 43371 6759
-------------------------------------------------------------------------------
Statistical information for the repository 'a855e74a3b68ce3aa0fa37a1' was
gathered on 2021/08/28.
The following historical commit information, by author, was found:
Author Commits Insertions Deletions % of changes
Nicolas Pucino 187 18307 11042 94.22
npucino 17 1078 724 5.78
Below are the number of rows from each author that have survived and are still
intact in the current revision:
Author Rows Stability Age % in comments
Nicolas Pucino 7760 42.4 0.4 5.66
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@dbuscombe-usgs @chrisleaman @npucino :wave: the review takes place in this issue. Thanks!
Also, please don't forget to add a link to this review issue in any issues or pull requests you may generate in the https://github.com/npucino/sandpyper repository. This will help everyone have a single point of reference.
Thank you guys for your time! Cheers
:mega: Mid-week rally! Just checking in to see how things are going @dbuscombe-usgs and @chrisleaman ? Don't hesitate to reach out if you have any questions.
👏 Keep up the good work!
Thanks @crvernon, I've had a read of the paper and just summarizing my thoughts. Also planning to run the package on some of my own data to test it out 😄
@crvernon @npucino I am having a lot of difficulty installing this package using the provided instructions. I have a detailed issue here
I have a lot of experience with conda and my perspective is that is this a suboptimal way to install a package with all of these very complicated dependencies.
I made my own yml file installer to deal with the issues of the mixed conda/pip installation, that works a lot better and in fewer lines of code. and far fewer instructions, see here However, the __init__
script for the sandpyper pip package has broken syntax that means I get errors . So, I should now be able to run this software when @npucino gets a chance to revisit that.
Installation is typically the most difficult thing to get right with these types of python packages. I expect it will be plain sailing after that!
@dbuscombe-usgs thanks for the detailed issue report, I will fix this Monday. I tested the package installation and unit tested the code in github action and assumed it worked out in the real world. Also I noticed that for some reason your pip install sandpyper downloaded a very old version 0.0.2 (with that typo in the init file), while the latest one in pip is 1.0.0 which is drastically different. I think it is becasue the pip install GDAL confilcts made pip to roll back to a previous version where the GDAL doesn't break the imports. I ll look into that!
Great, thanks for the quick response @npucino - that sounds like a good plan. I'll check back in Monday and try to help troubleshoot. I'm pretty good at finding conda and pip issues in other people's code, so don't take it personally (I also really hate the python ecosystem for its crummy package managers!)
in the yml file I made, it does successfully create all the dependencies in conda using conda-forge. then it is looking for a version of sandpyper on pip that is compatible, That is version 0.0.2. That would explain why, I think
but it does prove that a) you can use conda entirely to set up the dependencies in a conda environment without being overly prescriptive about specific versions of packages, and that b) its all the pip-only dependencies that are breaking the install. Its not clear why, for example, why the requirements.txt in the pypi package lists the same libraries as you suggest installing into the conda environment ....
my general recommendation here is that conda environments as a rule want to install conda packages, not pip ones. It works, but only if you allow conda to take precedence for as many of the dependencies as you can, and dont cross-post dependencies between conda and pip, you'll likely have fewer install dependencies
I can also try installing on linux and report back what I find on the issues tab on the repo page
No no I do not take it personally @dbuscombe-usgs, please be as critic as you can I need it to grow and get better at this and improve that package! No mercy is good science. I think I will have to change the installation procedure and move it all to conda rather than pip. CHeers
Thanks @dbuscombe-usgs and @npucino it looks like you are working towards a solution. Feel free to reach out if you have any questions.
:wave: @chrisleaman, please update us on how your review is going (this is an automated reminder).
:wave: @dbuscombe-usgs, please update us on how your review is going (this is an automated reminder).
👋 @chrisleaman, please update us on how your review is going (this is an automated reminder).
Sorted out package installation issues, working through example notebooks at the moment.
I will try again to install and run the tests / notebooks, so I can complete the 'functionality' section of the checklist, otherwise I have read through the paper and documentation and they are both good as-is
Thx @chrisleaman and @dbuscombe-usgs for the reviews. I am currently working on:
So, maybe @dbuscombe-usgs you can wait until I fix those issues before running trough the code. Just a suggestion.
CHeers!
Yes, I can certainly wait. Thanks!
@crvernon - Just an update, I'm happy with code and documentation as @npucino has addressed the issues I've raised. I have a couple of minor suggestions for the manuscript which I will finalise and create an issue for by the end of the week.
@whedon set v1.2.0 as version
I'm sorry @npucino, I'm afraid I can't do that. That's something only editors are allowed to do.
:mega: Mid-week rally! Just checking in to see how things are going.
:open_hands: Thanks @chrisleaman for the notification! Glad to see things are progressing well.
@dbuscombe-usgs and @npucino it looks like you both are still working out some bugs so keep up the good work!
Have a great and productive rest of the week!
Hi @crvernon , I packaged the code in conda and I already pull request its recipe to the team: https://github.com/conda-forge/staged-recipes/pull/16228
I am just waiting to be added to the conda server and then test with @dbuscombe-usgs if everything works well with this strategy! I do not know how long this usually take, hopefully not too much.
Cheers! Nick
I'm on vacation for a week then I can get to finalizing my review. Apologies, I left the computer at home ...
I am having an hard time relaxing the requirements for Sandpyper right now.
Basically, as per conda request and @dbuscombe-usgs request I was trying to use the most recent versions of Sandpyper dependencies rather than providing strict versions of those. Apart from minor changes to the code (CRS definition and some minor Pysal import changes), the code should run smoothly with no issues. Note that the tests and Jupyter Notebooks that I created are based on the data I generated with Sandpyper within the environment with strict dependencies requirements.
When I create a fresh new environment with no dependencies versions specified except for Python = 3.8, I have (for now) one issue that I cannot understand where it comes from.
In the unsupervised labelling procedure, within my get_sil_location, I use Scikit-Learn KMeans and Silhouette score functions to cluster my points and to compute silhouette scores. I set an integer as random state so that I obtain always the same results.
The expected results must be obtained in order to run the tests and to follow along the notebooks I provided to see how Sandpyper works.
The code works, but with the relaxed environment the labels KMeans assign are slightly different, which affects the silhouette scores, the suboptimal number of clusters and eventually the labels the points will be assigned to.
In order to run tests and provide users with actionable Jupyter Notebooks, these little changes would require me to go through the dataset again, create new polygons of corrections and re-identify the classes of all the test surveys, which takes quite a bit of time and I ideally would like no to go through that route at this stage if possible.
So I looked at the changelog of sklearn (changes from 0,24 to 1.0 versions) and some minor changes occured in the KMeans algorithm, but those seems to be related to performances rather than random state.
I also tested environments completely relaxed except for sklearn==0.24.1
which is the one I used for the test data, but the inconsistencies are still there.
My main suspect was the random state but it looks like that is not the case.
I don't understand what is causing to change the output from the get_sil_location
function, despite it is working as expected.
If I use the strict environment, everything works as expected.
As @chrisleaman already proved, Sandpyper is working as expected and passing tests as is (hosted in PyPI with strict dependencies requirements). Is hosting Sandpyper in Anaconda a must do for this package to be published in JOSS? It is of course very important to relax the requirements and upload the package to conda and I really aim to do it in the near future. But if this is not super strictly necessary in the context of publishing in JOSS, I would rather go through the process in a later stage.
Cheers! Nick
Re https://github.com/openjournals/joss-reviews/issues/3666#issuecomment-927454043, I'm fine for this JOSS review if sandpyper
is not on conda.
We should still confirm that @dbuscombe-usgs is able to get it installed and running in his environment or if there are still other issues.
Re #3666 (comment), [...] We should still confirm that @dbuscombe-usgs is able to get it installed and running in his environment or if there are still other issues.
Absolutely. Thanks.
I am back from vacation and will be ready to tackle this tomorrow. Apologies for the delay
@npucino - to address your earlier question of:
Is hosting Sandpyper in Anaconda a must do for this package to be published in JOSS?
The answer from my perspective is: no. Having this package able to be installed and function given the documentation and claims that you provide is sufficient. I agree that this would be a nice addition in the future to extend the usability of your package; though it may not be in the current scope of your development.
I agree with @crvernon and @chrisleaman that a conda solution shouldnt be a requirement for JOSS, however I will say that scikit-learn is the major ML library in python, and using a version that is several versions behind the main trunk (we are at 1.0 now) is storing up problems for the future, IMHO, if this is to be maintained for several years. I anticipate you'll be fielding a lot of questions about installation going forward, if the toolbox proves popular (I hope so). However, over time hopefully a solution is provided. I took a quick look at the changelogs and API references for the KMeans functions over time and, as @npucino noted, there doesn't seem to be anything related to random seeds or state, so it is puzzling!
To run the notebooks., the instructions are inadequate for a novice user, who needs to be told they need to download the github package (like I said before, why go to the bother of making a pip package when the user has to download the code anyway?). A few sentences here in the README explaining to the potential user would go a long way
In the first notebook, "1 - Introduction and data preparation", you don't explain where 'demo_data.rar' is located. Its not a huge deal, because I discovered you actually don't need it, but I did waste about 10 minutes looking for it! I looked at the paper dataset, logging into the propeller site as-per the 'data availability' statement, but I dont quickly see how to download that data file. In fact, it is not at all clear how to download any data from the propeller site...
I am able to run all the provided examples. The toolbox is fast, and very useful. The methods work as-described, and the statements made about the software in the paper and supporting info are accurate. Good job!
I'll complete the checklist and APPROVE this for JOSS. I trust the level of documentation for installation and implementation will improve with a growing user base
Hi @dbuscombe-usgs , thanks for your feedback! Let me address your valuable points one at the time.
One of the major request from yourself and conda-forge reviewers is to relax the requirements for Sandpyper. If I relax the requirements basically telling conda to install the latest versions of packages which are compatible with the user active environment, I get that weird behaviour that I explained in my last comment https://github.com/openjournals/joss-reviews/issues/3666#issuecomment-927454043. This doesn't mean the code doesn't work, it just means that because small differences in the labels assigned by KMeans, I have to re-build the test data from scratch (correction polygons and class dictionaries). This requires some spare time from my side which I am more than willing to spend in my evenings every now and then rather than full time right now. Because it is my priority to make this package better by relaxing requirements and using conda instead of pip.
I think I see what you mean, when you install the package with pip in the conda environment Sandpyper automatically installs (again) the packages and uses pip. Is this the issue you are referring to?
Because if the conda environment sandpyper_env
is created by copying and pasting the requirements in the README
, when you install sandpyper 1.2.0
(latest) using pip, you should have already all the right packages installed and this should only be a kind of check.
If for some reason your environment has some required packages that differ from the versions specified in both the conda create environment ...
(README
) and in the requirements.txt,
then it uses pip to install those. And that is causing issues.
In my case though this doesn't happen. See the image below.
After I created the sandpyper_env
(in this case renamed sandpyper_env2), I install sandpyper using pip and this is what I mean by "check".
But if that is not what happens to you, there is something weird going on. I am looking into it right now.
You are right, I hadn't thought about this! Actually this is a really good point and I would love to hear your suggestion on how to solve this. Even once the package will be in conda, to run the notebooks users will need to download the repo anyway to get the test_data.rar (which by the way I forgot to include it in its compressed format in the repo). I have 2 ideas:
option 1 is easy and I can do it quickly. Option 2 is much better but I must first learn how to use binder and then do it, which might take a bit. But looking at this seems easy.
Anyway, I now update the README
and the notebooks to explaining how to get the test_data.rar.
Sorry, definitely something I will explain at the beginning of the notebook. As per the propeller platform, the data is freely accessible but not downloadable at the moment. You can access it, visualize it, make measurements and see changes in all 20 sites in the past 3 years but as of today, it is not downloadable. I think it will eventually be but I have to check with my supervisor. I see the confusion here, in the paper there is written freely accessible which can be misleading.
More than 220 3D datasets are already freely accessible to anyone via a user friendly web-platform to share and communicate information, promote coastal awareness, build knowledge and further increase the impact of our efforts.
I update the README and the notebook and find a way to make the test_data.rar
at least downloadable.
Thanks again!
Nick
Awesome, thanks @npucino for that quick and thorough response. You've totally cleared up any remaining confusion I had over the KMeans issue.
Thanks for updating the README and notebook
On the mixture of pip and conda, I do get the same output as you, and I do now understand the need for the specific versions and the pip package. But ...the way you mix conda and pip is not optimal: a conda yml file would allow you to combine all requirements for conda and pip into a one-line installation (conda env create -f sandpyper.yml
). You would list all the conda and pip (for now, just sandpyper) dependencies together and it would replace the two separate commands you have currently 1) conda create --name sandpyper_env python=3.8 geopandas=0.8.2 matplotlib=3.3.4 numpy=1.20.1 pandas=1.2.2 tqdm=4.56.2 pysal=2.1 rasterio=1.2.0 richdem=0.3.4 scikit-image=0.18.1 scikit-learn=0.24.1 scipy=1.6.0 seaborn=0.11.1 tqdm=4.56.2 pooch=1.4.0 fuzzywuzzy
and 2) pip install sandpyper
. It would be a much cleaner installation, because conda would not need to search and resolve everything twice, making it faster, and also probably requiring much less disk space (for conda caching). Most importantly, in my experience it is less likely to break this way.
I provided an example you could modify by hard coding in the version numbers. I'm happy to help you test it
@dbuscombe-usgs and @npucino - I noticed that all tasks have been checked. Are you still collaborating on fixes for this review or have all of your inquiries been addressed @dbuscombe-usgs ?
Also, I noticed that the following related issues are still open:
Hi @cvernon, I am satisfied - I approve this submission
I will still work with @npucino to help test remaining issues with conda packaging
Thanks @dbuscombe-usgs I really appreciate your thorough review and willingness to volunteer so much of your time to make this the best product it can be!
Hi guys, thanks again for you extremely valuable comments and contributions!
Yesterday I was trying the .yml option with different degrees of version requirements and still getting some tests to fail. Right now I am also testing different versions of requirements to see if I can relax at least part of the requirements and isolate the issue.
But, using the strict version requirements, tests do not fail so for now, I will stick with this sub-optimal solution.
By the way @dbuscombe-usgs, I addedtest_data.rar
in the repo and updated README
and notebooks with direct link for downloading the data. I am now trying the last few versions of yml and the I update the README
so that users will use the conda env create -f sandpyper_env.yml
option.
I never learnt so much in a review process. Thanks a lot @dbuscombe-usgs @chrisleaman @crvernon . Nick
Actually, I am close to nail down this weird random generator issue. Apparently something happened between numpy 1.20 and 1.21 (release note) with their default random number generator. I really start to think that is numpy that causes my tests to fail behind the curtains, affecting sklearn KMeans function, because I noticed that if I use numpy 1.20 it works, if I use 1.21 tests do not pass. Therefore, the only requirement I might need is numpy 1.20 but I still have to try it. And, numpy is so fundamental that it is not a long-term solution at all, but, I will get there!
Cheers!
@npucino no problem at all. Let me know when you have the package in the condition that you want it published in and I will proceed with my tasks.
@npucino just curious to see if you are ready to move forward with this or if you are still making some final modifications?
Hi @crvernon , sorry my late reply! I am now fixing one last bit and then I am happy for you to proceed with the reviewing process. I am working at it right now, I might be done today, I let you know here very soon, thanks!
@crvernon All good! Ready to move forward. THanks!
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@whedon check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1111/0031-868x.00152 is OK
- 10.1371/journal.pone.0118571 is OK
- 10.1890/10-1510.1 is OK
- 10.1038/s41598-021-83477-6 is OK
- 10.1016/j.earscirev.2017.04.011 is OK
- 10.1016/j.ecolecon.2006.10.022 is OK
- 10.1016/j.ocecoaman.2018.01.001 is OK
- 10.1016/j.isprsjprs.2015.02.009 is OK
- 10.21105/joss.01890 is OK
- 10.1038/sdata.2016.24 is OK
- 10.3390/data4020073 is OK
- 10.3390/ijgi8060267 is OK
- 10.1016/j.geomorph.2016.02.025 is OK
MISSING DOIs
- None
INVALID DOIs
- None
@npucino - very nice paper! The following is a list of a few items to address in your paper specifically before we move forward:
![My figure caption \label{fig:fig1}](my_figure.png)
We find in \autoref{fig:fig1} that show something...
[x] LINE 23: Formatting for "...(as Digital Surface Models, DSMs) (Gonçalves & Henriques, 2015)..." should be "...(as Digital Surface Models, DSMs; Gonçalves & Henriques, 2015)..." The way to do this in your paper.md
file is to use "[as Digital Surface Models, DSMs; @Goncalves2015-qr]"
[x] LINE 44: Remove extra parenthesis around "..., see (Lane et al., 2000)),..." which should be "..., see Lane et al. 2000),..."
[x] LINE 48: First of GIS acronym without description. "...within a GIS could..." should be "...within a Geographic Information System (GIS) could..."
[x] LINE 71: Per LINE 48, "...plug-in for ESRI ArcMap geographical information system..." should read "...plug-in for the ESRI ArcMap GIS."
[x] LINE 82: I would change "Currently v1.2.0 allows to:" to "Sandpyper currently provides the the ability to:" because your version will change upon release for this article
[x] LINE 92: Formatting for "...(Figure 2c)(Pucino et al. 2021)." should be "...(Figure 2c; Pucino et al., 2021)"
❗ Let me know when these items are taken care of and we will move on. Thanks and keep up the good work!
Submitting author: @npucino (Nicolas Pucino) Repository: https://github.com/npucino/sandpyper Version: v1.3.3 Editor: @crvernon Reviewer: @dbuscombe-usgs, @chrisleaman Archive: 10.5281/zenodo.5565487
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@dbuscombe-usgs & @chrisleaman, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @crvernon know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Review checklist for @dbuscombe-usgs
✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @chrisleaman
✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper