[REVIEW]: Predihood: an open-source tool for describing and predicting neighbourhoods' environment

whedon commented 4 years ago

Submitting author: @fduchatea (Fabien Duchateau) Repository: https://gitlab.com/fduchate/predihood Version: v1.1 Editor: @galessiorob Reviewer: @jdalzatec, @omshinde, @nuest, @martinfleis Archive: 10.5281/zenodo.4737729

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/14eee5164e9b664fcfe62550b6924242"><img src="https://joss.theoj.org/papers/14eee5164e9b664fcfe62550b6924242/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/14eee5164e9b664fcfe62550b6924242/status.svg)](https://joss.theoj.org/papers/14eee5164e9b664fcfe62550b6924242)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@jdalzatec & @omshinde & @nuest & @martinfleis, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @galessiorob know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Review checklist for @jdalzatec

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@fduchatea) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @omshinde

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@fduchatea) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @nuest

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@fduchatea) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @martinfleis

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@fduchatea) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

whedon commented 4 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @jdalzatec, @omshinde, @nuest, @martinfleis it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon commented 4 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1109/socialcom.2013.17 is OK
- 10.5220/0009885702940301 is OK

MISSING DOIs

- None

INVALID DOIs

- None

galessiorob commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

galessiorob commented 4 years ago

👋 @martinfleis, @omshinde, @nuest and @jdalzatec

Thank you all for volunteering as reviewers for this paper! At the top, you'll find individual checklists to work through. Please let me know if something is not clear or if you need any help.

nuest commented 4 years ago

Thanks! Just as a quick heads up, I'll likely do the review beginning of next week.

fduchatea commented 4 years ago

Thank you for reviewing our submission, and sorry for the delay.

About mentioned missing items:

Permit individuals to create issues/file tickets against your repository: it seems that the gitlab instance of our lab requires authentication (of lab members) for creating issues, and this policy will not change. Should we switch to another platform such as gitlab.com?
Expand the description of the software: Should we expand the description here, in README or in the paper?
Expand and focus the software's research scope: Neighbourhoods are a very common concept in studies from diverse domains such as health, social sciences, or biology. For instance, Japanese researchers investigated the relationships between social factors and health by taking into account not only behavioural risks, but also housing and neighbourhood environments [1]. In a British study, the authors describe how living areas have an impact on physical activities, from which they determine a walkability index at the neighbourhood level for improving future urban planning [2]. Smart cities also consider neighbourhoods as an ideal unit of division for measuring urban quality [3]. Lastly, a survey describes the luxury effect, i.e., the impact of wealthy neighbourhoods on the surrounding biodiversity [4]. However, there is no clear definition of the neighbourhood environment. Our tool fills this gap by defining neighbourhoods and their environment, characteristics of these neighbourhoods, and an interface for using popular machine-learning algorithms. These elements can be extended/enriched. Our tool has so far been used to measure the impact of the neighbourhood's environment when people move to another city [5]. But it could be extended to other application domains: measuring the pollution degree in neighbourhoods, determining whether a neighbourhood is suitable as a stopover for migratory birds, etc. We can reformulate and add this research scope in the paper if needed (we are already above the 1,000 words limit though).

[1] Takada, M., Kondo, N., Hashimoto, H.: Japanese study on stratification, health, income, and neighborhood: study protocol and profiles of participants. Journal of Epidemiology 24(4), 334–344 (2014)
[2] Frank, L.D., Sallis, J.F., Saelens, B.E., Leary, L., Cain, K., Conway, T.L., Hess, P.M.: The development of a walkability index: application to the neighborhood quality of life study. British Journal of Sports Medicine 44(13), 924–933 (2010)
[3] Garau, C., Pavan, V.M.: Evaluating urban quality: Indicators and assessment tools for smart sustainable cities. Sustainability 10(3), 575 (2018)
[4] Leong, M., Dunn, R.R., Trautwein, M.D.: Biodiversity and socioeconomics in the city: a review of the luxury effect. Biology Letters 14(5), 20180082 (2018)
[5] Barret, N., Duchateau, F., Favetta, F., Bonneval, L.: Predicting the environment of a neighborhood: a use case for France. In: International Conference on Data Management Technologies and Applications (DATA). pp. 294–301. SciTePress (2020)

whedon commented 4 years ago

:wave: @jdalzatec, please update us on how your review is going.

whedon commented 4 years ago

:wave: @omshinde, please update us on how your review is going.

whedon commented 4 years ago

:wave: @nuest, please update us on how your review is going.

whedon commented 4 years ago

:wave: @martinfleis, please update us on how your review is going.

martinfleis commented 4 years ago

It will likely take some time before I'll manage to do my review. Hard to estimate now, I haven't properly looked into the complexity of the package yet.

omshinde commented 4 years ago

@omshinde, please update us on how your review is going.

It's coming along nicely. I have started reviewing it locally based on the checklist and will keep you posted by updating the checklist. Thanks!

jdalzatec commented 4 years ago

@whedon I am playing a bit with the package while going through the checklist. I'll take some more time while reviewing it locally. Thanks!

whedon commented 4 years ago

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@whedon commands

galessiorob commented 4 years ago

@jdalzatec @martinfleis @omshinde thank you all for the updates, please let me know if you need my help in any way.

@fduchatea thanks for the update on your part, please see my answers below:

Permit individuals to create issues/file tickets against your repository: it seems that the gitlab instance of our lab requires authentication (of lab members) for creating issues, and this policy will not change. Should we switch to another platform such as gitlab.com?

To avoid having you migrate everything to another repo I propose two options:

1. That you open a "review repo" in GitLab that allows us to reference the repo we are reviewing, so that the reviewers can link to the specific files in their issues.
2. That you open a "review repo" in GitHub to do the same as above; however, I think that having the review repo and the actual repo on the same platform would be easier.

Expand the description of the software: Should we expand the description here, in README or in the paper?

In the paper.

Expand and focus the software's research scope: Neighbourhoods are a very common concept in studies from diverse domains such as health, social sciences, or biology. For instance, Japanese researchers investigated the relationships between social factors and health by taking into account not only behavioural risks, but also housing and neighbourhood environments [1]. In a British study, the authors describe how living areas have an impact on physical activities, from which they determine a walkability index at the neighbourhood level for improving future urban planning [2]. Smart cities also consider neighbourhoods as an ideal unit of division for measuring urban quality [3]. Lastly, a survey describes the luxury effect, i.e., the impact of wealthy neighbourhoods on the surrounding biodiversity [4]. However, there is no clear definition of the neighbourhood environment. Our tool fills this gap by defining neighbourhoods and their environment, characteristics of these neighbourhoods, and an interface for using popular machine-learning algorithms. These elements can be extended/enriched. Our tool has so far been used to measure the impact of the neighbourhood's environment when people move to another city [5]. But it could be extended to other application domains: measuring the pollution degree in neighbourhoods, determining whether a neighbourhood is suitable as a stopover for migratory birds, etc. We can reformulate and add this research scope in the paper if needed (we are already above the 1,000 words limit though).

I think we can just refactor the introduction and the statement of need to reflect the research need and scope by including some of the applications that you listed above. Could you take a first pass at it and I can help refine after that?

fduchatea commented 4 years ago

@galessiorob

Thanks for the hints. We have created a repository with a public tracker: https://gitlab.com/fduchate/review-repo-predihood

The first sections of the paper have been reformulated to broaden the description.

fduchatea commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

martinfleis commented 4 years ago

I know it is not my role to judge this, but I believe that JOSS's requirement that the software must "Permit individuals to create issues/file tickets against your repository" relates not to the review process but to the actual software repository. Therefore a review-only solution is not enough. The reason I think so is that software in a private, although readable, repository is not entirely open in a community sense. Using Matt Rocklin's seven stages of open software, predihood is currently at stage 2, while my feeling is that JOSS requires stage 3.

But again, it is up to the editorial team of JOSS to judge this.

nuest commented 4 years ago

@martinfleis That's my understanding, too. https://gitlab.liris.cnrs.fr/fduchate/predihood/issues requires me to register/sign-in with a LIRIS account, which I, or any other external users, do not have.

@fduchatea Would you be willing to switch your "main" repository to a more open platform?

@galessiorob Please advise.

nuest commented 4 years ago

Some first review comments and questions

I stopped the review at this point as some things seem to be missing (test files, clarification of authorship, public issue tracker). I invite the authors to respond and kindly ask them to revisit the JOSS guidelines and the review checklist.

Note to self:

source .venv/joss-predihood/bin/activate

docker run --name mongiris -it -p 27017:27017 mongo:4
docker start mongiris

General checks

Contribution and authorship: there are three authors on the paper, but only two contributors on the repo - please clarify
I'm hesitant to check off the "Substantial scholarly effort" as defined here. Has the software been used in other use cases than your own yet? I understand it makes addressing research challenges potentially easier, but I'd like to understand better how other researchers can load their own data into predimap, and how they can get their data out of the system again.
- The example you provide (find a new place where to live) is interesting, but not a scientific application IMO. What would be a showcase that answers a research question?

Functionality

I was not able to complete all steps in the README, please @fduchatea clarify - thanks!

I had to also run pip install wheel in my virtual environment to install predihood
What about making the tool easier to try out by providing a Docker image? Or you could provide instructions for using MongoDB inside a container (which I prefer)
- Note to self: I ran a MongoDB 4 container, docker execed into it, installed wget, download database dump, ran mongorestore
I was not able to locate dump-iris.bin locally after installation. I did find the file dump-dbinsee.bin after manually downloading the mongiris-master.zip from https://gitlab.liris.cnrs.fr/fduchate/mongiris - Is this the file?
On the map, I can calculate classifications for some areas, but not for all (sometimes nothing happens) - is that expected? If yes, it should be communicated more clearly to the user.
Starting from the map: How can I export the calculated values? How can I compare the results of the different algorithms? Do I have to manually copy them to an external tool/spreadsheet? (Sorry if I'm missing something.)
On the classifier training part
- When I select both "Remove outliers" and "Remove rural", I get errors in the console but only a generic error message in the UI - more helpful error messages for the users would be important.
- Below, the problem is that the test size is negative, but the message is generic
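
For illustration only, a minimal sketch of the kind of check that would turn this into a specific message. The function name `validate_split` and its parameters are hypothetical and are not taken from Predihood's code; the sketch assumes the training form passes the number of neighbourhoods left after filtering and the two split fractions.

```python
from typing import Tuple


def validate_split(n_samples: int, train_size: float, test_size: float) -> Tuple[int, int]:
    """Return (n_train, n_test), or raise ValueError with a specific message.

    Hypothetical helper: both sizes are fractions of the filtered dataset.
    """
    if n_samples <= 0:
        # Typical when several filters together remove every neighbourhood.
        raise ValueError("No neighbourhoods left after filtering; relax the filters.")
    for name, value in (("train_size", train_size), ("test_size", test_size)):
        if not 0 < value < 1:
            raise ValueError(f"{name} must be strictly between 0 and 1, got {value}.")
    # Small tolerance so that pairs such as 0.8 + 0.2 are not rejected by rounding.
    if train_size + test_size > 1 + 1e-9:
        raise ValueError("train_size and test_size must not add up to more than 1.")
    return round(n_samples * train_size), round(n_samples * test_size)
```

The UI could then show the text of the `ValueError` directly instead of a generic failure notice.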

Documentation

Please expand in the README how to run the tests (not just mention the file name); also, the file tests.py seems to be missing anyway.
I suggest to mention the license in the README - not everyone will know where to look for it
In the paper, you mention that data is stored as GeoJSON, and you mention other use cases - can you document these other use cases and how to load data besides the IRIS data (which is a MongoDB data dump and not a GeoJSON file) ?
Community guidelines are missing
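
As a rough idea of what such loading documentation could cover, here is a minimal sketch that turns a GeoJSON FeatureCollection into one document per feature and hands them to a MongoDB collection. It is written under assumptions: the helper names are hypothetical, the document layout is not the mongiris schema, and the only database call used is `insert_many`, which pymongo's `Collection` provides.

```python
import json
from typing import Any, Dict, List


def geojson_to_documents(geojson_text: str) -> List[Dict[str, Any]]:
    """Parse a GeoJSON FeatureCollection and return one document per feature."""
    data = json.loads(geojson_text)
    if data.get("type") != "FeatureCollection":
        raise ValueError("Expected a GeoJSON FeatureCollection.")
    documents = []
    for feature in data.get("features", []):
        if feature.get("type") != "Feature" or "geometry" not in feature:
            raise ValueError("Each item must be a GeoJSON Feature with a geometry.")
        documents.append({
            "type": "Feature",
            "geometry": feature["geometry"],
            # Assumed layout: indicators are kept under "properties" unchanged.
            "properties": feature.get("properties") or {},
        })
    return documents


def load_into_collection(collection: Any, geojson_text: str) -> int:
    """Insert the features into a pymongo-style collection; return the count."""
    documents = geojson_to_documents(geojson_text)
    if documents:
        collection.insert_many(documents)
    return len(documents)
```

With pymongo installed, `collection` would be a collection object obtained from a `pymongo.MongoClient`; which database and collection names Predihood expects is something the README would need to state.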

galessiorob commented 4 years ago

@martinfleis and @nuest, thank you both for your thoughtful comments and assessments on the current state of accessibility of this software. Our submission requirements state that the software must:

Be stored in a repository that can be cloned without registration.
Be stored in a repository that is browsable online without registration.
Have an issue tracker that is readable without registration.
Permit individuals to create issues/file tickets against your repository.

@fduchatea can you change the settings on the repo so that they comply with the above? We'd also need this repo to be the source one, not the one that's restricted to users with a LIRIS sign-in.

CC @arfon @danielskatz

galessiorob commented 3 years ago

@fduchatea checking in on the aforementioned requirements, want to make sure you'll be able to provide them so we can carry on with the review. Let me know if you have any questions!

fduchatea commented 3 years ago

@galessiorob we understand that not letting individuals create tickets is an issue. We have migrated to a more open platform and we are discussing the mentioned requirements. https://gitlab.com/fduchate/predihood

galessiorob commented 3 years ago

Thanks so much @fduchatea

fduchatea commented 3 years ago

@galessiorob @nuest

As previously mentioned, the new repository is now on https://gitlab.com/fduchate/predihood for better accessibility.

Following are answers to most of the mentioned points.

Contribution and authorship: there are three authors on the paper, but only two contributors on the repo - please clarify

This project was implemented during a 6-month training period. Nelly B. was supervised by F. Favetta and F. Duchateau.
Although F. Favetta has not directly made commits, he actively participated in supervising Nelly, defining the algorithms and implementation details, testing and installing Predihood, and writing and reviewing the paper.

I'm hesitant to check off the "Substantial scholarly effort" as defined here. Has the software been used in other use cases than your own yet? I understand it makes addressing research challenges potentially easier, but I'd like to understand better how other researchers can load their own data into predimap, and how they can get their data out of the system again. The example you provide (find a new place where to live) is interesting, but not a scientific application IMO. What would be a showcase that answers a research question?

Predihood is the result of six months of work (February to August 2020) by three people. Besides, it reuses mongiris (a light API for interacting with the neighbourhood database), which itself required a few months of work.
As discussed with galessiorob, the tool has so far only been used in our context (the tool was developed and the paper published in summer 2020, which also limits its visibility).
We have added examples of potential applications in the paper: measuring the pollution degree in neighbourhoods, determining whether a neighbourhood is suitable as stopover for migratory birds, etc.
We are developing another (small) use case to explain how to load other data.
We have added an export functionality for all neighbourhoods selected on the map. In our initial context, we had to copy the results (the search was limited to several neighbourhoods) but we agree that prediction from the map should have an export functionality.

What about making the tool easier to try out by providing a Docker image? Or you could provide instructions for using MongoDB inside a container (which I prefer)

We do not have any expertise in Docker, so we will probably not be able to address this point in a reasonable time.
Maybe some contributors may be willing to add this feature later?

I was not able to locate dump-iris.bin locally after installation. I did find the file dump-dbinsee.bin after manually downloading the mongiris-master.zip from https://gitlab.liris.cnrs.fr/fduchate/mongiris - Is this the file?

Yes, this is the database file. We have updated the README.

On the map, I can calculate classifications for some areas, but not for all (sometimes nothing happens) - is that expected? If yes, it should be communicated more clearly to the user.

We were not able to reproduce this bug. We have tested hundreds of neighbourhoods, but as there are 50,000 in the database, we could not test them all.
Do you have some neighbourhood names or codes (which do not produce any prediction) so that we can investigate?

Starting from the map: How can I export the calculated values? How can I compare the results of the different algorithms? Do I have to manually copy them to an external tool/spreadsheet? (Sorry if I'm missing something.)

You are right, we used to manually copy results when predicting on the map. We have added an export functionality on the cartographic interface (XSL file export).
In the training part, there is an export function of the results.
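
For readers wondering what such an export could look like, here is a minimal sketch. It is not Predihood's actual implementation: the function name `export_predictions_to_csv`, the shape of `predictions` (a neighbourhood code mapped to one predicted label per algorithm), and the example codes and labels are all assumptions made for illustration. Writing one column per algorithm is one way to allow side-by-side comparison in a spreadsheet.

```python
import csv
import io


def export_predictions_to_csv(predictions, algorithms, stream):
    """Write one row per neighbourhood and one column per algorithm.

    `predictions` is a hypothetical structure: {code: {algorithm: label}}.
    """
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["neighbourhood_code", *algorithms])
    for code in sorted(predictions):
        per_algorithm = predictions[code]
        # Leave the cell empty when an algorithm produced no prediction.
        writer.writerow([code, *(per_algorithm.get(name, "") for name in algorithms)])


# Illustrative data only; the codes and labels below are made up.
example = {
    "A002": {"RandomForest": "green areas", "KNN": "urban"},
    "A001": {"RandomForest": "urban"},
}
buffer = io.StringIO()
export_predictions_to_csv(example, ["RandomForest", "KNN"], buffer)
csv_text = buffer.getvalue()
```

An empty cell in the output makes it visible that an algorithm returned nothing for a neighbourhood, which relates to the "sometimes nothing happens" remark above.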

On the classifier training part. When I select both "Remove outliers" and "Remove rural", I get errors in the console but only a generic error message in the UI - more helpful error messages for the users would be important. Below, the problem is that the test size is negative, but the message is generic
```
Error when selecting both outliers and rural is fixed (issue due to Python 3.8).
Specific error messages have been added in the UI.
```
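
As an illustration of the kind of specific message requested here, the following sketch validates the split sizes before training and raises an error that a UI could display verbatim. The function name `check_split_sizes`, its parameters, and the wording of the messages are assumptions; the actual fix in Predihood may look different.

```python
def check_split_sizes(n_samples, n_removed, train_fraction):
    """Return (n_train, n_test), or raise ValueError with a user-facing message."""
    remaining = n_samples - n_removed
    if remaining <= 1:
        # Filtering removed (almost) everything, so no valid split exists.
        raise ValueError(
            f"Only {remaining} neighbourhood(s) remain after filtering; "
            "relax the 'Remove outliers' or 'Remove rural' options."
        )
    if not 0.0 < train_fraction < 1.0:
        raise ValueError(
            f"Training fraction must be strictly between 0 and 1, got {train_fraction}."
        )
    # Keep at least one sample on each side of the split.
    n_train = max(1, min(remaining - 1, round(remaining * train_fraction)))
    return n_train, remaining - n_train


sizes = check_split_sizes(n_samples=100, n_removed=20, train_fraction=0.8)

try:
    check_split_sizes(n_samples=10, n_removed=12, train_fraction=0.8)
    message = None
except ValueError as error:
    message = str(error)
```

Checking the sizes up front means a negative test size is reported as a filtering problem rather than surfacing as a generic failure.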
Please expand in the README how to run the tests (not just mention the file name); also, the file tests.py seems to be missing anyway.
```
We updated the path to the test file (predihood/tests.py) and we indicated how to run them.
```
I suggest mentioning the license in the README, since not everyone will know where to look for it
```
Added.
```

I had to also run pip install wheel in my virtual environment to install predihood

We have tested in a new (clean) virtual environment, and there was no need to install wheel.
According to the documentation and Stack Overflow, it seems that wheel is packaged automatically with recent versions of pip (above 19.2).

We are working on the last points (loading data and community guidelines, although we are not sure what to write for the latter).

nuest commented 3 years ago

Thanks for the updates so far!

To me making clear how the software is useful and usable (data import/export) to others is key, because if it is limited to your own research, I personally think the corresponding scientific article is a suitable way to reference it. Just wanted to put that here as this concern might not have been clear in my previous comments.

Re. community guidelines: take a look at recent published JOSS papers and I'm sure you'll find some good examples.

I'll wait for you to complete your changes and will revisit my review then.

martinfleis commented 3 years ago

Just a ping here - I have opened an issue https://gitlab.com/fduchate/predihood/-/issues/1 since the installation following the instructions failed in my case. Therefore I am waiting for this to be resolved.

fduchatea commented 3 years ago

We are refactoring the code to facilitate import of new data. We will check the issue later.

galessiorob commented 3 years ago

Hi @fduchatea! I hope you had a nice holiday break 🎄

Checking in on several things so the reviewers can keep doing their work; some of the major issues to work on from your side:

[ ] Installation instructions issue (kindly opened by @martinfleis)
[ ] Clarifying data import and export usability
[ ] Export functionality on the cartographic interface (XSL file export)
[ ] Data and community guidelines. Some resources for this:
- Setting up your project for success with community guidelines
- Open Source Code of Conduct
- Intro blog for the GitHub Community guidelines (dated but the links lead to the current resources)
- GitHub Community guidelines, as an example
- Google OS Community Guidelines
- Code of conduct examples
- One more example from a smaller project that I think is well written

Let me know if I can help clarify any of this!

fduchatea commented 3 years ago

Hi @galessiorob Thank you, and we wish a Happy New Year to all of you.

We have added a new (fake) dataset (prediction works on it), a CSV export functionality and the Community guidelines (based on the links you provided, thanks). The README has been updated to describe how to use/import another dataset.

The updated code is still on a dev branch (we are still performing some tests). We should merge the branch at the beginning of next week and will let you know.

galessiorob commented 3 years ago

Thanks, @fduchatea !

fduchatea commented 3 years ago

We have merged into the master branch. We are still correcting a few bugs, but it is possible to load and predict for another dataset. The README has been updated to reflect these changes.

galessiorob commented 3 years ago

Thank you @fduchatea

@jdalzatec, @omshinde, @nuest, @martinfleis please resume your reviews at your earliest convenience 🙏 Let me know if there's anything I can help with.

fduchatea commented 3 years ago

New bugs have been corrected this week:

fix CSS footer issue
fix error when using the example predictive algorithm MyOwnClassifier (implemented basic code to solve this issue)
fix error "object internal error" when predicting in a popup using MyOwnClassifier
fix the error 404 when displaying more-details page (for bird-migration dataset)
fix issue with the number of splits in the testing interface
fix some minor GUI issues

nuest commented 3 years ago

@fduchatea Are you currently actively developing and fixing bugs? I don't want to play catch-up with the developments for the review. (AFAIK, the JOSS submission should happen for rather stable pieces of software.)

fduchatea commented 3 years ago

@nuest No, we are not in active development. We refactored the code 2 weeks ago to facilitate the use of new datasets (code merged around Jan 20th) and we are no longer developing. Last week we noticed a few bugs, so we corrected them, but the application is usable.

galessiorob commented 3 years ago

@jdalzatec, @omshinde, @nuest, @martinfleis I recognize that this review has run past the ideal time. If you can, please resume your reviews. Let me know if you have a conflict.

Thanks!

jdalzatec commented 3 years ago

@galessiorob Hi Gabby, I will check it ASAP. Thanks!

omshinde commented 3 years ago

Hi @galessiorob ! Thanks for the reminder. I will submit my reviews as early as possible.

martinfleis commented 3 years ago

The major blocker on my side is the installation (xref https://gitlab.com/fduchate/predihood/-/issues/1). I was not able to install and run predihood so far despite several attempts. An application of this kind would hugely benefit from a Docker container which would directly start the app on docker run.

omshinde commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

PDF failed to compile for issue #2805 with the following error:

Can't find any papers to compile :-(

arfon commented 3 years ago

@whedon generate pdf

arfon commented 3 years ago

@galessiorob - I updated the repository address above to point to https://gitlab.com/fduchate/predihood (I think that's correct)

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

galessiorob commented 3 years ago

@arfon thank you!

omshinde commented 3 years ago

Hi @fduchatea and authors! cc: @galessiorob

System settings while reviewing:

Installation - Local Machine
OS - Ubuntu 20.04 (WSL)
Python version - 3.8.5

Thank you for submitting a wonderful application. I foresee Predihood being of great use for predicting insights about neighborhoods. Please find my thoughts below regarding the points mentioned in the review checklist:

Installation: Does installation proceed as outlined in the documentation?

I was not able to install the software by following the installation instructions. I had to manually install the mongiris package following the installation instructions here. It would be much more convenient to set up Predihood with Docker, as already mentioned in one of the comments above. The installation therefore needs to be checked and reviewed comprehensively.

After installing it, I was not able to run the Predihood software as mentioned here by executing python3 main.py. Please see the error log below, and correct me if I followed any step incorrectly. The error is possibly due to the missing configs directory as mentioned here:

(predihood) rajat@rajat-infinity:/mnt/d/JOSS Reviews/predihood/predihoodClone/predihood$ python3 main.py
WARNING:__main__:No parameter provided, loading default dataset configuration file (configs/config_hil.json)
Traceback (most recent call last):
  File "main.py", line 343, in <module>
    dataset_config = load_dataset_config(dataset_config_file)
  File "/mnt/d/JOSS Reviews/predihood/predihoodClone/predihood/utility_functions.py", line 537, in load_dataset_config
    with open(json_file_path) as data_file:
FileNotFoundError: [Errno 2] No such file or directory: 'configs/config_hil.json'

Though python3 main.py datasets/hil/config.json as described here worked perfectly.
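
One way such a failure could be made easier to diagnose is sketched below: resolve the configuration path before opening it, and report every path that was tried. This is only an illustration; `resolve_config_path` and its parameters are hypothetical and not part of Predihood, and only the two file paths are taken from the log and the working command above.

```python
import tempfile
from pathlib import Path


def resolve_config_path(cli_argument, default_candidates, base_dir="."):
    """Return the first existing config file, or fail with an explicit message."""
    base = Path(base_dir)
    # An explicit command-line argument takes precedence over the defaults.
    candidates = [cli_argument] if cli_argument else list(default_candidates)
    tried = [base / candidate for candidate in candidates]
    for path in tried:
        if path.is_file():
            return path
    listing = ", ".join(str(path) for path in tried)
    raise FileNotFoundError(
        f"No dataset configuration file found (tried: {listing}). "
        "Pass the path explicitly, e.g. python3 main.py datasets/hil/config.json"
    )


# Demonstration in a throwaway directory that mimics the repository layout.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "datasets" / "hil" / "config.json"
    config.parent.mkdir(parents=True)
    config.write_text("{}")
    found = resolve_config_path(
        None, ["configs/config_hil.json", "datasets/hil/config.json"], tmp
    )
    found_relative = found.relative_to(tmp).as_posix()
```

With a fallback list like this, running without arguments would pick up datasets/hil/config.json when configs/config_hil.json is absent, and the error message would otherwise tell the user which command to run.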

Documentation:
- Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).

The examples worked like a charm. I am adding some screenshots below in the Other section, verifying the functionality. Also, I really liked the documentation generated using pdoc, but it would be more convenient if the entire documentation were hosted online (maybe using GitHub Pages), as currently the HTML files have to be opened manually from the local directory.

Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?

The Predihood documentation does not mention anything about the automated tests. Information about the tests is present in the project repository README, but not in the documentation. Please verify if I missed anything.

While executing the tests with python3 tests.py, I received the following errors. They are possibly due to missing config and dataset files inside the dataset directory. I tried again after copying the csv and config file from the hil directory one level up here in my working environment, but the same errors persisted. I recommend that the authors verify this.


(predihood) rajat@rajat-infinity:/mnt/d/JOSS Reviews/predihood/predihoodClone/predihood$ python3 tests.py
test_add_assessment_to_file (__main__.TestCase) ... ok
test_address_to_city (__main__.TestCase) ... ok
test_address_to_code (__main__.TestCase) ... ok
test_get_classifier (__main__.TestCase) ... ok
test_get_most_frequent (__main__.TestCase) ... ok
test_indicator_full_to_short_label (__main__.TestCase) ... ERROR
test_indicator_short_to_full_label (__main__.TestCase) ... ERROR
test_intersection (__main__.TestCase) ... ok
test_set_classifier (__main__.TestCase) ... ok
test_signature (__main__.TestCase) ... ok
test_similarity (__main__.TestCase) ... ok
test_train_test_percentages (__main__.TestCase) ... ok
test_union (__main__.TestCase) ... ok
test_values_dataset (__main__.TestCase) ... ERROR

======================================================================
ERROR: test_indicator_full_to_short_label (__main__.TestCase)

Traceback (most recent call last):
  File "tests.py", line 43, in test_indicator_full_to_short_label
    short_label = indicator_full_to_short_label(full_label)
  File "/mnt/d/JOSS Reviews/predihood/predihoodClone/predihood/utility_functions.py", line 113, in indicator_full_to_short_label
    indicators = model.get_indicators_dict()
  File "/mnt/d/JOSS Reviews/predihood/predihoodClone/predihood/model.py", line 153, in get_indicators_dict
    list_indicators = db.find_all(db.collection_indicators)
AttributeError: 'NoneType' object has no attribute 'find_all'

======================================================================
ERROR: test_indicator_short_to_full_label (__main__.TestCase)

Traceback (most recent call last):
  File "tests.py", line 49, in test_indicator_short_to_full_label
    full_label = indicator_short_to_full_label(short_label)
  File "/mnt/d/JOSS Reviews/predihood/predihoodClone/predihood/utility_functions.py", line 132, in indicator_short_to_full_label
    indicators = model.get_indicators_dict()
  File "/mnt/d/JOSS Reviews/predihood/predihoodClone/predihood/model.py", line 153, in get_indicators_dict
    list_indicators = db.find_all(db.collection_indicators)
AttributeError: 'NoneType' object has no attribute 'find_all'

======================================================================
ERROR: test_values_dataset (__main__.TestCase)

Traceback (most recent call last):
  File "tests.py", line 145, in test_values_dataset
    dataset = pd.read_csv(filename)
  File "/mnt/d/JOSS Reviews/predihood/predihood/lib/python3.8/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/mnt/d/JOSS Reviews/predihood/predihood/lib/python3.8/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, kwds)
  File "/mnt/d/JOSS Reviews/predihood/predihood/lib/python3.8/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/mnt/d/JOSS Reviews/predihood/predihood/lib/python3.8/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, self.options)
  File "/mnt/d/JOSS Reviews/predihood/predihood/lib/python3.8/site-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: '../predihood/datasets/data_density.csv'

Ran 14 tests in 7.446s

FAILED (errors=3)
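
The three errors above come from missing external resources (a database handle that is None and a CSV file that is not found) rather than from failed assertions. As a hedged sketch of one possible remedy, tests that depend on such resources can be skipped with an explanatory reason instead of erroring. The class name, the `db` stand-in, the `find_all("indicators")` call, and the dataset path below are assumptions for illustration and do not reproduce Predihood's tests.py.

```python
import os
import unittest

# Hypothetical stand-ins: in the traceback above, `db` is None when no
# database connection is available, and the CSV path does not exist.
db = None
DATASET_PATH = "datasets/data_density.csv"


class ResourceDependentTests(unittest.TestCase):
    def test_indicator_labels(self):
        if db is None:
            self.skipTest("no database connection; is MongoDB running?")
        self.assertTrue(db.find_all("indicators"))  # not reached in this sketch

    @unittest.skipUnless(os.path.isfile(DATASET_PATH), f"missing {DATASET_PATH}")
    def test_values_dataset(self):
        with open(DATASET_PATH) as handle:
            self.assertTrue(handle.readline())


# Run the suite programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ResourceDependentTests)
result = unittest.TestResult()
suite.run(result)
skip_reasons = [reason for _test, reason in result.skipped]
```

A skip reason such as "no database connection" tells the person running the tests what to set up, which an AttributeError on NoneType does not.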



3. Software paper

- The software paper contains some spelling errors and typos. The authors are recommended to carefully review the software paper for the following issues:
    - Incorrect Bibtex for reference 2
       Replace `livehoods` -> `Livehoods`
    - Typo
       Section: Methodology (citing from the paper)
`Predihood provides the following functionnalities: - adding new neighbourhoods and indicators to describe them; - predict the environment of a neighbourhood by configuring and usingpredefined algorithms; - adding new predictive algorithms.`
        - Replace `functionnalities` -> `functionalities`
        - Bulleted points are not rendered properly 

       Section: Adding new data
`It includes about 50,000 neighbourhoods with 640 indicators, and 270 neighbouhoods were` ...
        - `and 270 neighbouhoods` -> `and 270 neighbourhoods`

       Section: Predicting environment
        - `optionnaly` -> `optionally`

4. Other
Below are the screenshots:

    - Main screen
        ![1](https://user-images.githubusercontent.com/21292545/110232195-5a086200-7f42-11eb-86cf-a8cc31345ea8.png)
    -  Cartographic Interface of Predihood
        ![2](https://user-images.githubusercontent.com/21292545/110232196-5bd22580-7f42-11eb-8ebd-4eff9f035c70.png)
    -   Results obtained after executing Random Forest Classifier
        ![3](https://user-images.githubusercontent.com/21292545/110232198-5d035280-7f42-11eb-9375-d36822796da1.png)
    -   Visualization of Confusion Matrix
        ![4](https://user-images.githubusercontent.com/21292545/110232199-5ffe4300-7f42-11eb-9775-c3cc4671b298.png)

My remarks:
I would like to congratulate the authors for their efforts and for developing a neat software package. I expect that Predihood will be very useful and will find applications in various domains beyond its core objective of predicting insights about neighborhoods. I would recommend that the authors review the above feedback, especially the existing issues with the installation.
Kind regards,
Rajat
