Closed stijnvanhoey closed 3 years ago
Hi @stijnvanhoey !!! 👋 thank you for this submission!! i will add this to our list of packages to discuss in our next meeting which is next week - feb 6!!
@nickledave will be a reviewer for this submission!! @lwasser will try to find a second reviewer and editor (or she will be editor)
I would be happy to help here as a second reviewer :)
hey @nickledave and @xmnlab i know there is a lot going on. i just wanted to check in on this review to see how you guys were doing. I am very behind and know that ropensci is stopping new submissions for atleast a month. are you guys still able to review this package or should we take a step back?
Many thanks.
I can work on that. I will start that this week.
Same here, I think I can finish by next weekend at the latest.
Thank you so much @xmnlab and @NickleDave !! much appreciated.
ok let's shoot for both reviews in by Wed April 15th just in case you guys need a bit more time!! and reach out please if you need anything.
Hi @Roel @stijnvanhoey @pjhaest thank you for your patience. Please find below my review:
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 3
At a high-level pydov
is a fully-developed packages that meets the PyOpenSci review criteria. It provides impressive functionality, yet is tightly scoped to querying and downloading data from Databank Ondergrond Vlaanderen (DOV).
I have one request for the documentation, and one question about the code, although I don't think the question about the code needs to be addressed before approving the package.
The documentation is very thorough, but I think one thing that would help new users is an "intro to working with pydov" tutorial. This tutorial would explains things at a level between the quickstart and the very detailed tutorials for each individual search object. What this tutorial would provide, that the current tutorials do not have, is an introduction to the core abstractions provided by the package. I notice that each DOV search tutorial follows the same structure, but there is not a lot of language about why a potential user might want to use some of the functionality, e.g. why get all the groundwater screens / boring holes within a bounding box.
The intro tutorial would ideally be a specific example of an analysis--it could even be one of the existing search tutorials--that introduces the concepts that repeat throughout the individual tutorials for each search object. This intro should be targeted at a "naive" researcher who is interested in having access to the data, but might not be familiar with some of the existing data structures and libraries that pydov
leverages, e.g. the WFS protocol. I admit to a total lack of domain expertise in this area, so the devs' idea of what a "naive" researcher would know compared to mine might vary, but I think that adding an intro with more language about about what and why the package does what it does, in addition to how to do those things, would make it more welcoming to new users.
My question about the code relates to the search
sub-package. I understand the need for the types
sub-package, because each class in that sub-package encapsulates the metadata such as wfs and xml fields specific to each dataset type. However it's not quite clear to me why different classes are needed for the search objects that correspond to each type in the search
package. Am I right that all the search
classes have the exact same attributes and methods? My read of the code is that the only difference between the search classes is the default value for objecttype
. E.g., a BoringSearch
class has its objecttype
default =Boring
, which dictates what fields it has, but aside from that, it has the exact same methods as all the other search types. If I'm misunderstanding something about the code, please let me know and I'll just remove this question. But it seems like it might be possible to reduce code repetition by having another class that inherits from AbstractSearch
with all the code that is common to the specific Search
classes, and then just subclass this additional class for specific types.
Something like:
class MetaSearch(AbstractSearch):
def __init__(self, layer, objecttype):
super(MetaSearch, self).__init__('dov-pub:Boringen', objecttype)
def __common_search_methods_here__():
...
import Boring
BoringSearch(MetaSearch)
def __init__(self):
super(BoringSearch, self).__init__(layer='dov-pub:Boringen', objecttype=Boring)
Thank you again for your patience during this crazy time and thank you for submitting the package, it was a pleasure to review. Looking forward to your response
hey everyone, I couldn't finish my review today. I will finish that tomorrow. sorry for the delay.
@Roel, @stijnvanhoey, @pjhaest, thanks for your patience. pydov looks awesome, I am adding my review notes below:
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 4 hours
Congratulation for the work you all are doing on pydov. I am adding general notes below:
Requests
Suggestions and general comments (not required)
assert type(df) is DataFrame
to assert isinstance(df, DataFrame)
def get_type(self): return Boring
-> klass = Boring
and it keeps your test classes more compact.Hi @NickleDave @xmnlab
Thank you for taking the time to review our package so thouroughly during these strange times.
All the points you raised are valid and I have made issues for them in our Github project so we can track progress there as we work ourselves through them. We plan to have a two-day codesprint somewhere in May, and I'm confident we can tackle most of the issues then.
@NickleDave: I agree that an introductory tutorial that introduces the search concepts more in detail (and by example) would help new users. As you mention the concepts are the same for each of the search objects, so if a new user can understand these concepts from the introductory tutorial they can quickly move on to the subject of their interest. We will definitely work on this in the codesprint.
Regarding your question about the search classes, you are right that each of them duplicates quite a lot of code. The idea is to store the (results of the remote calls to the) WFS schema, metadata, XSD schemata and so on in class variables of the search classes so that they are shared between search class instances. This avoids dowloading this data multiple times in the same session. There might be a better way to achieve this though. Another option we can consider is letting go this behaviour of keeping this data in class variables in the first place. Since the search instances don't persist data between searches there should be no need to create more than one. I made an issue for this in our Github, we can discuss this further either there or here, as you please.
@xmnlab: We will add installation instructions for a development installation in the README as well as in the installation section in the documentation. There are bits and pieces of information (f.ex. in the contributing section), but that is quite hard to find and assemble :)
We will also add the output dataframe of the search, both in the README as well as in the Quick start in the documentation. I agree that (potential) users can so quickly see when they can get as output.
I have made issues for your other suggestions. Thanks very much for tips, we will definitely look into them!
Thanks a lot to both of you for your time and reviews! We will work through your suggestions as time allows during the next couple of weeks and will follow up here once we made some progtess :)
Stay safe!
hi @Roel @xmnlab @NickleDave thank you all for your work on this review! @Roel if and when you have time, can you kindly reference this issue in your issues / pr's as you work through them? this will make it easier to circle back on the review in may or june (or whatever timing works for you) to ensure things have been resolved!! Many thanks all.
👋 all!! just checking in here... it looks like there are a bunch of issues and such being worked through still so that is totally cool. if anyone has questions please say the word... otherwise i'll just wait to see things closed up and then will check in again to see whether the reviewers are satisfied!! i'll be offline next week so i won't be responsive until june 1! Stay healthy everyone :)
Hi @lwasser ! Our code sprint is planned for next week (May 28-29), so I hope to make some progress in the issues then. I assume we should be able to pick up the review somewhere in June :)
Thanks for your patience, enjoy your week offline and stay healthy!
@Roel sounds great! just ping us here when you are ready to pick back up. i know this all takes a lot of time so take the time that you need!
Hi all! I hope everyone's doing well out there. We managed to tackle most of the raised points during our last code sprint and in the weeks that followed. There are still two issues unresolved for now, but we'll take these on in one of our next sprints for sure.
Let me know whether it suits you to pick back up here. Thanks for you patience and take care!
hi all. just checkin in to see whether all review comments have been addressed or who is working on what items at this point. many thanks.
oh sorry, I think I lost this notification, really sorry. I will take a look at the last comment @Roel sent. it sounds pretty nice that a lot of issues were addressed at the sprint! that is great :)
@Roel thank you so much for your patience! thanks for addressing the points I suggested.
@lwasser, The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
thank you @xmnlab !!! @NickleDave you were the second reviewer on this package. have all changes been implemented per your satisfaction? it sounds like there are two outstanding @Roel can you tell us what is left to do following this review from your perspective? and perhaps at this point some have been addressed in a sprint?
Thank you for pinging me @lwasser -- sorry for losing track of this
Yes I see that https://github.com/DOV-Vlaanderen/pydov/issues/254 adds an introductory tutorial -- looks good @Roel, I think this can definitely help a newcomer quickly get up to speed on usage of pydov
.
The other concern I raised about code duplication has been noted in https://github.com/DOV-Vlaanderen/pydov/issues/264 but I don't think this should be a blocking issue.
So, yes, the authors have responded and made changes to my satisfaction.
I recommend approving pydov
(🎉)
Awesome, this is great news! No worries about the timing, we're all busy :)
We will definetely pick up the remaining issues in one of our next sprints and keep them in mind when refactoring or working on new features.
Thank you again for your time @lwasser, @xmnlab and @NickleDave! Great to become a part of the growing list of pyOpenSci packages!
Thank you for submitting pydov
for review @Roel @pjhaest @stijnvanhoey !!
I think it's a great tool, excited to tell more people about it.
Perfect for pyOpenSci. And I learned a lot from seeing you all develop it further.
🎆 excellent! ok i think then we can consider pydov accepted as a part of pyopensci!! thank you @Roel @pjhaest @stijnvanhoey and @NickleDave @xmnlab for the efficient review process here!!
there are a few last steps if someone is willing to take the time to do this
If you are not already on our website as a contributor, please submit a PR to pyopensci.github.io to add yourself as a contributor using:
contributor_type:
- reviewer
@stijnvanhoey for you the tasks are:
pyopensci.github.io
to add pydov to the pyOpenSci package list and to add yourself as a contributor using:
contributor_type:
- package-maintainer
and
packages-submitted: ["pydov"]
OPTIONAL!!
If you are interested (not required), you can write a blog post for the pyOpenSci website about PyDov (see blog post about pandera) to promote your package. This is totally up to you!
Is the version that we are accepting here 2.0.0 ? Thank you all!!
This is indeed great news, thanks reviewers for all the comment and all credit to @Roel for being the main driver to get these changes and improvement all tackled. Thanks a lot!
With respect to the badge, see https://github.com/DOV-Vlaanderen/pydov/pull/292
Hi there @stijnvanhoey ! i'm doing some housekeeping today and i noticed pydov is not on our list of packages here. Nor are you listed as a contributor! did a PR get missed? i don't see one but that does not mean i didn't miss something. we can close this once this step is complete! i hope you are well!
Hi @lwasser, thanks for the update! I don't know what we missed. But we are having a next code sprint in some weeks. If it's not picked up before that time, we'll definitely do it at that occasion. Thx all!
Closing this--thank you again all for a great review process!
hi there @stijnvanhoey !! i wanted to check in on pydov here! pyOpenSci has funding to operate for a few years now and i'm focused on the project.
As a first step, i'm reaching out to all maintainers and reviewers who have worked with us in a review to request that they complete a survey:
🔗 Link here this should take you about 10 to 15 minutes and i'd greatly appreciate your time in doing it.
Also are you the only maintainer of this package? If there are other maintainers can you please list them here and kindly ask them to take the survey as well?
We also have a slack community that you are welcome to join if you'd like. i can email you with a link.
Many thanks in advance for doing this! And i hope all is well!
actually i think there are three maintainers possible here? @Roel , @pjhaest I just pinged @stijnvanhoey above but would appreciate your filling our our🔗 SURVEY as well. and you are also welcome to join us on slack. Many thanks again for your time.
Hey there - @Roel, @stijnvanhoey, @pjhaest just checking in again here to make sure you recieved my message above! It's important that packages in our pyOpenSci ecosystem are maintained and that we have some level of communication with our maintainers. I think pydov is still being maintained.
But i'd like to know who the current maintainers are so please do get back to me here on this issue. And we'd GREATLY appreciate your time in filling out this survey as well.
Many thanks for your time in doing these things! 🙌
@lwasser thanks for reaching out. @Roel is indeed the current maintainer and doing all of the updates and new releases. Would be best to have @Roel filling in the survey.
Hi @lwasser
I'm indeed doing most of the maintenance of pydov currently. We had a bugfix release just a few days ago and a new major release is coming up soon!
I hope to find the time next week to fill in the survey.
ok awesome. @Roel i'll list you as the lead maintainer then for this package. Many thanks @stijnvanhoey @Roel for getting back to me!! and thanks in advance for working on the survey @Roel . I appreciate your time!
Hi @lwasser I filled in the survey, I hope the results will be useful for you!
@Roel thank you SO SO MUCH for that!! i appreciate it.
Submitting Author: @roel, @stijnvanhoey, @pjhaest Package Name: pydov One-Line Description of Package: Python package to retrieve data from Databank Ondergrond Vlaanderen (DOV) Repository Link: https://github.com/DOV-Vlaanderen/pydov Version submitted: 1.0.0 Editor: @lwasser
Reviewer 1: @nickledave
Reviewer 2: @xmnlab
Archive: TBD
Version accepted: TBD
Description
pydov is a Python package to query and download data from Databank Ondergrond Vlaanderen (DOV). DOV aggregates data about soil, subsoil and groundwater of Flanders (Belgium) and makes them publicly available. Interactive and human-readable extraction and querying of the data is provided by a web application, whereas the focus of this package is to support machine-based extraction and conversion of the data.
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.
pydov enables the access to the data services (both WFS services as well as the data itself) provided by DOV. It supports users in searching and downloading data on soil, subsoil and groundwater in Flanders in a reproducible way.
The machine-based availability of the data can potentially serve a diverse community of researchers and students. Applications are in the field of geology, hydrogeology and geotechnics.
We have no knowledge about other packages that provide access to DOV data.
@tag
the editor you contacted:Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements): "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here