passaH2O / dorado

For shallow-water Lagrangian particle routing.
https://passah2o.github.io/dorado
MIT License
57 stars 10 forks source link

JOSS Review: Code structure and class hierarchy #13

Closed elbeejay closed 4 years ago

elbeejay commented 4 years ago

Comment from @gassmoeller here:

Your current code works for your purpose and the examples you show, but I am slightly concerned about the future maintainability of the code, because you have mixed responsibilities and an unclear class hierarchy for some of your functions. I will give some examples for your main functions and why that could become a problem later below. I would not consider this a roadblock for the publication in JOSS, but I would like to see as much as reasonable of this fixed at some point and I would like to hear your opinion on the matter:

You define particle locations either based on an initial seed location, or based on existing particle locations either from a previous run, or created by the user. But the initial seed location is expected to be handed over during construction of Particle, while existing locations are handed over to run_iteration. This is a bit confusing for a newcomer, but more importantly: It shows that you are not sure who (which function) is responsible for the particle generation. I would suggest to have one function specific for particle generation, e.g. generate_particles(x_seed, y_seed, start_xindices, start_yindices). In a previous particle system I have written we separated that into a set of different functions (Particles::Generators::regular_reference_locations, Particles::Generators::probabilistic_locations, ..., see here if you are interested in details), but that would be overkill for your application here. Still you want to keep that option instead of spreading the responsibility for generating over multiple functions or making your run_iteration function into a somehow_generate_particles_and_run_iteration function. This would become a problem once you start to add more particle generation options (e.g. probabilistic particle distribution with varying density functions, particle patterns like discs/circles/lines).

A related comment about assigning responsibilities: I do not understand why the run_iteration function is a member of Particle, but the single_iteration function is a member of Tools. single_iteration does the core work of moving the particles, and so I would suspect it to be a member of Particle. It is true that single_iteration uses many of the Tools functions, but that does not mean it is conceptually a tool itself, and I would move it to Particle. On the other hand the functions coord2ind, ind2coord, and unstruc2grid are typical tools functions and I would expect them in the Tools class, and not as stand-alone functions in particle_track.py. More about Tools below.

Currently Tools defines a number of utility functions for the Particle class, and then Particle is derived from Tools. But this is not a typical is-a parent-child relation. Particle is not a Tools. Particle uses Tools, so instead it should import Tools, or be given an object of type Tools upon creation or similar. This is already a problem in your code, because functions in Tools use member variables that are only read in and initialized in Particle (e.g. self.velocity in Tools.calc_travel_times). In other words Tools can not be used without Particle or used as a base class for anything else. Utility functions or objects should usually not own any member variables (and in particular not rely on member variables of derived classes), and instead be handed all necessary parameters to compute the little computation they are supposed to do.

Related to the last comment: Your documentation in Tools currently states that Particle is derived from Tools and uses its functions, but if that would change in the future you would likely only update the Particle class and forget to change the documentation of Tools (not implying you are not careful, this is just what experience in programming shows). Instead this documentation should be in the Particle class, explaining why Particle is derived from Tools, this way if the relationship changes you also update the documentation. Documentation should always state what this thing does, not what others do with it.

elbeejay commented 4 years ago

Many thanks again to @gassmoeller for this super helpful and insightful set of comments and suggestions. This too generated a healthy discussion about how we could best incorporate these ideas to make the code more maintainable in the future.

We wanted to share our plan to help clear up these issues and appreciate any further comments or suggestions you may have.

You define particle locations either based on an initial seed location, or based on existing particle locations either from a previous run, or created by the user. But the initial seed location is expected to be handed over during construction of Particle, while existing locations are handed over to run_iteration. This is a bit confusing for a newcomer, but more importantly: It shows that you are not sure who (which function) is responsible for the particle generation. I would suggest to have one function specific for particle generation ...

We agree with this comment and appreciate the insight. Our plan is to define a generate_particles() function that defines the initial locations for the particles within the all_walk_data dictionary. The proposed workflow would be something like:

# initialize the particles class
particles = Particles(params)

# generate particle start locations and define the number of particles to be routed
particles.generate_particles(x_locations, y_locations, num_particles)

# do the particle routing
particles.run_iteration()

In this way the params class will be clearly defined as the class to handle "global" model parameters, the "generate_particles" functions will define the initial location and number of particles to simulate, and the "run_iteration" method will move the particles through the model domain.

A related comment about assigning responsibilities: I do not understand why the run_iteration function is a member of Particle, but the single_iteration function is a member of Tools. single_iteration does the core work of moving the particles, and so I would suspect it to be a member of Particle.

In our discussion about this comment we realized that our naming scheme for the scripts and functions is not well thought out. We plan to clear this up by revising the name of the file particle_tools.py to LagrangianWalker.py as we intend for that file to hold the functions related to the Lagrangian random-walk process that moves the particles. We also are going to rename single_iteration to particle_stepper to make it more clear that the job of this function is to step the particles in space. Upon reflect we considered it confusing to have "run_iteration" and "single_iteration" as functions in the codebase as they sound very similar but do different things. "run_iteration" will be kept as this is used to move a group of particles in the model domain, whereas the "single_iteration"/"particle_stepper" is part of the random-walk framework and is how individual particles step from one cell to another.

On the other hand the functions coord2ind, ind2coord, and unstruc2grid are typical tools functions and I would expect them in the Tools class

With the renaming of particle_tools.py to LagrangianWalker.py we hope that our decision to leave these functions in particle_track.py makes more sense. Our intention is for the particle_track.py to hold functions associated with the model domain (such as the model parameters and these coordinate transforms), while the LagrangianWalker.py file will hold the core functions associated with the random walk.

Currently Tools defines a number of utility functions for the Particle class, and then Particle is derived from Tools. But this is not a typical is-a parent-child relation.

This comment was particularly insightful and really helped us think about the structure of the code and how it can be best organized to be as clear as possible. With this in mind, we plan to remove the Tools class, and instead re-define the methods there as functions. This removes the inheritance structure that was not really proper inheritance in the standard object oriented programming sense. This change will prompt changes to the documentation that should help address the issues raised about the clarity of the documentation and relationship between "Particle" and "Tools".

So thanks again @gassmoeller, we hope that this plan of action sounds reasonable and will address the concerns you brought up in your review. It was really helpful to us and these changes will definitely enhance the ability of the code to be extended in the future. Please alert us if any of these proposed changes are unreasonable or insufficient.

gassmoeller commented 4 years ago

Yes, all of your suggestions make sense to me. Thanks for addressing my concerns. It sounds like you are already starting the rework, do you want to pause the JOSS publication until the main part of the restructuring is done? That would have the benefit that users will not run into incompatible changes once you release the new version.

With the renaming of particle_tools.py to LagrangianWalker.py we hope that our decision to leave these functions in particle_track.py makes more sense. Our intention is for the particle_track.py to hold functions associated with the model domain (such as the model parameters and these coordinate transforms), while the LagrangianWalker.py file will hold the core functions associated with the random walk.

Yes this makes sense. At some point you probably want to have a specific file exclusively for the Particles class (because that is likely to grow), but it makes sense now that coord2ind is not in lagrangian_walker.py.

elbeejay commented 4 years ago

Yes, all of your suggestions make sense to me. Thanks for addressing my concerns. It sounds like you are already starting the rework, do you want to pause the JOSS publication until the main part of the restructuring is done? That would have the benefit that users will not run into incompatible changes once you release the new version.

Yes, we plan to restructure the code prior to the JOSS publication - am not sure if this requires pausing the submission or not as we believe we can get this done relatively fast (within 1-2 weeks).

gassmoeller commented 4 years ago

Sounds good. I do not think there is an official "pause" mechanism, just let us know once you are happy with the state of the code and I can take another look and then comment on the review PR.

elbeejay commented 4 years ago

That sounds great, thank you!

elbeejay commented 4 years ago

Hi @gassmoeller, we have incorporated these suggestions and hopefully made all of the associated changes in the documentation, doc-strings and so forth. PR #15, covers the bulk of the changes associated with the review comments. Thanks again for making the suggestions you made, they really helped us improve this project, and for me these comments have helped me think about code structure and object hierarchies in a new light.

I think we are ready for this to be "unpaused".