[REVIEW]: FieldCompare: A Python package for regression testing simulation results

editorialbot commented 1 year ago

Submitting author: !--author-handle-->@dglaeser@danielskatz<!--end-editor-- Reviewers: @idoby, @WilliamJamieson Archive: 10.5281/zenodo.7588449

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/aed2111437d48a44a0d6e13bc6da43b2"><img src="https://joss.theoj.org/papers/aed2111437d48a44a0d6e13bc6da43b2/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/aed2111437d48a44a0d6e13bc6da43b2/status.svg)](https://joss.theoj.org/papers/aed2111437d48a44a0d6e13bc6da43b2)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@idoby & @WilliamJamieson, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @danielskatz know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @idoby

📝 Checklist for @WilliamJamieson

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 1 year ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.14 s (450.6 files/s, 62472.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          46           1154            213           4151
SVG                              2              2              2           1504
XML                              9             48              9            882
Markdown                         2             65              0            224
TeX                              1             19              0            167
YAML                             1             10              1             85
TOML                             1              8              0             52
-------------------------------------------------------------------------------
SUM:                            62           1306            225           7065
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 1 year ago

Wordcount for paper.md is 1592

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1016/j.advwatres.2011.03.007 is OK
- 10.5281/zenodo.1183562 is OK
- 10.11588/ans.2017.1.27447 is OK
- 10.1007/s00607-008-0004-9 is OK
- 10.1515/jnma-2021-0081 is OK
- 10.1016/j.camwa.2011.04.012 is OK
- 10.25495/7GXK-RD71 is OK
- 10.5281/zenodo.5603255 is OK
- 10.1016/j.cpc.2021.108249 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

danielskatz commented 1 year ago

@idoby and @WilliamJamieson - Thanks for agreeing to review this submission. This is the review thread for the paper. All of our communications will happen here from now on.

As you can see above, you each should use the command @editorialbot generate my checklist to create your review checklist. @editorialbot commands need to be the first thing in a new comment.

As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#4905 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if either of you require some more time. We can also use editorialbot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@danielskatz) if you have any questions/concerns.

Reminder from pre-review issue: @WilliamJamieson won't be available to work on this until the week of 14 Nov

idoby commented 1 year ago

Review checklist for @idoby

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://gitlab.com/dglaeser/fieldcompare?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@dglaeser) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

WilliamJamieson commented 1 year ago

Review checklist for @WilliamJamieson

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://gitlab.com/dglaeser/fieldcompare?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@dglaeser) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

danielskatz commented 1 year ago

@WilliamJamieson - Can you comment on the items in your review that are not checked off? What is needed for you to check them to off? As stated above

reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#4905 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread.

danielskatz commented 1 year ago

👋 @idoby - how is your review coming along? Is anything blocking you from continuing?

WilliamJamieson commented 1 year ago

@WilliamJamieson - Can you comment on the items in your review that are not checked off? What is needed for you to check them to off? As stated above

reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#4905 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread.

Sorry I did make a couple of issues:

https://gitlab.com/dglaeser/fieldcompare/-/issues/33 (a suggestion that is not really holding things up)
https://gitlab.com/dglaeser/fieldcompare/-/issues/34 (a suggestion to improve the documentation in general)
https://gitlab.com/dglaeser/fieldcompare/-/issues/35 (a suggestion to improve the API documentation)
https://gitlab.com/dglaeser/fieldcompare/-/issues/36 (I would like this answered before I sign off on this)

danielskatz commented 1 year ago

Thanks - I'll add the tag in those.

danielskatz commented 1 year ago

I guess this isn't going to work well, since these are in gitlab, not github - oh well...

danielskatz commented 1 year ago

@dglaeser - Can you take a look at the issues @WilliamJamieson created?

danielskatz commented 1 year ago

👋 @idoby - how is your review coming along? Is anything blocking you from continuing?

idoby commented 1 year ago

Hi, sorry, life got in the way of this.

I'm posting my concerns here because I think they're too broad for GH/GL issues. In short, I have concerns about the design of this package. While I'm not sure this falls strictly under one or more checklist items, I feel the design severely impairs the usability, maintainability and future extensibility of this package.

My main concerns are:

Much of the code consists of either wrapping existing libraries (e.g. meshio, numpy) or reinventing wheels, e.g. logging. Python already has a perfectly good logging system that serves as the de-facto standard for logging in the language. This will not play nice with projects that customize their logging or use other logging frameworks (e.g. loguru); as a general rule, libraries should not use logging internally at all, or at the very least, accept a standard logger passed by the user.
Despite the fact that the authors declare many interfaces for extensibility, I suspect the package is difficult to extend or adapt. For example, since the authors wrap meshio rigidly as part of their API, they had to submit a PR to meshio to be able to load their own VirtualFluids format (I couldn't find that PR, by the way). I don't feel this is a reasonable design choice. Some projects invent their own formats for valid reasons, and being unable to load in their data unless they patch meshio isn't reasonable in my eyes. Furthermore, some projects (like my own) store data in multiple files and recombine the data upon loading using multiple meshio calls. Or, more generally,
The authors correctly identify that there is nothing coupling mesh I/O to the comparison logic itself, which is the reason this package exists, but their API couples the two very tightly, because sometimes the fields coming from a mesh file must be sorted and transformed before executing comparisons. On the other hand, they acknowledge that it can be useful to run comparisons on arbitrary tabular data with no mesh origin. Overall, this means one cannot mix fields from meshes with fields from tabular data, but this can be useful. For example, one might want to compare values associated with dofs on a mesh while also comparing a global series of values computed by evaluations of functionals, such as lift and drag for aerodynamics.
By wrapping Python's CSV loader, the user is unable to load a CSV with pandas, for example, and use existing pandas functionality to clean/transform the data.

Therefore, I think the authors should consider the following points:

Redesign the library into a pipeline-like API, where I/O is accepted either as a meshio data structure or a pandas dataframe, provide functions to extract/transform fields from a meshio object, then feed into the comparison function. In summary, I'd design two main pipelines:
- Accept a meshio mesh object from user -> sort, transform, etc -> extract into pandas dataframe -> return dataframe to user -> let the user feed DF into comparator or apply further transformations before doing that, etc
- Accept an arbitrary dataframe or multiple dataframes from user -> compare -> report Much of the above functionality doesn't need to be provided by this package, since it already exists, and the user can do anything they want between stages, such as calls to other libraries. In other words, keep the user in control of the code in their own program.
In addition to the above, provide the current all-in-one API as a facade over the flexible pipeline API for users who don't want to drive manually,
Avoid wrapping other libraries, such as numpy, meshio and others,
Avoid reinventing the wheel; utility functions for numpy etc, are OK, but they should be namespaced separately.

I realize the above represents a major rewrite, but I sincerely believe this will improve your package and raise the chances that people (like me) will use it.

danielskatz commented 1 year ago

👋 @WilliamJamieson and @dglaeser - please let us know your thoughts on these ☝️ comments

dglaeser commented 1 year ago

Hi @WilliamJamieson, @idoby and @danielskatz, thanks a lot for your thorough review and the helpful comments! I think we'll need some time to adress them, but I'll add a few initial thoughts/responses on the comments by @idoby here (I responded to @WilliamJamieson in some of the open issues already, thanks at this point also for your suggestions!).

First of all, thanks @idoby for the comments, they are highly appreciated! I think that they can help to converge to a solution that is best suited for users that want to interact with the library in different ways (since our own so far has been mainly via the CLI). However, I think it would be good if we can maybe converge on what the main contributions of this package may be. So far, I had the feeling that the main benefits it provides are for rather "high-level" use, but I'd be more than happy to improve usability for different use cases!

"as a general rule, libraries should not use logging internally at all, or at the very least, accept a standard logger passed by the user."

Good point, thanks! I am tempted to remove the logging facilities from the public API, the main purpose was for us to have a configurable way to control the output from the CLI onto the terminal or to the test reports. If we keep this exposed to the user, it would definitely be better to accept standard loggers. Thanks, I'll address this.

"Avoid wrapping other libraries, such as numpy, meshio and others"

Wrapping of numpy was mainly done for internal use such that in the rest of the code we don't do any direct calls to numpy, but keep track of which things we actually need/use while having the opportunity to later move to a different package if required (although very unlikely) with only changing a single file.. Only the factory functions are exposed to the public API to create an instance of an Array, however, users that work with numpy arrays don't have to use them. But I agree, with numpy being basically standard, we can remove the factories from the public API and require users to pass numpy arrays directly. It is unlikely that we will change to anything else than numpy.

I am not sure meshio is really "wrapped". We currently expose a simple reader function that takes a file and returns a container of fields. Meshio is used under the hood, but users should not notice or know about this? We simply use it under the hood because it provides reading functionality for many file formats of interest for us. But users never interact with meshio types, we always return our own container types. I think what's missing for your case is direct support for a meshio Mesh, currently we would require that you create a FieldContainer from that Mesh yourself.

"since the authors wrap meshio rigidly as part of their API, they had to submit a PR..." -> I am not sure I agree here. We could have simply written our own reader for the respective file format, since meshio is not exposed anywhere to the user, it's an implementation detail that we can change as we like without affecting the API as far as I see it. VirtualFluids does not use its own file format. It uses standard VTK, but the cell type (voxel) was not supported by meshio (https://github.com/nschloe/meshio/pull/1342). As said, we could have implemented our own reader for this case, but we thought it is better to try to incorporate it in meshio such that the rest of their users benefit from this change. Unfortunately, it seems that PRs are not really addressed there anymore, which is why VirtualFluids still uses their patched version. I agree that it would be nicer if that was not the case, and we are actually starting to consider handling VTK support ourselves (also because VTP is missing, another VTK format we use and for which another pending pull request, since quite some time, exists: https://github.com/nschloe/meshio/pull/1181).
Regarding your point 3: In our view, the mesh sorting part is one of the main contributions of this package, avoiding "false negative" results for meshes with changing order with otherwise the same field data. I agree that it is not nice to have to differentiate mesh fields and non-mesh fields, where sorting is not enabled. We thought about generalizing this, enabling sorting (e.g. by one of the fields) also for any other data, but there are quite some peculiarities for meshes: some data is associated with points, some with cells, sorting either of them affects only a subset of the fields; for non-conforming mesh output a simple sorting by coordinates is not unique and has to be treated in a special way; ... We couldn't yet come up with an abstraction that unifies all of this, but I'll give it some more thought. For our main use-case, the CLI, we simply exposed this via the respective flags. But I agree that programmatically this distinction is not so nice. On the other hand, as a user of the API you only have to "know" about this in case you want to deactivate sorting. If you deactivate sorting (or have a "sorted" mesh already), you should also be able to compare a mesh file and a tabular data file, I don't see why this shouldn't work.
Regarding your point 4: Again, I think this only applies to the CLI. Since reading is a separate concern than the comparison, if you use fieldcompare programmatically, you can read-in and preprocess the data as you like. But yes, you first have to create Fields, you cannot directly pass in pandas data frames, we currently don't have pandas as dependency at all. But, I am not sure that our package yields much benefit in this use case - the use case in which one is reading in and sorting with pandas - because I believe fuzzy comparison of dataframes can be done directly in pandas? I think so far we had considered only rather high-level programmatic use, providing users with a simple function to read in fields from files, iterating over them and expose field comparisons.

As said, I think your comments really help to improve the programmatic usage scenarios, and that it maybe would be good if we can converge on what the main contributions of this package can be (e.g. for your use case)? Or maybe asked differently, do you think there is a valuable contribution to your use case at all?

As far as I see it, we currently have

a. High-level I/O mechanisms that yield our own container type interface for fields. b. Comparison classes that accept two such containers and compare all fields in them while allowing for configuration of different aspects of the behaviour. c. Predicates for comparison of two Arrays (i.e. ndarrays).

Point (a) & (b) make up the CLI basically, which so far has been our own main use case. When skipping (b) and doing array comparisons (c) manually, one can also directly use numpy. So I think the main contributions so far are on the higher levels (and the CLI).

I noticed that our mesh-field-transformations are currently not exposed in the public API. They wrappers around MeshFields that expose a MeshFieldInterace themselves, while exposing the underlying fields in a transformed way. We currently already chain them (sorted points, sorted cells, etc...), and they do the transformations lazily upon request. So I guess if we make sure the transformations are user-friendly and expose them in the public API we would get closer to a potential pipeline design as suggested by you. I like the idea and will try to make it work. The way I see it, this would break up the point (a) I mentioned above in a more obvious way.

@danielskatz, is there a time limit for addressing the points? I feel that this will involve quite some rewriting. But improving the API as suggested by @idoby and adding a proper documentation as suggested by @WilliamJamieson will definitely improve this package a lot.

idoby commented 1 year ago

Thanks for replying to my comments. My thoughts:

Good point, thanks! I am tempted to remove the logging facilities from the public API, the main purpose was for us to have a configurable way to control the output from the CLI onto the terminal or to the test reports. If we keep this exposed to the user, it would definitely be better to accept standard loggers. Thanks, I'll address this.

In my opinion, the library should should either not use logging at all, or it should let the user pass in a standard logger. I'd prefer the library not to log anything unless a debug mode is enabled explicitly. I don't really see why a library like this should throw lines into my output. I'd prefer a nice function to nicely format the data structures returned by the lib, instead, that I can call whenever I want, send to my server, etc.

Wrapping of numpy was mainly done for internal use such that in the rest of the code we don't do any direct calls to numpy, but keep track of which things we actually need/use while having the opportunity to later move to a different package if required (although very unlikely) with only changing a single file.. Only the factory functions are exposed to the public API to create an instance of an Array, however, users that work with numpy arrays don't have to use them. But I agree, with numpy being basically standard, we can remove the factories from the public API and require users to pass numpy arrays directly. It is unlikely that we will change to anything else than numpy.

Wrapping numpy is not a big issue, but I'd prefer public helper functions to be under fieldcompare.utils.numpy or something, to make their purpose clear. I don't think there's a need to hedge against numpy getting replaced, but I don't mind if you do. It's not a big issue. My main point was about meshio.

I am not sure meshio is really "wrapped". We currently expose a simple reader function that takes a file and returns a container of fields. Meshio is used under the hood, but users should not notice or know about this? We simply use it under the hood because it provides reading functionality for many file formats of interest for us. But users never interact with meshio types, we always return our own container types. I think what's missing for your case is direct support for a meshio Mesh, currently we would require that you create a FieldContainer from that Mesh yourself.

I understand. My issue with this was that you were assuming my data pipeline begins by reading a single file my meshio, so might as well let fieldcompare drive meshio. But, this assumption makes many use cases I want to use, impossible. In any case, the difference between (pseudocode)

mesh = meshio.load(...)
fieldcompare.sort(mesh).compare()

and

fieldcompare.load(filepath).sort().compare()

is a minimal burden to your user, but in return, this enables fieldcompare to integrate into existing data pipelines. My use case is that I'm loading meshes from several files and then combining them together to form a final mesh with all of my fields present. Sometimes I load only some of the fields, etc, depending on the need. This is partially because I have a massive amount of data and many timesteps, and partially because saving the data this way during the simulation is faster.

"since the authors wrap meshio rigidly as part of their API, they had to submit a PR..." -> I am not sure I agree here. We could have simply written our own reader for the respective file format, since meshio is not exposed anywhere to the user, it's an implementation detail that we can change as we like without affecting the API as far as I see it. VirtualFluids does not use its own file format. It uses standard VTK, but the cell type (voxel) was not supported by meshio (vtkVoxel support nschloe/meshio#1342). As said, we could have implemented our own reader for this case, but we thought it is better to try to incorporate it in meshio such that the rest of their users benefit from this change. Unfortunately, it seems that PRs are not really addressed there anymore, which is why VirtualFluids still uses their patched version. I agree that it would be nicer if that was not the case, and we are actually starting to consider handling VTK support ourselves (also because VTP is missing, another VTK format we use and for which another pending pull request, since quite some time, exists: vtp support attempt nschloe/meshio#1181).

I see. In that case I recommend to add a few words about this in the paper, but it's not as big of a deal as I thought. Since you're importing meshio from Python and not your own baked-in version. So that only requires patching meshio in site-packages, which you'd have to do anyway to add a new cell type. So it's not that bad. I still think if you avoid wrapping meshio then that extra code can live outside of a meshio fork, in the user's own project, and also allow the use of other mesh loaders, like trimesh, but it's not as bad as I originally thought.

In summary, I think avoiding driving the I/O (and specifically meshio) makes fieldcompare much more useful as a library than it currently is, although I agree it doesn't matter for CLI usage. However, I don't think the CLI is the primary contribution here, even if considering CI/CD use cases, because again, your user is likely to have their own data pipeline, use additional libraries, and they might not map data to files 1:1.

Regarding your point 3: In our view, the mesh sorting part is one of the main contributions of this package, avoiding "false negative" results for meshes with changing order with otherwise the same field data. I agree that it is not nice to have to differentiate mesh fields and non-mesh fields, where sorting is not enabled. We thought about generalizing this, enabling sorting (e.g. by one of the fields) also for any other data, but there are quite some peculiarities for meshes: some data is associated with points, some with cells, sorting either of them affects only a subset of the fields; for non-conforming mesh output a simple sorting by coordinates is not unique and has to be treated in a special way; ... We couldn't yet come up with an abstraction that unifies all of this, but I'll give it some more thought. For our main use-case, the CLI, we simply exposed this via the respective flags. But I agree that programmatically this distinction is not so nice. On the other hand, as a user of the API you only have to "know" about this in case you want to deactivate sorting. If you deactivate sorting (or have a "sorted" mesh already), you should also be able to compare a mesh file and a tabular data file, I don't see why this shouldn't work.

I agree that mesh sorting is a great contribution, and one of the two main contributions in your package. I even think I might want to use it without the comparison logic, e.g., to sort the mesh for use in loss functions. So I think fieldcompare could benefit from more flexibility across the board. I I think the unifying abstraction is the pipeline-like design. Therefore, I'd like to see the ability to use FC as needed in three increasing levels of abstraction:

pipeline building blocks,
easy sort-and-compare facade and
CLI.

In the pipeline-like API, the user "deactivates" sorting in the pipeline API by not calling it :) . I also like the self-describing comparison operators, by the way. Pretty cool.

Regarding your point 4: Again, I think this only applies to the CLI. Since reading is a separate concern than the comparison, if you use fieldcompare programmatically, you can read-in and preprocess the data as you like. But yes, you first have to create Fields, you cannot directly pass in pandas data frames, we currently don't have pandas as dependency at all. But, I am not sure that our package yields much benefit in this use case - the use case in which one is reading in and sorting with pandas - because I believe fuzzy comparison of dataframes can be done directly in pandas? I think so far we had considered only rather high-level programmatic use, providing users with a simple function to read in fields from files, iterating over them and expose field comparisons.

Good point. This can probably be done in pandas alone and FC isn't needed in that case, though it could be nice if it was possible to extract a Fields from a pandas DF to use with FC, if my code already uses the FC logic for the meshes. It would probably be a good idea to hide a pandas dependency behind a fieldcompare[all] install flag in pip or something. You're right that FC shouldn't pull pandas just for that one function. Anyway, not a big deal and should not be a part of this review.

As said, I think your comments really help to improve the programmatic usage scenarios, and that it maybe would be good if we can converge on what the main contributions of this package can be (e.g. for your use case)? Or maybe asked differently, do you think there is a valuable contribution to your use case at all?

For me, the main contributions are 1) mesh field handling/sorting/transformation and 2) comparison and reporting logic. Both will probably be useful for my use case, but I don't think one interface should be favored over the others. Please consider making all three interfaces I described above equally usable.

I think this also addresses the rest of your comments? I just don't see the CLI as being more or less important than the programmatic interface, and believe the contribution here is greater than just a neat CLI tool.

Either way, I will not block the review on these grounds, if you consider the CLI to be the main use case and the library itself not worth investing in, I respect that decision.

@danielskatz Please consider adding something about design and extensibility to the checklist. The JOSS docs do mention these words, but it would be nice to have them explicitly in the checklist IMO, for anyone who wants to conduct reviews with these considerations in mind.

danielskatz commented 1 year ago

@idoby

@danielskatz Please consider adding something about design and extensibility to the checklist. The JOSS docs do mention these words, but it would be nice to have them explicitly in the checklist IMO, for anyone who wants to conduct reviews with these considerations in mind.

If you want to suggest changes for JOSS, doing this by an issue in https://github.com/openjournals/joss/issues is probably the best way. If you suggest specific language you would like, that will give us a starting point for discussion.

dglaeser commented 1 year ago

@idoby, thanks a lot again for your detailed message, it helped me to understand some of your points much better. I am happy you bring up usage scenarios that we had not in mind, and I really appreciate the valuable comments/ideas on how to make them possible.

Either way, I will not block the review on these grounds, if you consider the CLI to be the main use case and the library itself not worth investing in, I respect that decision.

As said, our focus was - above all - the CLI, but I would be happy to invest time and effort to improve the API by addressing the points you raised. @danielskatz, do we have a limit on when we have to have this finished?

danielskatz commented 1 year ago

there's not a strict limit - How long do you think this will take?

dglaeser commented 1 year ago

It is going to be very difficult to address this before Christmas, I think. I'll try, but as usual many things pile up towards the end of the year... January is usually a bit more relaxed, so I am pretty confident that I can propose a version with the points addressed until the end of January (maybe earlier). Would that be too late?

danielskatz commented 1 year ago

I think that's ok - let's try it and see

dglaeser commented 1 year ago

ok thanks a lot!

dglaeser commented 1 year ago

@danielskatz, @idoby, @WilliamJamieson, we have been working a lot on the package since the review, and from our side we are ready for a resubmission. While originally our focus was on providing a useful command-line application that can be used for regression-testing, e.g. in CI/CD pipelines, we have rewritten most of the package in order to also expose a more flexible API. Our own use-cases had been limited to the CLI, but during the revision we collaborated with Timo Koch to reuse the API of fieldcompare in the regression-test mechanism of Dumux to make its test-suite more flexible. On the basis of an actual use-case we were able to make many further improvements to the API. Since Timo also contributed with many commits to fieldcompare, I would like to include him in the list of authors for this paper. I have changed this in the paper already, I hope that this is not a problem.

I want to rephrase what I think was one of @idoby's most striking comments: originally, (via the API) one was able to sort meshes upon I/O (basically reading in a sorted mesh), while one did not have the possibility to sort a mesh obtained in a different way (e.g. from meshio). Technically, one did have that possibility already in the old version, however, it required to manually convert the meshio-mesh into the fieldcompares mesh representation, and then use the sorting algorithms (which were not exposed very prominently either). But in the new API, these things are separated more clearly, exposed more prominently in the API, and better documented. I will try to summarize what I think are the main changes:

mesh sorting options have been removed from I/O functions, but are now exposed separately and more prominently
one can convert between fieldcompare's mesh fields and a meshio-mesh importing the meshio_utils module from fieldcompare.mesh and then use meshio_utils.from_meshio() and meshio_utils.to_meshio()
The notion of "logging" has been removed from all public interfaces (originally the comparator class took a logger in its constructor, thus, was part of the public API). The main purpose was to provide control over the output that the comparator class emitted (which we made use of in the CLI), but this is now solved differently: the comparator class now takes a callback function that is invoked with the result of an individual comparison s.t. users can decide what to do with the result. In case they want to print informative output, we now also exposed functions to obtain our default reports as strings via the API.
Along the lines of the previous point, not only reporting but also much of the CLI logic has been moved to the API to be reused programmatically. For instance, a special comparator class for mesh fields was added that contains the sorting logic in case the meshes don't match (taking options that control the sorting behaviour again). The CLI now simply reuses that class. This saves a lot of boilerplate code for users that want to do regression tests on meshes programmatically.

Besides this, there are a number of new features, some of which are:

I/O from a variety of VTK files is now supported in-house (meshio is no longer required but an optional dependency). One reason for this was that meshio did not support one of the VTK cell types we were using (see discussion on the first submission), and another one was that several VTK flavours that we are using in our test pipelines are currently not supported by meshio (.vtp, .pvtu, .pvtp). fieldcompare now supports .vtu/.vtp files in all VTK-conforming ways, that is, different data formats, encoding and compression. Besides this, the corresponding parallel version of such VTK files can be read (i.e. pvtu/pvtp), yielding a single MeshFields object that combines all pieces. Sequences of fields are now treated separately in the API (FieldDataSequence protocol), and the Paraview Data Format (.pvd) is now supported to read-in time series.
For sequences, special options have also been added to the CLI
The CLI now also allows mapping a specific reader to certain files, which is more flexible than deducing it from the file extension (which is still done as a default)
Additionally, the CLI syntax for reader mapping allows passing options to the readers. For dsv/csv files one can select the delimiter, for instance, but the CSV reader now also scans the given file to deduce the delimiter and whether or not a header is present (if not passed as options), in order to have a more robust default behaviour.
black is used as linter/formatter, making it easier for developers to contribute

Finally, we have added improvements related to the selection of tolerances for fuzzy-comparisons. These are described in the paper (and also the README), where we have added a dedicated section on fuzzy-equality-checking, explaining our choices for default tolerances and what should be kept in mind when selecting custom ones.

Following @WilliamJamieson suggestions, we have published the package on pypi (we had intended this already but originally wanted to wait for the review) and now deploy the api documentation via GitLab pages.

Thanks once again for your fruitful comments, we are curious to hear your thoughts.

P.S: @WilliamJamieson I added further comments in the GitLab issues you opened

https://gitlab.com/dglaeser/fieldcompare/-/issues/34 https://gitlab.com/dglaeser/fieldcompare/-/issues/35 https://gitlab.com/dglaeser/fieldcompare/-/issues/36 https://gitlab.com/dglaeser/fieldcompare/-/issues/33 (this one is closed already)

Would you mind checking and closing them if you agree? (or reopen the closed one in case we were too optimistic there)

idoby commented 1 year ago

@dglaeser Wow, great work! I can't wait to dive in and see the changes.

From your post, I think I can conclude that FC has gained convenience and functionality and can be easily integrated into projects in multiple ways, which sounds great!

My concerns have been addressed and as far as I'm concerned, this project is ready to be published. Feel free to accept/publish. I will probably not find the time to read the new version of the code until the end of the month.

p.s Impressive work w.r.t mesh formats. Too bad meshio seems to be inactive...

idoby commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

danielskatz commented 1 year ago

@dglaeser - thanks for your work. (And let's call this a restart, not a resubmission.)

danielskatz commented 1 year ago

@idoby - I see you recompiled the paper after your comments above - can you please confirm that it looks good to you and that you think it and the software can be published?

idoby commented 1 year ago

@idoby - I see you recompiled the paper after your comments above - can you please confirm that it looks good to you and that you think it and the software can be published?

Yes, LGTM

danielskatz commented 1 year ago

@WilliamJamieson - I look forward to your comments on the updated submission, when you have a chance. (In particular, hopefully satisfying the three unckecked review criteria in your review, or providing feedback on what more is needed from the authors on these.)

WilliamJamieson commented 1 year ago

It looks like all of the issues I raised on GitLab have been addressed with regard to the package documentation itself. I like the use and description of using fieldcompare in CI/CD pipelines, particularly with the GitHub action.

I think it would be really nice (but unnecessary for the purposes of my review) to have a pytest plugin which preformed much of the same functions as the GitHub action, but which allowed one to run these regression tests as part of a regression suite running via pytest. This would allow users to integrate fieldcompare into existing regression test suites implemented in pytest (this is how most of the projects I work on implement regression tests).

I think this software is ready to be published.

danielskatz commented 1 year ago

@WilliamJamieson - can you then check off the final three items in your checklist?

dglaeser commented 1 year ago

@WilliamJamieson, thanks a lot for your suggestion, sounds like a very good idea! I opened a respective issue (https://gitlab.com/dglaeser/fieldcompare/-/issues/58).

WilliamJamieson commented 1 year ago

@WilliamJamieson - can you then check off the final three items in your checklist?

Correct

danielskatz commented 1 year ago

Sorry, I really meant: Will you please check off the final three items in your checklist? :)

WilliamJamieson commented 1 year ago

Sorry, I really meant: Will you please check off the final three items in your checklist? :)

Done, It scrolled so far up in this thread that I forgot I had one.

danielskatz commented 1 year ago

@dglaeser - at this point could you:

[ ] Make a tagged release of your software, and list the version tag of the archived version here.
[ ] Archive the reviewed software in Zenodo or a similar service (e.g., figshare, an institutional repository)
[ ] Check the archival deposit (e.g., in Zenodo) has the correct metadata. This includes the title (should match the paper title) and author list (make sure the list is correct and people who only made a small fix are not on it). You may also add the authors' ORCID.
[ ] Please list the DOI of the archived version here.

I will then generate a final proof, and proofread it, then we can move forward towards acceptance of the submission.

dglaeser commented 1 year ago

Hi @danielskatz, sorry for the delay. We wanted to make sure that the repo is REUSE-compliant and made the required changes for that. Also, we noticed that the GitLab-flavoured Markdown of the readme file was not properly rendered in the API docs. We have fixed this as well...

To respond to your comment:

Tag: 0.1.3 -> https://gitlab.com/dglaeser/fieldcompare/-/releases/0.1.3
Zenodo archive: https://zenodo.org/record/7588449, DOI: https://doi.org/10.5281/zenodo.7588449

idoby commented 1 year ago

@dglaeser What does REUSE-compliant mean?

dglaeser commented 1 year ago

@idoby, see https://reuse.software/. They also provide a linter that checks that the entire repository follows their guidelines on licensing source files. We have integrated this into our CI pipeline.

idoby commented 1 year ago

@dglaeser I see. Thanks.

Congrats on acceptance and archive!

danielskatz commented 1 year ago

Thanks @dglaeser - that's new to me as well, but seems reasonable. I'll create a proof now, then read it, and get back to you

danielskatz commented 1 year ago

@editorialbot set 0.1.3 as version

editorialbot commented 1 year ago

Done! version is now 0.1.3

danielskatz commented 1 year ago

@editorialbot set 10.5281/zenodo.7588449 as archive

editorialbot commented 1 year ago

Done! Archive is now 10.5281/zenodo.7588449

danielskatz commented 1 year ago

@editorialbot recommend-accept

openjournals / joss-reviews