Closed whedon closed 3 years ago
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @robertsawko, @rhaas80 it looks like you're currently assigned to review this paper :tada:.
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1017/jfm.2020.485 is OK
MISSING DOIs
- 10.1103/physrevfluids.4.121501 may be a valid DOI for title: Convection driven by internal heat sources and sinks: Heat transport beyond the mixing-length or “ultimate” scaling regime
INVALID DOIs
- None
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
:wave: @robertsawko, please update us on how your review is going.
:wave: @rhaas80, please update us on how your review is going.
@robertsawko, @rhaas80, could you post a review update?
comments working through installation:
git clone https://github.com/BenMql/coral.git .
I would either leave out the dot .
or replace by $PWD
which is easier to seeexport CORAL_ROOT=$(PWD)
is incorrect and must be $PWD
or ${PWD}
or $(pwd)
since there is no PWD
executable only, a PWD
variable and a pwd
executable.Decomp2d_installation
file does not mention$(MKL_PATH)/lib/ia32
though by now intel64
ought to be the default choiceWord_of_caution_fftw
it is stated "In addition, Coral is linked against a patched version of 2decomp. The patch requires that 2decomp uses the fftw3_f03 interface." but nowhere is this shown in the "regular" installation instructionsclean
target assumes that CORAL_ROOT
ends in a slash, which is not the case for export CORAL_ROOT=$PWD
FAILURE: gfortran-10 fails with:
/src/sparse_tools/sparse_manipulations.f90:124:19:
124 | ptr2B, A%ncol, dbeta, ptr2C, A%nrow)
| 1
......
189 | ptr2B, A%ncol, dbeta, ptr2C, A%nrow)
| 2
Error: Type mismatch between actual argument at (1) and actual argument at (2) (COMPLEX(8)/REAL(8)).
-fallow-argument-mismatch
in MPIFLAGS
(proper code being preferred)make -j
fails since dependencies for module files are not recorded in makefile
Thank you for your comments.
.
, $PWD
and missing slashes issues have been fixed. 2decomp
doc now states explicitly that a vanilla fftw is required.MKL
as some of their (non standard) routines are used.sparse_manipulation.f90
. This is interesting, I never detected them despite an heavy use of pedantic
flags and standard-compliance checks (not with gfortran-10 though). (As an aside: Do I understand correctly that the compiler was smart enough to compare lines 124
and 189
and realized that the same routine was suspiciously called twice with different argument types? The error message suggests that the error was not detected by comparing the call to an interface.)The argument mismatch is new by gfortran. Apparently such usage was never really allowed by the Fortran standard, but also never really enforced by compilers (though you may want to give eg the Cray compiler a try which in my experience is pickier than gfortran or ifort). You are certainly not the only code tripped up by this, eg https://www.google.com/search?q=gfortran-10+allow-mismatch&ie=utf-8&oe=utf-8 . For gfortran's release announcement about the issue, please see: https://gcc.gnu.org/gcc-10/changes.html#fortran
Thank you for these details. This feature definitely helped catching a bug which fortunately had no consequences. (Some pointers used to be twice as long as intended, but fortunately their second half was never accessed.) Anyway, this glitch is fixed now. Let me know if you run into more trouble while compiling.
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
In "2_First_run":
opt_decomp
passed to transpose_y_to_z
is incorect. Here's the information:
Thread 1 "coral_LP.exe" received signal SIGSEGV, Segmentation fault.
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:262
262 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) up
#1 0x00007fffecd982fc in ?? () from /usr/lib/x86_64-linux-gnu/libopen-pal.so.40
(gdb) up
#2 0x00007fffecf268aa in ompi_datatype_sndrcv () from /usr/lib/x86_64-linux-gnu/libmpi.so.40
(gdb)
#3 0x00007fffecf732e3 in ompi_coll_base_alltoallv_intra_pairwise ()
from /usr/lib/x86_64-linux-gnu/libmpi.so.40
(gdb)
#4 0x00007fffe65b0fa2 in ompi_coll_tuned_alltoallv_intra_dec_fixed ()
from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so
(gdb)
#5 0x00007fffecf2936b in PMPI_Alltoallv () from /usr/lib/x86_64-linux-gnu/libmpi.so.40
(gdb)
#6 0x00007fffed034d0b in pmpi_alltoallv__ () from /usr/lib/x86_64-linux-gnu/libmpi_mpifh.so.40
(gdb)
#7 0x00005555555de192 in decomp_2d::transpose_y_to_z_real (
src=<error reading variable: value requires 2654208 bytes, which is more than max-value-size>,
dst=<error reading variable: value requires 2654208 bytes, which is more than max-value-size>,
opt_decomp=...) at transpose_y_to_z.f90:87
87 real_type, DECOMP_2D_COMM_ROW, ierror)
(gdb) print opt_decomp
$1 = ( xst = (1, 1, 49), xen = (72, 96, 96), xsz = (72, 96, 48), yst = (1, 1, 49), yen = (72, 96, 96), ysz = (72, 96, 48), zst = (1, 49, 1), zen = (72, 96, 96), zsz = (72, 48, 96), x1dist = (72), y1dist = (96), y2dist = (48, 48), z2dist = (48, 48), x1cnts = (331776), y1cnts = (331776), y2cnts = (165888, 165888), z2cnts = (165888, 165888), x1disp = (0), y1disp = (0), y2disp = (0, 165888), z2disp = (0, -322823200), x1count = 331776, y1count = 331776, y2count = 165888, z2count = 165888, even = .TRUE. )
where opt_decomp%z2disp(2)
being -322823200
seems like the likely culprit to me.
This seems to be due to assuming that allocate
would zero initialize the allocated memory. I can make the SEGFAULT go away by adding
decomp%x1cnts = 0
decomp%y1cnts = 0
decomp%y2cnts = 0
decomp%z2cnts = 0
decomp%x1disp = 0
decomp%y1disp = 0
decomp%y2disp = 0
decomp%z2disp = 0
just before the prepare_buffer
call. It may be good to run the whole code through valgrinds' memcheck tool.
Roland,
I have fixed the wiki 2_first_run
.
I am most grateful for your checks, comments and suggestions.
In 2decomp&fft
, the arrays of integers are now initialized to 0 as you suggested (available as the most recent commit to https://github.com/benmql/2decomp_emptyFourierFourier ). This test case has been run through valgrind, with 2 processes. One error was reported after execution, which looks related to library code (and not Coral itself):
==340744== 1 errors in context 1 of 1:
==340744== Syscall param write(buf) points to uninitialised byte(s)
==340744== at 0x53614BD: ??? (syscall-template.S:84)
==340744== by 0xF951FC8: ???
==340744== by 0x4E7DD6B: ??? (in /usr/lib/openmpi/lib/libopen-rte.so.12.0.2)
==340744== by 0x50F603E: mca_base_framework_close (in /usr/lib/openmpi/lib/libopen-pal.so.13.0.2)
==340744== by 0x7BAB47C: ???
==340744== by 0x4E4E439: orte_finalize (in /usr/lib/openmpi/lib/libopen-rte.so.12.0.2)
==340744== by 0x404F97: ??? (in /usr/bin/orterun)
==340744== by 0x403615: ??? (in /usr/bin/orterun)
==340744== by 0x558E83F: (below main) (libc-start.c:291)
==340744== Address 0xffefff4f4 is on thread 1's stack
==340744== Uninitialised value was created by a stack allocation
==340744== at 0xF951F50: ???
No segfault was triggered.
Thanks @rhaas80 for your thorough comments, and @BenMql for addressing them promptly!
@robertsawko, do you have any update to report at this stage?
Thanks @rhaas80 for your thorough comments, and @BenMql for addressing them promptly! @eloisabentivegna Sorry about the long delay. I am unfortunately not yet done with the review.
@eloisabentivegna Sorry about the long delay. I am unfortunately not yet done with the review.
Thanks for your progress so far, @rhaas80. I can see your checklist is not complete, but your help up to this point is very appreciated.
Apologies for a radio silence. I am going to get back to reviewing this week. I have been reporting on GH issues of the software repository as I think that's the recommendation with JOSS. Made a few comments about the documentation. Unfortunately I run into an issue with the build on Arch Linux, but I think I should be able to resolve it now.
Hello @eloisabentivegna,
I have closed off my remaining issues with the documentation and installations - it all works well now. I went through the checklist and modified accordingly. Later this week I will check the visualisation to confirm the functionality and performance, but all looks good so far.
Thank you all for your time and your valuable suggestions, which have spurred improvements in both the code, the doc, and the paper. If you have more remarks, I am looking forward to hearing them!
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Thanks, @robertsawko! You did the right thing: specific change requests go in the package repository (you can of course create mentions to this issue in that venue, if it makes sense).
@rhaas80, @robertsawko, could you update us on the status of your reviews, and state whether you're ready to make a final recommendation?
Thank you both for your suggestions and corrections. I did not have time to implement them yet, and the sanitary measures in France are such that in all likelihood I will be able to resume my work on this project by the end of the month only. Apologies for the delay and thank you for your patience.
@BenMql, thanks for the update. I will wait for your changes before pinging the reviewers again.
@BenMql, were you able to address the reviewers' requests yet? Could you send an update?
@eloisabentivegna , apologies for the long delay, thank you for your patience. We had a string of Covid19-related issues in the household (nationwide school closures, sickness, local school closure) that have proved rather incapacitating. Those all came to a resolution now. I have started to address the issues raised by the reviewers and will send an update when finished---this should be relatively quick, say a couple days.
Thanks for the update, @BenMql. I am sorry to hear about all the disruption.
Keep us posted!
@BenMql, could you post an update?
Dear all,
The issue that have been thoroughly reported by @rhaas80 and @robertsawko are now fixed.
IMHO, only issue #13 still deserves some additional work: I tried my best to have a pedagogical and gradual approach for teaching the user how to enter PDEs in the solver Coral. I am still working to streamline the process. On the bright side, the user is now provided with 3 full benchmarks taken from some (recent) published references, which provide some starting points for implementing their own equations (with the help of the doc).
On a different note, while testing on different machines since my last update, I triggered a bug of the DFT library with some recent compilers. After a while, it became clear that the culprits were optimization flags, which should be rolled back a little with recent gcc compilers when compiling 2decomp&fft
. This is now properly documented in the wiki. Perhaps more interestingly, this has prompted me to implement automated tests in Coral, that allows the user to test their compilation of the DFT library. @eloisabentivegna Apologies for the additional delay, this buggy behaviour was completely unexpected and was completely under the radar until very recently.
Thanks for the update, @BenMql! I am glad this process has prompted you to further develop Coral (for one reason or another!).
@rhaas80, @robertsawko, do you have any comments?
@rhaas80, @robertsawko, could you respond to @BenMql's latest changes? Do they address your requests?
Hi all - just to let you know, because @eloisabentivegna is taking leave, I'll be stepping in as editor on this submission - please let me know when there are issues.
👋 @rhaas80, @robertsawko - can you update me on your status? Are your reviews currently blocked on anything from @BenMql?
@danielskatz @BenMql sorry for being so late, not blocked by @BenMql just swamped with short term tasks right now.
@danielskatz, @BenMql and @eloisabentivegna. Sorry, I have also been swamped with short term tasks and I will be away next week for 1w. I will make sure to schedule some time to finish off this review on the week of the 19th of July. You will hear from me then. Really sorry this taking so much time.
@whedon remind @robertsawko in 1 week
Reminder set for @robertsawko in 1 week
@whedon remind @danielskatz in 1 week
Reminder set for @danielskatz in 1 week
Hello,
Apologies again for delay. I have now finished reviewing the code. Few minor points to consider below. Overall I think the code is an excellent contribution.
I believe several outstanding issues can be also closed now. For instance this is working for me on the latest commit.
:wave: @robertsawko, please update us on how your review is going (this is an automated reminder).
@BenMql - it looks like you now have some things to look at, as @robertsawko has just three items left to check off in the review, which I think match the comments above
@rhaas80 - when do you think you will be able to make progress on the remaining items in your review?
@danielskatz @BenMql I am hosting a summer school this week, so cannot guarantee much progress before the end of the week, sorry.
👋 Hi all - Can I get an update on the status of this review?
@rhaas80 - it looks like you opened some issues 2 weeks ago. Once these are resolved, will your review be complete? Or are there other things you are also doing?
@robertsawko - What is needed for you to complete your review? Are the items in https://github.com/openjournals/joss-reviews/issues/2978#issuecomment-886235689 from about a month ago still outstanding?
@BenMql - What is your progress on the issues raised by @rhaas80 and those in the comment from @robertsawko?
Thanks all!!
Submitting author: @benmql (BENJAMIN MIQUEL) Repository: https://github.com/BenMql/coral Version: v1.1.12 Editor: @eloisabentivegna Reviewer: @robertsawko, @rhaas80 Archive: 10.5281/zenodo.5458888
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@robertsawko & @rhaas80, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @eloisabentivegna know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Review checklist for @robertsawko
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @rhaas80
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper