KatDKeith closed this issue 1 year ago
Dear @KatDKeith,
I'll let @stevenweaver comment as well, but this issue looks like an MPI problem (unreachable nodes), i.e. not something that HyPhy is doing. Can you run ANY MPI program successfully?
Best, Sergei
Dear @KatDKeith and @spond,
I'm able to reproduce the issue. I'm looking into it now.
Best, Steven
Dear @spond,
The issue stems from different behavior when encountering different types of runtime errors. Datamonkey expects all errors to be printed to `stderr`. When using `mpirun`, the actual issue is never printed to `stderr`. I can retrieve the error message from `stdout` or `errors.log`, but this will require a small update on the backend.
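A minimal sketch of the fallback described above: look for the error in `stderr` first, then in `stdout`, then in `errors.log`. This is not the Datamonkey backend code; the function names are hypothetical, and it assumes the real HyPhy error appears on a line that begins with `Error:`, as in the message quoted later in this thread.

```python
from pathlib import Path
from typing import Optional


def first_error_line(text: str) -> Optional[str]:
    """Return the first line that starts with 'Error:' (assumed format), or None."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Error:"):
            return stripped
    return None


def find_error(stderr_text: str, stdout_text: str,
               log_path: Optional[Path] = None) -> Optional[str]:
    """Search stderr, then stdout, then an optional errors.log file, in that order."""
    sources = [stderr_text, stdout_text]
    if log_path is not None and log_path.is_file():
        sources.append(log_path.read_text(errors="replace"))
    for text in sources:
        found = first_error_line(text)
        if found is not None:
            return found
    return None
```

Under `mpirun`, `stderr` may contain only the `MPI_ABORT` notice, so the search falls through to `stdout` and then to the log file.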
Dear @KatDKeith,
The issue with your dataset appears to be that the sequence alignment contains stop codons. I see a series of attempts that failed with:
Error:The input alignment must have the number of sites that is divisible by 3 and must not contain stop codons in call to assert(fel.codon_filter.sites*3==fel.codon_data.sites, error_msg);
Best, Steven
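For anyone who wants to check an alignment before submitting, here is a small sketch of the two conditions named in the error message: length divisible by 3 and no stop codons. It is not HyPhy's own check. It assumes the universal genetic code (stop codons TAA, TAG, TGA) and a reading frame that starts at the first site; the function name is hypothetical.

```python
from typing import List

# Stop codons under the universal genetic code (an assumption; other codes differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_codon_sequence(seq: str) -> List[str]:
    """Return a list of problems found in one aligned nucleotide sequence."""
    problems = []
    s = seq.upper().replace("U", "T")  # treat RNA input as DNA
    if len(s) % 3 != 0:
        problems.append(f"length {len(s)} is not divisible by 3")
    # Walk complete codons only, in frame from the first site.
    for i in range(0, len(s) - len(s) % 3, 3):
        codon = s[i:i + 3]
        if codon in STOP_CODONS:
            problems.append(f"stop codon {codon} at codon {i // 3 + 1}")
    return problems
```

Running this on each sequence in the alignment and looking for non-empty results should point to the offending sequences. Note that a terminal stop codon is flagged as well, so it would need to be trimmed.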
Thank you @stevenweaver @spond
Hi,
I've been getting the same error message from Contrast-FEL since yesterday (copied below). Anyone know what to do in this situation?
Thanks!
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
HYPHYMPI terminated. Error: HyPhy killed by signal 15
HYPHYMPI terminated. Error: HyPhy killed by signal 15
HYPHYMPI terminated. Error: HyPhy killed by signal 15
HYPHYMPI terminated. Error: HyPhy killed by signal 15
HYPHYMPI terminated. Error: HyPhy killed by signal 15
HYPHYMPI terminated. Error: HyPhy killed by signal 15
HYPHYMPI terminated. Error: HyPhy killed by signal 15
HYPHYMPI terminated. Error: HyPhy killed by signal 15
HYPHYMPI terminated. Error: HyPhy killed by signal 15
HYPHYMPI terminated. Error: HyPhy killed by signal 15
HYPHYMPI terminated. Error: HyPhy killed by signal 15
HYPHYMPI terminated. Error: HyPhy killed by signal 15
HYPHYMPI terminated. Error: HyPhy killed by signal 15
HYPHYMPI terminated. Error: HyPhy killed by signal 15
HYPHYMPI terminated. Error: HyPhy killed by signal 15
[n8:44823] 15 more processes have sent help message help-mpi-api.txt / mpi-abort
[n8:44823] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages