veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Contrast-FEL error message #1584

Closed: KatDKeith closed this issue 1 year ago

KatDKeith commented 1 year ago

Hi,

I've been getting the same error message from Contrast-FEL since yesterday (copied below). Anyone know what to do in this situation?

Thanks!

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.

[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[n8:44823] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741

HYPHYMPI terminated. Error: HyPhy killed by signal 15

HYPHYMPI terminated. Error: HyPhy killed by signal 15

HYPHYMPI terminated. Error: HyPhy killed by signal 15

HYPHYMPI terminated. Error: HyPhy killed by signal 15

HYPHYMPI terminated. Error: HyPhy killed by signal 15

HYPHYMPI terminated. Error: HyPhy killed by signal 15

HYPHYMPI terminated. Error: HyPhy killed by signal 15

HYPHYMPI terminated. Error: HyPhy killed by signal 15

HYPHYMPI terminated. Error: HyPhy killed by signal 15

HYPHYMPI terminated. Error: HyPhy killed by signal 15

HYPHYMPI terminated. Error: HyPhy killed by signal 15

HYPHYMPI terminated. Error: HyPhy killed by signal 15

HYPHYMPI terminated. Error: HyPhy killed by signal 15

HYPHYMPI terminated. Error: HyPhy killed by signal 15

HYPHYMPI terminated. Error: HyPhy killed by signal 15

[n8:44823] 15 more processes have sent help message help-mpi-api.txt / mpi-abort
[n8:44823] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

spond commented 1 year ago

Dear @KatDKeith,

I'll let @stevenweaver comment as well, but this issue looks like an MPI problem (unreachable nodes), i.e. not something that HyPhy is doing. Can you run ANY MPI program successfully?

Best, Sergei

stevenweaver commented 1 year ago

Dear @KatDKeith and @spond,

I'm able to reproduce the issue. I'm looking into it now.

Best, Steven

stevenweaver commented 1 year ago

Dear @spond,

The issue stems from different types of runtime errors being reported differently. Datamonkey expects all errors to be printed to stderr, but when using mpirun, the underlying error is never printed there. I can retrieve the error message from stdout or errors.log, but this will require a small update on the backend.
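For illustration only, here is a minimal Python sketch of the kind of fallback described here: look for an informative error line in stderr first, then in stdout, then in the contents of errors.log, while skipping the MPI teardown messages seen in this thread. The function name `find_root_error` and the noise patterns are assumptions made for this example; this is not the actual Datamonkey backend code.

```python
import re

# Patterns for MPI/PMIx teardown noise. These are assumptions based only on
# the log pasted in this thread, not an exhaustive list.
NOISE = re.compile(
    r"PMIX ERROR|MPI_ABORT|killed by signal|help-mpi-api|orte_base_help_aggregate"
)


def find_root_error(stderr_text, stdout_text="", log_text=""):
    """Return the first informative 'Error:' line, or None if there is none.

    Sources are searched in order: stderr, then stdout, then the log text
    (for example, the contents of errors.log).
    """
    for source in (stderr_text, stdout_text, log_text):
        for line in source.splitlines():
            line = line.strip()
            # Keep lines that mention an error but are not teardown noise.
            if "Error:" in line and not NOISE.search(line):
                return line
    return None
```

With the output from this issue, stderr contains only the PMIX and "killed by signal 15" lines, so the function would fall through to stdout or the log and return the assertion message about stop codons.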

Dear @KatDKeith,

The issue with your dataset appears to be that the sequence alignment contains stop codons. I see a series of attempts that failed with:

Error:The input alignment must have the number of sites that is divisible by 3 and must not contain stop codons in call to assert(fel.codon_filter.sites*3==fel.codon_data.sites, error_msg);
The input alignment must have the number of sites that is divisible by 3 and must not contain stop codons in call to assert(fel.codon_filter.sites*3==fel.codon_data.sites, error_msg)
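As a rough pre-flight check before submitting, the two conditions named in this error message can be tested with a short Python sketch. It assumes the universal genetic code (stop codons TAA, TAG, TGA), sequences already read in frame from the first position, and a hypothetical helper name `check_codon_alignment`; HyPhy's own filtering may differ in detail.

```python
# Stop codons under the universal genetic code only (an assumption; other
# genetic codes use different stop codons).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_codon_alignment(sequences):
    """Return a list of problems found; an empty list means none were found.

    `sequences` maps a sequence name to its aligned nucleotide string.
    """
    problems = []
    for name, seq in sequences.items():
        seq = seq.upper().replace("U", "T")
        if len(seq) % 3 != 0:
            problems.append(f"{name}: length {len(seq)} is not divisible by 3")
        # Walk the sequence codon by codon, ignoring any trailing partial codon.
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                problems.append(f"{name}: stop codon {codon} at codon {i // 3 + 1}")
    return problems
```

For example, `check_codon_alignment({"seq1": "ATGAAATAA"})` reports the TAA at codon 3. Note that this sketch flags terminal stop codons as well as internal ones, in line with the wording of the error message.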

Best, Steven

KatDKeith commented 1 year ago

Thank you @stevenweaver @spond