How should SMALLEST(), LARGEST(), and COLLAPSE_SMALLEST() handle ties?

krivit commented 11 months ago

This stems from issue #544, reported by @benrosche.

Suppose that in the attribute of interest, there is an exact tie in the frequency of the values. Sometimes, it won't have any effect, and sometimes, it will. For example, if the frequencies are, in descending order, 100, 90, 90, 80, etc., then LARGEST and LARGEST(1) won't be affected but LARGEST(2) will be.
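To make the example concrete, here is a minimal base-R sketch of when a tie actually matters. `tie_affects_largest()` is a hypothetical helper written for illustration, not a function in ergm: a request for the `k` most frequent values is ambiguous only when the `k`-th and `(k+1)`-th largest counts are equal.

```r
# Hypothetical helper (not part of ergm): does a tie make
# "the k most frequent values" ambiguous?
tie_affects_largest <- function(freq, k) {
  s <- sort(unname(freq), decreasing = TRUE)
  # Ambiguous only if the cut falls between two equal counts.
  k < length(s) && s[k] == s[k + 1]
}

freq <- c(a = 100, b = 90, c = 90, d = 80)
tie_affects_largest(freq, 1)  # FALSE: 100 vs. 90, no tie at the cut
tie_affects_largest(freq, 2)  # TRUE: the cut falls between the two 90s
tie_affects_largest(freq, 3)  # FALSE: both 90s are included
```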

Currently, the tie-breaking is arbitrary. It would probably be better to detect ties when they affect the result. What should the code do?

1. Warn and break ties arbitrarily.
2. Stop with an error, insisting the user remove ambiguity.

I am leaning towards the second. Any thoughts? @martinamorris, @mbojan, @drh20drh20, @handcock, @CarterButts, @sgoodreau?

martinamorris commented 11 months ago

How would a user remove ambiguity? If it's straightforward, then an informative message with example or link to syntax would be ideal.

krivit commented 11 months ago

> How would a user remove ambiguity?

Either specify the categories manually, or increase or decrease the argument so that it includes either all of the tied categories or none of them.
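As a rough illustration of the "increase or decrease the argument" route, the sketch below computes the two nearest unambiguous values of the argument for a given request. `unambiguous_k()` is a hypothetical name used only here, and the frequencies are the ones from the example in the opening post.

```r
# Hypothetical helper (not part of ergm): nearest values of k for which
# "the k most frequent categories" is unambiguous.
unambiguous_k <- function(freq, k) {
  s <- sort(unname(freq), decreasing = TRUE)
  cutoff <- s[k]
  c(exclude_tied = sum(s > cutoff),   # drop every category tied at the cutoff
    include_tied = sum(s >= cutoff))  # keep every category tied at the cutoff
}

freq <- c(a = 100, b = 90, c = 90, d = 80)
unambiguous_k(freq, 2)  # exclude_tied = 1, include_tied = 3
```

If `include_tied` equals the requested `k`, the request was already unambiguous; an informative message could suggest the two values otherwise.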

> If it's straightforward, then an informative message with example or link to syntax would be ideal.

Should the message be in the form of a warning or an error, though?

mbojan commented 11 months ago

Or perhaps behave analogously to which.max() and sort(), that is, pick whichever value comes first in the data?
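For reference, a small base-R check of the behavior being alluded to. `which.max()` returns the position of the first maximum, and `order()` is documented to leave unresolved ties in their original order; `order()` is used here in place of `sort()` only because its tie handling is stated explicitly. The vector is made up for illustration.

```r
# Made-up frequencies with a tie between "x" and "a".
freq <- c(x = 90, a = 90, m = 80)

names(which.max(freq))      # "x": the first maximum by position
names(freq)[order(-freq)]   # "x" "a" "m": tied entries keep their original order
```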

martinamorris commented 11 months ago

I like @mbojan's idea. If there's a standard R default for ordering in the presence of ties, use that, with a warning (not an error) and a pointer to the syntax for manual collapsing.

krivit commented 11 months ago

For this type of tiebreaker, there's lexicographic order (i.e., "a" comes before "b"), and there is the order of first appearance in the nodal attribute list. There is an argument for either. (We default to lexicographic for factor levels, for example.)
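The two candidate tiebreakers can give different answers on the same data. The sketch below is a base-R illustration under that assumption, not ergm's implementation; `largest_k()` and the sample values are hypothetical.

```r
# Hypothetical sketch (not ergm's implementation): pick the k most frequent
# values, breaking ties lexicographically or by order of first appearance.
largest_k <- function(x, k, tie = c("lexicographic", "appearance")) {
  tie <- match.arg(tie)
  lev <- unique(x)                          # levels in order of first appearance
  if (tie == "lexicographic") lev <- sort(lev)
  n <- tabulate(match(x, lev), nbins = length(lev))
  # order() leaves tied counts in the order of `lev`, which is the tiebreaker.
  lev[order(-n)][seq_len(min(k, length(lev)))]
}

vals <- c("b", "a", "b", "a", "c", "c", "c")  # counts: b = 2, a = 2, c = 3
largest_k(vals, 2, "lexicographic")  # "c" "a"
largest_k(vals, 2, "appearance")     # "c" "b"
```

The lexicographic variant has the property that the result does not depend on how the nodes happen to be ordered in the network.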

drh20drh20 commented 11 months ago

I'm with @mbojan and @martinamorris. In particular, I do not like the idea of an error, as this does not technically seem like an error. Indeed, a warning message is a kindness here.

krivit commented 10 months ago

OK, I'll see about implementing a lexicographic tie breaker.

statnet / ergm

How should SMALLEST(), LARGEST(), and COLLAPSE_SMALLEST() handle ties? #545