Closed: krivit closed this issue 10 months ago
How would a user remove ambiguity? If it's straightforward, then an informative message with example or link to syntax would be ideal.
On Thu, Dec 21, 2023 at 4:19 PM Pavel N. Krivitsky @.***> wrote:
This stems from issue #544 (https://github.com/statnet/ergm/issues/544), reported by @benrosche.
Suppose that in the attribute of interest, there is an exact tie in the frequency of the values. Sometimes, it won't have any effect, and sometimes, it will. For example, if the frequencies are, in descending order, 100, 90, 90, 80, etc., then LARGEST and LARGEST(1) won't be affected but LARGEST(2) will be.
Currently, the tie-breaking is arbitrary. It would probably be better to detect ties when they affect the result. What should the code do?
- Warn and break ties arbitrarily.
- Stop with an error, insisting the user remove ambiguity.
I am leaning towards the second. Any thoughts? @martinamorris, @mbojan, @drh20drh20, @handcock, @CarterButts, @sgoodreau?
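To make the 100, 90, 90, 80 example concrete, here is a minimal Python sketch of the detection step. It is not ergm code: the helper name `tie_affects_top_k` is hypothetical, and it assumes `LARGEST(k)` means "keep the k most frequent values". Under that assumption, a tie matters only when the last value kept and the first value dropped have the same frequency.

```python
from collections import Counter

def tie_affects_top_k(values, k):
    """Return True if keeping the k most frequent values would split a tie."""
    counts = sorted(Counter(values).values(), reverse=True)
    if k <= 0 or k >= len(counts):
        # Nothing or everything is kept, so no tie can be split.
        return False
    # Ambiguous only if the last kept and first dropped frequencies match.
    return counts[k - 1] == counts[k]

# Frequencies from the example: 100, 90, 90, 80.
values = ["a"] * 100 + ["b"] * 90 + ["c"] * 90 + ["d"] * 80
```

With these values, `k = 1` and `k = 3` are unaffected, while `k = 2` splits the tie between the two values with frequency 90.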
How would a user remove ambiguity?
Either specify the categories manually or increase or decrease the argument to either include all of the tied categories or none.
If it's straightforward, then an informative message with example or link to syntax would be ideal.
Should the message be in the form of a warning or an error, though?
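As a rough illustration of "increase or decrease the argument", the Python sketch below finds the nearest cutoffs that include either none or all of the tied categories. The helper `unambiguous_k` is hypothetical and not part of ergm; it assumes `1 <= k <=` the number of distinct values, and returns `None` in a direction where no unambiguous cutoff exists. A warning or error message could suggest these values to the user.

```python
from collections import Counter

def unambiguous_k(values, k):
    """Return (lower, upper): the nearest cutoffs at or below and at or
    above k that do not split a tie in frequency."""
    counts = sorted(Counter(values).values(), reverse=True)
    n = len(counts)

    def splits_tie(j):
        # Cutoff j splits a tie if the j-th and (j+1)-th frequencies match.
        return 0 < j < n and counts[j - 1] == counts[j]

    lower = next((j for j in range(k, 0, -1) if not splits_tie(j)), None)
    upper = next((j for j in range(k, n + 1) if not splits_tie(j)), None)
    return lower, upper

# Frequencies 100, 90, 90, 80: a cutoff of 2 is ambiguous.
values = ["a"] * 100 + ["b"] * 90 + ["c"] * 90 + ["d"] * 80
```

For these values, `unambiguous_k(values, 2)` returns `(1, 3)`: drop to 1 to exclude both tied categories, or raise to 3 to include both.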
Or perhaps analogous to which.max() and sort(), that is whichever value comes first in the data?
I like Michał Bojanowski's idea. If R has a standard default for ordering in the presence of ties, use that, with a warning (not an error) and a pointer to the syntax for manual collapsing.
For this type of tiebreaker, there's lexicographic order (i.e., "a" comes before "b"), and there is order in the nodal attribute list. There is an argument for either. (We default to lexicographic for factor levels, for example.)
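The two candidate tie-breakers can give different answers on the same data. The Python sketch below contrasts them; the function `top_k` and its `tie_break` argument are made up for illustration and do not reflect ergm's interface.

```python
from collections import Counter

def top_k(values, k, tie_break="lexicographic"):
    """Return the k most frequent values, breaking ties by the given rule."""
    counts = Counter(values)
    if tie_break == "lexicographic":
        rank = {v: i for i, v in enumerate(sorted(counts))}
    elif tie_break == "first_seen":
        # Counter keeps keys in order of first appearance in the data.
        rank = {v: i for i, v in enumerate(counts)}
    else:
        raise ValueError(f"unknown tie_break: {tie_break!r}")
    # Sort by descending frequency, then by the chosen tie-break rank.
    ordered = sorted(counts, key=lambda v: (-counts[v], rank[v]))
    return ordered[:k]

# "c" appears in the data before "b", and both occur twice.
values = ["a", "a", "a", "c", "c", "b", "b"]
```

Here `top_k(values, 2)` returns `["a", "b"]`, while `top_k(values, 2, tie_break="first_seen")` returns `["a", "c"]`. The lexicographic rule has the advantage of not depending on how the nodes happen to be ordered.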
I'm with @mbojan and @martinamorris. In particular, I do not like the idea of an error, as this does not technically seem like an error. Indeed, a warning message is a kindness here.
OK, I'll see about implementing a lexicographic tie breaker.
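One possible shape for that behavior, sketched in Python rather than R and not taken from the ergm source: break ties lexicographically, and warn only when the tie actually changes the result. The function name `largest` and the wording of the message are assumptions for illustration.

```python
import warnings
from collections import Counter

def largest(values, k=1):
    """Return the k most frequent values, breaking ties lexicographically
    and warning when a tie at the cutoff affects the result."""
    counts = Counter(values)
    # Sort by descending frequency, then lexicographically by value.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    if 0 < k < len(ordered) and counts[ordered[k - 1]] == counts[ordered[k]]:
        cutoff = counts[ordered[k]]
        tied = sorted(v for v in counts if counts[v] == cutoff)
        warnings.warn(
            f"Tie among {tied} at the cutoff was broken lexicographically; "
            "list the categories explicitly to remove the ambiguity."
        )
    return ordered[:k]

# Frequencies 100, 90, 90, 80, with the tied values "b" and "c".
values = ["a"] * 100 + ["c"] * 90 + ["b"] * 90 + ["d"] * 80
```

With these values, `largest(values, 2)` returns `["a", "b"]` and emits one warning, while `largest(values, 1)` and `largest(values, 3)` return silently because the tie does not straddle the cutoff.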