Open vlevy-pci opened 7 months ago
Has this issue been taken? If not, I would like to work on it.
Hi Frank,
I wrote a fix for my project but have not submitted a PR for it. Please feel free to take it over. Hopefully it will be straightforward to work from my description, but if you want my version as a reference, you are welcome to it.
Best regards, Vic
From: Frank Tianyu Zeng, Wednesday, June 5, 2024 11:46 PM
Subject: Re: [jtablesaw/tablesaw] Duplicate Rows May Remain After dropDuplicateRows Due to Early Return in isDuplicate (Issue #1248)
> Has this issue been taken? If not, I would like to work on it.
Description: When using `dropDuplicateRows` to eliminate duplicate entries from a table, I observed that duplicates were still present in the output. Upon investigation, the root cause was identified in the `isDuplicate` function. This function is designed to iterate over rows that share a hash with the row being evaluated to determine if it is a duplicate. However, it incorrectly returns `false` (indicating the row is unique) during the first iteration if the first checked row does not match, without examining the remaining rows.

Expected Behavior: The `isDuplicate` function should only return `false` after all rows with the matching hash have been checked and none are found to be identical to the row being evaluated. This ensures that a row is only considered unique if it has been verified against all potential duplicates.

Actual Behavior: The function returns `false` prematurely after comparing with the first row that shares a hash, potentially leaving unexamined duplicates in the table.

Resolution: The issue was resolved by modifying `isDuplicate` to complete its iteration over all rows with a matching hash before deciding that the row is not a duplicate. This change ensured that `dropDuplicateRows` correctly removed all duplicates from the table.
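
For anyone picking this up, here is a minimal, self-contained sketch of the pattern described above. It is not Tablesaw's actual code: the class `DedupSketch`, the `weakHash` function, and the representation of rows as `List<Integer>` are all hypothetical stand-ins, chosen only so that distinct rows collide on the same hash and the early return becomes visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and data layout are assumptions,
// not taken from the Tablesaw source.
class DedupSketch {

    // Deliberately weak hash (sum of cells) so that [1, 2] and [2, 1] collide.
    static int weakHash(List<Integer> row) {
        int sum = 0;
        for (int cell : row) {
            sum += cell;
        }
        return sum;
    }

    // Buggy shape: decides after looking at the first same-hash row only.
    static boolean isDuplicateBuggy(List<Integer> row, List<List<Integer>> sameHash) {
        for (List<Integer> candidate : sameHash) {
            if (candidate.equals(row)) {
                return true;
            } else {
                return false; // early return: remaining candidates are never checked
            }
        }
        return false;
    }

    // Fixed shape: returns false only after every same-hash row has been compared.
    static boolean isDuplicateFixed(List<Integer> row, List<List<Integer>> sameHash) {
        for (List<Integer> candidate : sameHash) {
            if (candidate.equals(row)) {
                return true;
            }
        }
        return false;
    }

    // Keeps the first occurrence of each row, bucketing kept rows by hash.
    static List<List<Integer>> dropDuplicates(List<List<Integer>> rows, boolean useFix) {
        Map<Integer, List<List<Integer>>> buckets = new HashMap<>();
        List<List<Integer>> kept = new ArrayList<>();
        for (List<Integer> row : rows) {
            List<List<Integer>> sameHash =
                buckets.computeIfAbsent(weakHash(row), k -> new ArrayList<>());
            boolean duplicate = useFix
                ? isDuplicateFixed(row, sameHash)
                : isDuplicateBuggy(row, sameHash);
            if (!duplicate) {
                sameHash.add(row);
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // All three rows hash to 3; the third is a true duplicate of the second.
        List<List<Integer>> rows = List.of(List.of(1, 2), List.of(2, 1), List.of(2, 1));
        System.out.println("buggy keeps " + dropDuplicates(rows, false).size() + " rows");
        System.out.println("fixed keeps " + dropDuplicates(rows, true).size() + " rows");
    }
}
```

In this sketch the buggy variant keeps all three rows, because the second `[2, 1]` is compared only against `[1, 2]`, while the fixed variant keeps two. The bug only shows up when at least two distinct rows share a hash, which may explain why it is easy to miss in small tests.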