Closed — rcannood closed this pull request 2 days ago.
Honestly, given this performs worse than random, I'm inclined to just delete it.
I wouldn't! In the v2 and NeurIPS2021 repos we used single-value predictor components as negative controls, to check whether any of the metrics and other downstream components behave strangely when the predictions are all the same string, all 0's, or all 1's.
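For context, such a negative control can be as simple as a predictor that outputs one constant value. A minimal sketch (the function name and setup here are illustrative, not the actual openproblems API):

```python
def constant_predictor(n_obs, value):
    """Negative control: predict the same value for every observation."""
    return [value] * n_obs

# A metric pipeline should not crash or misbehave on identical predictions.
labels = [0, 1, 1, 0, 1]
preds = constant_predictor(len(labels), 0)
accuracy = sum(p == l for p, l in zip(preds, labels)) / len(labels)  # 0.4
```

Running the same check with all 1's or a constant string exercises the metrics' edge cases in the same way.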
Sure, that makes sense - but then it should be a different class of method. It's not a good measure of what zero should be, even if it's a useful test case. We could define a separate flag (is_test_case instead of is_baseline) but I don't think zero should be defined by this.
Agree that this should be kept. It's a baseline in the truest sense of the word... not an intentionally good or bad method. I like the addition of a new "simple default" class... before we have that, it should probably not be an is_baseline=True method, based on how we use this atm.
@LuckyMD are you arguing for this to be included in the results, or included in the scaling computation? I don't think we should scale to something that we know is worse than random.
I don't think it's necessary to include this in the rescaling process; however, I think it should be in the results, marked as "baseline" or something similar. Essentially, this is a description of the dataset: the percentage of the most common label. Maybe we can remove it altogether after adding this info to the dataset description.
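That dataset statistic is cheap to compute directly. A small sketch (function name is illustrative): the fraction of observations carrying the most common label, which is exactly the accuracy a majority-vote predictor would score on that dataset.

```python
from collections import Counter

def majority_label_fraction(labels):
    """Fraction of observations with the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

majority_label_fraction(["B", "T", "T", "T", "NK"])  # 3/5 = 0.6
```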
Base: 95.06% // Head: 95.06% // No change to project coverage :thumbsup:
Coverage data is based on head (9a91735) compared to base (a796e02). Patch has no changes to coverable lines.
:umbrella: View full report at Codecov.
Sorry, let me clarify: I agree with you that this is not a baseline as we have it defined now. It should be included in the results and potentially flagged as a "simple baseline" or sth like that. That being said, I'd be surprised if this is worse than a random baseline for all scenarios.
For accuracy, it is sometimes worse, but not always. For F1, it's always worse (by definition, I think). I think the simplest solution here is to rename it to Majority Vote (naive). The more complex version involves implementing an additional flag denoting a baseline that is displayed but not used for scaling, e.g. is_baseline in [True, False, "display"].
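A rough sketch of how that three-valued flag could work (the method records and field names are hypothetical, not the current openproblems schema): "display" baselines appear in the results table but are excluded from the min/max used to rescale scores.

```python
# Hypothetical method records; is_baseline is True, False, or "display".
methods = [
    {"name": "perfect_labels", "is_baseline": True, "score": 1.0},
    {"name": "majority_vote", "is_baseline": "display", "score": 0.2},
    {"name": "new_method", "is_baseline": False, "score": 0.7},
]

# "display" entries are shown in results but skipped when scaling.
used_for_scaling = [m for m in methods if m["is_baseline"] != "display"]
shown_in_results = methods  # everything, including "display" baselines
```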
This pull request has been automatically closed because it has not had recent activity.
I think is_baseline is meant to be set to True, given that nobody will actually use this type of approach in real life and call it a day, and also given that it's in the baseline.py file.