openreview / openreview-py

Official Python client library for the OpenReview API
https://openreview-py.readthedocs.io/en/latest/
MIT License

regex must be a prefix regex. If validation passes ... #1762

Closed MarvinT closed 1 year ago

MarvinT commented 1 year ago

Not sure exactly where to report this, but I'll start here since I first ran into it in the Python package.

If I try something like

invitations = client.get_invitations(regex=".*Blind_Submission")

It raises OpenReviewException: {'name': 'ValidationError', 'message': 'regex must be a prefix regex. If validation passes, any remaining regex characters will be escaped.', 'status': 400, 'details': {'path': '.regex', 'reqId': '2023-06-28-1540578'}}

Even though this used to work.

I also noticed this breaks a lot of OpenReview pages. A Google search turns up examples:

https://openreview.net/group?id=NeurIPS.cc/2021/Workshop/ImageNet_PPF
https://openreview.net/group?id=thecvf.com/ICCV/2019/Workshop/Pre-Reg
https://openreview.net/group?id=roboticsfoundation.org/RSS/2022/Workshop/L-DOD
https://openreview.net/group?id=EMNLP/2019/Workshop/Summarization

It seems like it would be very useful to have full regex matching for get_invitations. Is there a reason it was changed? I'm assuming performance, but maybe we could have optional full regex matching when necessary.

melisabok commented 1 year ago

Thanks for reporting this. The API changed the implementation of the regex parameter for performance reasons. The new API renames this parameter to prefix.

We thought we had updated all the existing pages. We will fix them soon.
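For cases where full regex matching is still needed, one possible workaround is to query by prefix and then filter the results locally. The following is only a sketch under assumptions: it works on plain ID strings that have already been fetched, the helper name `filter_ids` is hypothetical and not part of openreview-py, and the second and third IDs are made up for illustration.

```python
import re


def filter_ids(ids, pattern):
    """Return the IDs in which `pattern` matches (full regex, applied client-side)."""
    compiled = re.compile(pattern)
    return [i for i in ids if compiled.search(i)]


# Illustrative IDs, as they might come back from a prefix query.
# Only the first one appears in this thread; the others are invented.
ids = [
    "ICLR.cc/2018/Conference/-/Blind_Submission",
    "ICLR.cc/2018/Conference/-/Submission",
    "ICLR.cc/2018/Conference/-/Paper1/Official_Review",
]

# The pattern from the original report, anchored at the end of the ID.
print(filter_ids(ids, r".*Blind_Submission$"))
# prints ['ICLR.cc/2018/Conference/-/Blind_Submission']
```

This keeps the server-side query cheap (prefix only) and pays the regex cost on the client, over a result set that the prefix has already narrowed.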

MarvinT commented 1 year ago

The regex also doesn't seem to match things that I think it should... for example

import re
import openreview

regex = r"ICLR\.cc/2017/conference/-/submission"
invitation = "ICLR.cc/2017/conference/-/submission"

print(
    re.match(
        regex,
        invitation,
    )
)

client = openreview.Client(
    baseurl="https://api.openreview.net",
)

print(len(client.get_all_invitations(limit=100, regex=regex)))
print(len(list(openreview.tools.iterget_notes(client, invitation=invitation))))

returns

<re.Match object; span=(0, 36), match='ICLR.cc/2017/conference/-/submission'>
0
490

carlosmondra commented 1 year ago

Hello @MarvinT,

The output of the requests is correct. What were you expecting, so that I can understand the issue? The only result that may be confusing is the return value of get_all_invitations: the invitation(s) you are querying have already expired, so you need to pass the parameter expired=True.
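A minimal sketch of the effect of the expired parameter, assuming get_all_invitations accepts `regex` and `expired` keyword arguments as used in this thread. The helper `count_invitations` and the `StubClient` class are hypothetical. The stub stands in for openreview.Client so the example runs without network access, and it assumes that passing `expired=True` returns only expired invitations, which the thread does not confirm.

```python
import re


class StubClient:
    """Hypothetical stand-in for openreview.Client; not the real implementation."""

    def __init__(self, invitations):
        # Each invitation is a dict with an "id" and an "expired" flag (illustrative shape).
        self._invitations = invitations

    def get_all_invitations(self, regex=None, expired=None):
        # re.match anchors at the start of the string, so it behaves like a prefix regex.
        # Assumption: expired=True selects expired invitations, otherwise active ones.
        return [
            inv for inv in self._invitations
            if re.match(regex or "", inv["id"]) and inv["expired"] == bool(expired)
        ]


def count_invitations(client, regex, include_expired=False):
    """Count invitations matching `regex`, optionally asking for expired ones."""
    kwargs = {"regex": regex}
    if include_expired:
        kwargs["expired"] = True
    return len(client.get_all_invitations(**kwargs))


client = StubClient([
    {"id": "ICLR.cc/2017/conference/-/submission", "expired": True},
    {"id": "ICLR.cc/2018/Conference/-/Blind_Submission", "expired": False},
])
regex = r"ICLR\.cc/2017/conference/-/submission"

print(count_invitations(client, regex))                        # prints 0
print(count_invitations(client, regex, include_expired=True))  # prints 1
```

With a real client, the same helper would be called with an openreview.Client instance in place of the stub.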

MarvinT commented 1 year ago

Great, thanks!

I guess I was confused because

import re
import openreview

# regex = r"ICLR\.cc/2017/conference/-/submission"
# invitation = "ICLR.cc/2017/conference/-/submission"
regex = r"ICLR\.cc/2018/Conference/-/Blind_Submission"
invitation = "ICLR.cc/2018/Conference/-/Blind_Submission"

print(
    re.match(
        regex,
        invitation,
    )
)

client = openreview.Client(
    baseurl="https://api.openreview.net",
)

print(len(client.get_all_invitations(limit=100, regex=regex)))
print(len(list(openreview.tools.iterget_notes(client, invitation=invitation))))

matches an invitation:

<re.Match object; span=(0, 42), match='ICLR.cc/2018/Conference/-/Blind_Submission'>
1
929

but I guess that might be expected behavior and I just wasn't aware of the expiration after 5 years or something...

carlosmondra commented 1 year ago

Yes, invitation expiration dates vary; they are usually set by the organizers. You can check when an invitation expires in its expdate field.

Feel free to reopen this issue in case there is anything unresolved.
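The expdate check mentioned above can be sketched as follows. This assumes expdate is a timestamp in milliseconds since the Unix epoch and that a missing value means the invitation does not expire; neither detail is stated in this thread. The helper name `is_expired` is hypothetical.

```python
from datetime import datetime, timezone


def is_expired(expdate_ms, now=None):
    """Return True if `expdate_ms` (assumed milliseconds since the epoch) is in the past."""
    if expdate_ms is None:
        # Assumption: no expdate means the invitation never expires.
        return False
    now = now or datetime.now(timezone.utc)
    expires_at = datetime.fromtimestamp(expdate_ms / 1000, tz=timezone.utc)
    return expires_at <= now


# Fixed reference time so the output is reproducible.
check_time = datetime(2023, 6, 28, tzinfo=timezone.utc)

print(is_expired(1483228800000, now=check_time))  # 2017-01-01 UTC; prints True
print(is_expired(None, now=check_time))           # prints False
```

With a real invitation object, the value passed in would be its expdate attribute.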