rycolab / aclpub2

MIT License
26 stars 38 forks source link

get_papers in softconf2aclpub.py only retrieves the initial 10 authors for a paper #173

Closed anaistack closed 1 month ago

anaistack commented 1 month ago

We've noticed that the get_papers function within softconf2aclpub.py only retrieves the initial 10 authors for a paper. This poses significant problems when compiling proceedings for papers with more than 10 authors. Is this limitation intentional or is it a bug that needs addressing?

You can find the specific line here: link.

mattshardlow commented 1 month ago

It appears to have been introduced in this commit:

https://github.com/rycolab/aclpub2/commit/3845a5675ecd6fa0d44b10d30712acfbf58eebba

previously it was 28, but this would still be an issue if a paper had >28 authors. It appears it's written this way as the API doesn't give a count of the authors, just numbered author info ("1: Last Name", "2: Last Name") as Dict Keys.

An easy fix would be to change l256 to:

           for i in range(1, len(row)):

This guarantees some large number greater than the maximum number of authors. I'm sure you could be more efficient by calculating how many rows are likely to be author names too (e.g., subtract other known values), but I don't know the API.

You could also calculate the number then plug that in:

num_authors = max([int(key.split(":")[0]) for key in row.keys() if "Last Name" in keys])

(not tested, but should work).

anaistack commented 1 month ago

Thanks for your quick suggestion @mattshardlow.

This is how I patched it locally:

authors = []

author_cre = re.compile(r'(\d+): Username')
authors_i = [
    author_cre.match(key).group(1)
    for key in row if author_cre.match(key)
]

for i in authors_i:  # range(1, 11):
anaistack commented 1 month ago

Perhaps this correction could be more effective, especially if the keys are not iterated in order:

author_cre = re.compile(r'(\d+): Username')
authors_i = sorted([
    int(author_cre.match(key).group(1))
    for key in row if author_cre.match(key)
])
crux82 commented 1 month ago

Dear all,

thank you for the support. If you checked and it works, can you please open a pull request? So we can merge in the main track?

TNX

crux82 commented 1 month ago

The problem is solved.

Thank you!