Handle non-digit characters in AuthorRetrieval().classificationgroup

raffaem commented 2 years ago

fix #258

Michael-E-Rose commented 2 years ago

The "".join(filter(str.isdigit, item["$"])) bit is good, but wrapping all of this in a try-except makes the code too complex.

Why not just always use this expression? filter() is fast, and it strips whatever non-digit characters there might be.

@property
def classificationgroup(self) -> Optional[List[Tuple[int, int]]]:
    """List with (subject group ID, number of documents)-tuples."""
    path = ['classificationgroup', 'classifications', 'classification']
    out = [(int("".join(filter(str.isdigit, item["$"]))),
            int(item['@frequency']))
           for item in listify(chained_get(self._profile, path, []))]
    return out or None
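
To show what the expression does outside the class, here is a minimal standalone sketch. The sample items are made up for illustration and are not actual Scopus responses; the plain list stands in for the result of listify(chained_get(...)).

```python
# Hypothetical sample items; real API responses may look different.
items = [
    {"$": "2208", "@frequency": "12"},
    {"$": "\n2208 ", "@frequency": "3"},  # stray whitespace around the ID
]

# Keep only the digit characters of each ID before converting to int.
out = [(int("".join(filter(str.isdigit, item["$"]))), int(item["@frequency"]))
       for item in items]
print(out)  # [(2208, 12), (2208, 3)]
```

One caveat: if a value contains no digits at all, the join yields an empty string and int("") raises a ValueError, so this only covers values that contain at least one digit.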

raffaem commented 2 years ago

How does it look now?

Michael-E-Rose commented 2 years ago

Ah, you created an anonymous function for this. Also nice.

But why not make it a proper function? I have the feeling it might be used again later. That is, you just put the function in https://github.com/pybliometrics-dev/pybliometrics/blob/master/pybliometrics/scopus/utils/parse_content.py and import it here. To match the names of the other functions there, what about filter_digits()?
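
A minimal sketch of what such a helper could look like. Only the name filter_digits() comes from the suggestion above; the signature, the docstring, and the choice to return a string are assumptions, not the code that was merged.

```python
def filter_digits(text: str) -> str:
    """Return only the digit characters of text, in their original order.

    Sketch only: the signature and return type are assumptions.
    """
    return "".join(filter(str.isdigit, text))


# The caller converts to int where needed.
print(int(filter_digits("ab2208c")))  # 2208
```

Returning a string rather than an int keeps the helper usable for values without any digits, where it simply returns an empty string; the caller decides how to handle that case.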

Michael-E-Rose commented 2 years ago

Thanks for this!

pybliometrics-dev / pybliometrics

Handle non-digit characters in AuthorRetrieval().classificationgroup #259