noworneverev / eurlex-parser

An EUR-Lex parser
MIT License

Support for Parsing Commission Delegated Regulations (CDRs) #6

Closed: ldelavaissiere closed this issue 2 months ago

ldelavaissiere commented 2 months ago

Hello, I've been using the package and it's a great project! It works perfectly with regulations of the Parliament, but I've encountered an issue when trying to fetch Commission Delegated Regulations (CDRs).

For example, when I try to fetch document 32024R0856 with:

from eurlex import get_data_by_celex_id

data = get_data_by_celex_id('32024R0856')
print(data)

I get the following error:

Traceback (most recent call last):
  File "c:\Users\foo\eurlex-parser\eurlex-parser.py", line 3, in <module>
    data = get_data_by_celex_id('32024R0856')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\foo\eurlex-parser\.venv\Lib\site-packages\eurlex.py", line 359, in get_data_by_celex_id
    modified_by_documents = extract_related_documents(celex_id, language, 'relatedDocsTb')
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\foo\eurlex-parser\.venv\Lib\site-packages\eurlex.py", line 389, in extract_related_documents
    headers = [header.get_text(strip=True) for header in table.find('thead').find_all('th')]
                                                         ^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'find'
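The error names the attribute `find`, and the only `.find` call on the failing line is `table.find('thead')`. That means `table` itself was None, so the lookup for the `relatedDocsTb` table apparently found nothing on this CDR's page. A missing `<thead>` would instead fail one step later, on `find_all`. The maintainer's actual fix isn't shown in this thread. Below is only a minimal defensive sketch, assuming `table` is a BeautifulSoup `Tag` or `None`. The helper name `extract_table_headers` is hypothetical and not part of eurlex-parser.

```python
from typing import Any, List, Optional


def extract_table_headers(table: Optional[Any]) -> List[str]:
    """Return the <th> texts of a related-documents table, or [] if it is absent.

    Hypothetical helper (not part of eurlex-parser). `table` is assumed to be a
    BeautifulSoup Tag, or None when the page has no such table.
    """
    # Case seen in the traceback: the table lookup returned None.
    if table is None:
        return []
    thead = table.find('thead')
    # A table without a <thead> would otherwise fail on .find_all('th').
    if thead is None:
        return []
    return [th.get_text(strip=True) for th in thead.find_all('th')]
```

With BeautifulSoup, passing the result of a lookup such as `soup.find(id='relatedDocsTb')` (the id-based lookup is an assumption) returns an empty list for a page without that table instead of raising `AttributeError`.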

Could you please confirm whether the module supports parsing CDRs? If not, I would appreciate any guidance on how to handle these documents, or on whether there are plans to add support for them in the future.

Thank you for your help!

noworneverev commented 2 months ago

Hi, thanks for reporting the bug. After upgrading to the latest version, you'll be able to parse that regulation:

pip install eurlex-parser --upgrade
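After upgrading, one quick way to confirm which release is actually installed is the standard-library `importlib.metadata` module. The thread doesn't say which version contains the fix, so this sketch only reports the installed version string. The helper name `installed_eurlex_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_eurlex_version() -> Optional[str]:
    """Return the installed eurlex-parser version, or None if it is not installed."""
    try:
        # Distribution name as used in `pip install eurlex-parser --upgrade`.
        return version("eurlex-parser")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_eurlex_version())
```

Run it with the same interpreter that `pip` upgraded (for example, the `.venv` shown in the traceback). Otherwise it may report a different environment's copy.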

ldelavaissiere commented 2 months ago

Thank you for the quick resolution! I appreciate your prompt attention to the issue.