noworneverev / eurlex-parser

An EUR-Lex parser
MIT License

Eurlex Parser

This Python package fetches and parses data (regulations, directives, and proposals) from EUR-Lex, the official website for European Union law. It extracts the individual parts of a legal document identified by its CELEX ID and can export the data as JSON or as a Pandas DataFrame.

Installation

pip install eurlex-parser

Usage

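Functions

The examples below use four functions, each of which takes a CELEX ID: get_data_by_celex_id returns the parsed document as a dictionary, get_json_by_celex_id returns it as a JSON string, get_articles_by_celex_id returns the articles as a Pandas DataFrame, and get_summary_by_celex_id returns the document summary.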

Examples

The following examples show how to fetch and parse data for a CELEX ID. For instance, the CELEX ID 32013R0575 corresponds to this URL: https://eur-lex.europa.eu/legal-content/en/TXT/?uri=celex:32013R0575
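
The URL pattern above can be reproduced with a small helper. This is only a sketch: `celex_url` is a hypothetical name and not part of eurlex-parser, and the `language` parameter assumes that other language versions use the same path position as `en` in the example URL.

```python
def celex_url(celex_id: str, language: str = "en") -> str:
    """Build the EUR-Lex page URL for a CELEX ID (hypothetical helper)."""
    celex_id = celex_id.strip()
    if not celex_id:
        raise ValueError("celex_id must not be empty")
    # Same pattern as the example URL in this README; `language` is an assumption.
    return f"https://eur-lex.europa.eu/legal-content/{language}/TXT/?uri=celex:{celex_id}"


print(celex_url("32013R0575"))
# https://eur-lex.europa.eu/legal-content/en/TXT/?uri=celex:32013R0575
```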

  1. Fetch and print data for a given CELEX ID:

    from eurlex import get_data_by_celex_id

    data = get_data_by_celex_id('32013R0575')
    print(data)

  2. Save data as a JSON file:

    from eurlex import get_json_by_celex_id

    json_data = get_json_by_celex_id('32013R0575')
    with open('32013R0575.json', 'w', encoding='utf-8') as f:
        f.write(json_data)

  3. Load articles into a Pandas DataFrame:

    from eurlex import get_articles_by_celex_id

    df = get_articles_by_celex_id('32013R0575')
    print(df.head())

  4. Fetch and print summary for a given CELEX ID:

    from eurlex import get_summary_by_celex_id

    summary = get_summary_by_celex_id('32013R0575')
    print(summary)
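
When several documents are needed, the JSON example can be wrapped in a loop. The sketch below is one possible way to do that, not something the package provides: `export_documents` is a hypothetical helper, and it assumes that `get_json_by_celex_id` returns a JSON string (as the `f.write(json_data)` call above suggests). The fetch function is passed in as an argument so the helper can be tried with a stub and without network access.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable


def export_documents(
    celex_ids: Iterable[str],
    fetch_json: Callable[[str], str],
    out_dir: str = ".",
) -> Dict[str, str]:
    """Write one <CELEX ID>.json file per document and return {celex_id: status}."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    results: Dict[str, str] = {}
    for celex_id in celex_ids:
        try:
            json_data = fetch_json(celex_id)
            json.loads(json_data)  # sanity check: the payload parses as JSON
        except Exception as exc:  # broad on purpose: failure modes are not documented here
            results[celex_id] = f"failed: {exc}"
            continue
        (target / f"{celex_id}.json").write_text(json_data, encoding="utf-8")
        results[celex_id] = "ok"
    return results


# Intended use with the package (requires network access):
#   from eurlex import get_json_by_celex_id
#   export_documents(["32013R0575"], get_json_by_celex_id, out_dir="output")
```

A failed fetch is recorded in the returned dictionary instead of stopping the loop, so one unavailable document does not block the rest.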

You can find some generated JSON files in the examples directory.

Data Structure

The main data structure returned by get_data_by_celex_id is a dictionary with the following format:

{
  "title": "Document Title",
  "preamble": {
    "text": "Preamble text",
    "notes": [
      {
        "id": "1",
        "text": "Note text",
        "url": "https://eur-lex.europa.eu/...",
        "reference": null
      }
    ]
  },
  "articles": [
    {
      "id": "Article ID",
      "title": "Article Title",
      "text": "Article text",
      "metadata": {
        "parent_title1": "Parent Title 1",
        "parent_title2": "Parent Title 2"
      },
      "notes": [
        {
          "id": "1",
          "text": "Note text",
          "url": "https://eur-lex.europa.eu/...",
          "reference": null
        }
      ],
      "references": [
        "Directive ..../../..",
        "Regulation (EU) No .../...."
      ]
    }
  ],
  "notes": [
    {
      "id": "1",
      "text": "Note text",
      "url": "https://eur-lex.europa.eu/...",
      "reference": null
    }
  ],
  "references": [
    "Directive ..../../..",
    "Regulation (EU) No .../...."
  ],
  "final_part": "Final part text",
  "annexes": [
    {
      "id": "Annex ID",
      "title": "Annex Title",
      "text": "Annex text",
      "table": "Markdown table text"
    }
  ],
  "summary": {
    "title": "Document Title",
    "chapters": {
      "Chapter Title 1": "Chapter content 1",
      "Chapter Title 2": "Chapter content 2"
    },
    "last_modified": "Last modified date"
  },
  "related_documents": {
    "modifies": [
      {
        "Relation": "Modifies",
        "Act": {
          "celex": "CELEX Number",
          "url": "https://eur-lex.europa.eu/..."
        },
        "Comment": "Addition",
        "Subdivision concerned": "Article number/paragraph",
        "From": "date",
        "To": "date"
      }
    ],
    "modified_by": [
      {
        "Relation": "Corrected by",
        "Act": {
          "celex": "CELEX Number",
          "url": "https://eur-lex.europa.eu/..."
        },
        "Comment": "",
        "Subdivision concerned": "Article number/paragraph",
        "From": "date",
        "To": "date"
      }
    ]
  }
}
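
As an illustration of how this structure might be consumed, the sketch below flattens the articles into rows and collects the CELEX numbers of related acts. The helpers `article_rows` and `related_celex` are hypothetical and not part of eurlex-parser, and `sample` is a hand-written dictionary that reuses the placeholder values from the structure above rather than real EUR-Lex data.

```python
from typing import Any, Dict, List

# Hand-written sample mirroring the documented structure; placeholder values only.
sample: Dict[str, Any] = {
    "title": "Document Title",
    "articles": [
        {
            "id": "Article ID",
            "title": "Article Title",
            "text": "Article text",
            "metadata": {
                "parent_title1": "Parent Title 1",
                "parent_title2": "Parent Title 2",
            },
            "notes": [],
            "references": ["Regulation (EU) No .../...."],
        }
    ],
    "related_documents": {
        "modifies": [],
        "modified_by": [
            {
                "Relation": "Corrected by",
                "Act": {"celex": "CELEX Number", "url": "https://eur-lex.europa.eu/..."},
                "Comment": "",
                "Subdivision concerned": "Article number/paragraph",
                "From": "date",
                "To": "date",
            }
        ],
    },
}


def article_rows(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn each article into one flat dict, merging in its metadata keys."""
    rows = []
    for article in data.get("articles", []):
        row = {key: article.get(key) for key in ("id", "title", "text")}
        row.update(article.get("metadata") or {})
        rows.append(row)
    return rows


def related_celex(data: Dict[str, Any], relation_key: str) -> List[str]:
    """Return the CELEX numbers listed under "modifies" or "modified_by"."""
    related = data.get("related_documents") or {}
    return [entry["Act"]["celex"] for entry in related.get(relation_key, []) if entry.get("Act")]


print(article_rows(sample))
print(related_celex(sample, "modified_by"))
```

The rows are plain dictionaries, so they can be passed to `pandas.DataFrame(rows)` if a table is preferred. The result is not guaranteed to match the columns produced by get_articles_by_celex_id.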

Notes

License

This project is licensed under the MIT License.