maastrichtlawtech / extraction_libraries

Python libraries for extracting from data sources like Rechtspraak, ECHR, Cellar
Apache License 2.0

Python scripts for automating web scraping of the operative part, citations, and paragraph numbers from the eur-lex.europa.eu website for the purpose of path analysis. #5

Closed: venvis closed this issue 3 weeks ago

venvis commented 5 months ago

There are three Python files:

1. operative_extractions.py: A Python script with a class that takes a CELEX ID on initialization and returns the operative part, scraped from the web page for that CELEX ID. The class contains several functions because the pages differ in HTML structure and syntax: the page for the given CELEX ID is run through each structure in turn until the operative part is found. All of the functions have the return type list.

2. para.py: This Python script extracts the paragraph (P) numbers of the citations and returns them in a list. If none are found, it returns an empty list.

3. Citations.py: This Python script extracts the citations (CELEX IDs) and returns them in a list. If none are found, it returns an empty list.
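
To make the approach described above concrete, here is a minimal sketch of the "try each page structure until one works" idea, plus a citation extractor that returns an empty list when nothing is found. It is not the code from the three files. The two page layouts, the function names (`extract_operative_part`, `extract_celex_ids`), and the simplified CELEX pattern (case-law style IDs only) are all assumptions made for illustration, and the sketch works on an HTML string rather than fetching anything from EUR-Lex.

```python
import re
from typing import Callable, List

# NOTE: both layouts below are hypothetical; the real selectors used by
# operative_extractions.py are not shown in this issue.


def _from_operative_class(html: str) -> List[str]:
    """Layout A (assumed): operative paragraphs are tagged with a CSS class."""
    matches = re.findall(r'<p class="operative">(.*?)</p>', html, flags=re.S)
    return [m.strip() for m in matches]


def _from_ruling_marker(html: str) -> List[str]:
    """Layout B (assumed): plain paragraphs that follow a 'hereby rules:' marker."""
    _, marker, tail = html.partition("hereby rules:")
    if not marker:  # marker text not present on this page
        return []
    return [m.strip() for m in re.findall(r"<p>(.*?)</p>", tail, flags=re.S)]


# Each extractor returns a list, and an empty list means "structure not matched".
EXTRACTORS: List[Callable[[str], List[str]]] = [
    _from_operative_class,
    _from_ruling_marker,
]


def extract_operative_part(html: str) -> List[str]:
    """Run the page through each known structure until one yields a result."""
    for extractor in EXTRACTORS:
        result = extractor(html)
        if result:
            return result
    return []  # no structure matched


# Simplified pattern (assumption): sector 6, four-digit year, two-letter
# document type, four-digit number. Real CELEX IDs have more variants.
CELEX_RE = re.compile(r"\b6\d{4}[A-Z]{2}\d{4}\b")


def extract_celex_ids(text: str) -> List[str]:
    """Return unique CELEX-like IDs in order of appearance, or an empty list."""
    seen: List[str] = []
    for celex in CELEX_RE.findall(text):
        if celex not in seen:
            seen.append(celex)
    return seen
```

Keeping every extractor to the same signature (string in, list out) means a new page structure can be supported by appending one function to `EXTRACTORS`, without touching the loop.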

shashankmc commented 3 weeks ago

This is implemented in cellar_extractor.