hasadna / knesset-data-pipelines

Main repository for Open Knesset project - contains the knesset data scrapers and processing pipelines
https://oknesset.org/
MIT License

fix committee protocol scraping to adjust for new protocols format #170

Closed: OriHoch closed this issue 1 year ago

OriHoch commented 5 years ago

In recent months, the Knesset changed the committee protocol format to include rich metadata. This change broke some of our existing protocol parsing code.

The fix can be developed by copying and modifying one of the Jupyter notebooks. See the README for how to use the Jupyter Lab server.

To investigate and visualize the problem you can use the following meeting:

https://oknesset.org/meetings/2/0/2078315.html


The speaker parts are not identified when they are wrapped in the new formatting tags דובר ("speaker") / יור ("chairperson").
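A rough sketch of one way to handle this, not the project's actual parser: strip the wrapper tags from each line before looking for speaker headers. The `<< ... >>` delimiters, the "short line ending in a colon" heuristic, and the function names `strip_speaker_tags` and `split_speaker_parts` are all assumptions made for illustration; check them against a real protocol such as the meeting linked above.

```python
import re

# Tag names taken from the issue text. The "<< ... >>" delimiters around them
# are an ASSUMPTION for this sketch; adjust to the real protocol format.
TAG_NAMES = ("דובר", "יור")
TAG_RE = re.compile(r"<<\s*(?:" + "|".join(TAG_NAMES) + r")\s*>>")


def strip_speaker_tags(line):
    """Remove the wrapper tags from one line and trim whitespace."""
    return TAG_RE.sub("", line).strip()


def split_speaker_parts(text, max_header_len=80):
    """Split protocol text into (speaker, body) tuples.

    Heuristic (hypothetical): after tag removal, a short line that ends
    with a colon is treated as a speaker header.
    """
    parts = []
    speaker, body = None, []
    for raw_line in text.splitlines():
        line = strip_speaker_tags(raw_line)
        if not line:
            continue
        if line.endswith(":") and len(line) < max_header_len:
            # Flush the previous part before starting a new speaker.
            if speaker is not None or body:
                parts.append((speaker, "\n".join(body)))
            speaker, body = line[:-1].strip(), []
        else:
            body.append(line)
    if speaker is not None or body:
        parts.append((speaker, "\n".join(body)))
    return parts
```

Running the tag stripping as its own step keeps the rest of the speaker detection unchanged, so the same function can be applied to protocols in both the old and the new format.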

The fix should be applied as early in the pipelines as possible, and the child pipelines should be tested to make sure they are not adversely affected.

OriHoch commented 1 year ago

This is fixed, but there is a new problem. I opened a new issue for it: #201