ufal / ParCzech

ParCzech is a project on compiling Czech parliamentary data into annotated corpora.
https://ufal.mff.cuni.cz/parczech

change psp structure -> wrong sitting links #153

Closed matyaskopp closed 2 years ago

matyaskopp commented 2 years ago

The page listing the sittings (https://www.psp.cz/eknih/2017ps/stenprot/index.htm) has changed its structure: it no longer contains links to the individual sitting pages (.*schuz\/\d+-\d+.html), only links to the beginning of each sitting (.*schuz\/s\d{6}.htm).
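To make the mismatch concrete, here is a minimal sketch of how the two patterns quoted above behave. It is written in Python purely for illustration (the downloader itself is a Perl script), and the two sample hrefs are hypothetical values made up to fit each pattern, not links copied from psp.cz.

```python
import re

# The two patterns quoted in the report, copied verbatim.
OLD_PATTERN = re.compile(r".*schuz\/\d+-\d+.html")   # links to sitting pages
NEW_PATTERN = re.compile(r".*schuz\/s\d{6}.htm")     # links to the beginning of a sitting

# Hypothetical hrefs, invented only to illustrate each shape.
old_style_href = "001schuz/1-1.html"
new_style_href = "001schuz/s001001.htm"


def classify(href):
    """Return which kind of link an href looks like (whole string must match)."""
    if OLD_PATTERN.fullmatch(href):
        return "sitting"
    if NEW_PATTERN.fullmatch(href):
        return "sitting_start"
    return "other"


if __name__ == "__main__":
    for href in (old_style_href, new_style_href):
        print(href, "->", classify(href))
```

A link in the new shape is never classified as a sitting link by the old pattern, which is consistent with the behaviour described in this report.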

As a result, the pattern on this line no longer matches anything: https://github.com/ufal/ParCzech/blob/0b983136850e279d51c593c06c3eeb614576e6fe/src/downloader/stenoprotokoly_2013ps-now.pl#L240

Consequently, sittings are not added, because the link is added to the wrong array: https://github.com/ufal/ParCzech/blob/0b983136850e279d51c593c06c3eeb614576e6fe/src/downloader/stenoprotokoly_2013ps-now.pl#L244

The solution seems to be to load the pages from $meeting_link (https://github.com/ufal/ParCzech/blob/0b983136850e279d51c593c06c3eeb614576e6fe/src/downloader/stenoprotokoly_2013ps-now.pl#L231) and to use the links found on them (see the attached image).
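The proposed two-step approach could look roughly like the sketch below. It is a Python illustration, not the Perl code from the downloader, and it rests on assumptions: that each meeting page still carries links matching the old sitting pattern, and that relative hrefs resolve against the meeting page URL. The sample URL and HTML are hypothetical and stand in for a fetched page, so no network access is needed.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Old sitting-link pattern from the report; assumed to still apply on meeting pages.
SITTING_LINK = re.compile(r".*schuz\/\d+-\d+.html")


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def sitting_links(meeting_url, meeting_html):
    """Return absolute sitting links found on one meeting page, in page order."""
    collector = HrefCollector()
    collector.feed(meeting_html)
    found = []
    for href in collector.hrefs:
        url = urljoin(meeting_url, href)  # resolve relative links
        if SITTING_LINK.fullmatch(url) and url not in found:
            found.append(url)
    return found


# Hypothetical meeting page, standing in for the content behind $meeting_link.
SAMPLE_URL = "https://www.psp.cz/eknih/2017ps/stenprot/001schuz/index.htm"
SAMPLE_HTML = (
    '<a href="1-1.html">day 1</a> <a href="1-2.html">day 2</a> '
    '<a href="../index.htm">up</a> <a href="1-1.html">day 1 again</a>'
)

if __name__ == "__main__":
    for link in sitting_links(SAMPLE_URL, SAMPLE_HTML):
        print(link)
```

Collecting the links per meeting page keeps the existing sitting pattern usable, so only the source of the links changes rather than the pattern itself.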