lindsaykatz / hansard-proj

Materials for the Digitization of the Australian Parliamentary Debates (1998-2022).

Error in Question Time tagging #6

Open danielcaseyAus opened 2 months ago

danielcaseyAus commented 2 months ago

Many of the older CSV files don’t seem to identify questions correctly. Take 15 March 2000 as an example: I filtered the body to include the phrase “My question is to…”, and as you can see, all of these come up with a zero in the question column. eg1
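For anyone wanting to reproduce the check, here is a minimal sketch of the filter described above. It assumes each row has the `body` and `question` columns mentioned in this issue; the sample rows and the `untagged_questions` helper are hypothetical, and loading the actual CSV is left out.

```python
# Hypothetical sample rows mimicking the "body" and "question" columns;
# the real CSVs have more columns and are loaded from disk.
rows = [
    {"body": "My question is to the Prime Minister. ...", "question": "0"},
    {"body": "I thank the member for his question.", "question": "0"},
    {"body": "My question is to the Treasurer. ...", "question": "1"},
]

def untagged_questions(rows, phrase="my question is to"):
    """Return rows whose body contains `phrase` (case-insensitive) but whose
    question flag is 0, i.e. likely mis-tagged questions."""
    return [
        r for r in rows
        if phrase in r["body"].lower() and int(r["question"]) == 0
    ]

print(len(untagged_questions(rows)))  # 1
```

Run against a pre-2011 sitting, a non-empty result points to the tagging problem rather than to the filter.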

At some later point (see 19 June 2018), this appears to be handled correctly: eg2

RohanAlexander commented 2 months ago

@danielcaseyAus - just wanted to say thanks very much for flagging this. @lindsaykatz and I are looking around to work out what's going on and develop a fix.

danielcaseyAus commented 2 months ago

I think I’ve identified what may be the issue with this dataset. I assume you are relying on the “context” variable to identify questions? Unfortunately, this variable was not used consistently across time periods. It seems that prior to May 2011 the options within this variable were different, and the "questions without notice" option only appeared from 2011. So if you rely on that variable for earlier sittings, it won't work.
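One possible workaround, sketched here only as an illustration: keep the existing flag for later sittings and fall back to a text heuristic for earlier ones. The exact cutoff day (1 May 2011), the phrase regex, and the `retag_question` helper are all assumptions, not the project's actual method, and a phrase match will miss questions worded differently.

```python
import re
from datetime import date

# Assumption: the context-based tagging looks unreliable before May 2011;
# the exact cutoff day is a guess.
CUTOFF = date(2011, 5, 1)

# Heuristic: questions customarily open with "My question is to/for ...".
QUESTION_RE = re.compile(r"\bmy question is (?:to|for)\b", re.IGNORECASE)

def retag_question(sitting_date: date, body: str, question: int) -> int:
    """Return a corrected 0/1 question flag for one row (hypothetical helper)."""
    if sitting_date >= CUTOFF:
        # Keep the existing flag where the context variable appears reliable.
        return question
    # Earlier sittings: fall back to matching the opening phrase in the body.
    return 1 if QUESTION_RE.search(body) else 0

print(retag_question(date(2000, 3, 15), "My question is to the Prime Minister.", 0))  # 1
print(retag_question(date(2018, 6, 19), "Thank you, Mr Speaker.", 0))  # 0
```

Splitting on date keeps the later, apparently correct tags untouched and limits the heuristic to the period where the context variable is unreliable.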