context:
in the parsing logic, successive rows in which the same member is speaking are combined into one row.
example: on 2024-05-07, Patrick Tay asked two questions consecutively, and the parsing logic combined the two rows into one.
as a result, primary questions cannot be counted accurately.
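a minimal sketch of the effect described above, assuming rows are simple (speaker, text) pairs. `merge_consecutive` is a hypothetical stand-in for the old behaviour, not the project's actual function, and the question texts are placeholders:

```python
from itertools import groupby


def merge_consecutive(rows):
    """Combine successive rows by the same speaker into one row.

    Hypothetical stand-in for the old parsing behaviour.
    """
    merged = []
    for speaker, group in groupby(rows, key=lambda row: row[0]):
        merged.append((speaker, " ".join(text for _, text in group)))
    return merged


# Placeholder rows: two consecutive questions by the same member, then a reply.
rows = [
    ("Patrick Tay", "asked the Minister ... (question 1)"),
    ("Patrick Tay", "asked the Minister ... (question 2)"),
    ("The Minister", "reply ..."),
]

merged = merge_consecutive(rows)

# Counting rows that start with "asked" now undercounts the questions.
questions_before = sum(text.startswith("asked") for _, text in rows)
questions_after = sum(text.startswith("asked") for _, text in merged)
print(questions_before, questions_after)
```

after merging, the two questions live in a single row, so any count based on rows sees only one question.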
this PR:
ensures that rows that look like a question (text starting with "asked" or "to ask") are not combined.
tries to recover some missing names. for example, in rows with the format "The question stood in the name of", the name was missing; the PR tries to recover it.
cleans \t out of the text.
strips html tags from text content (previously the tags were left in the text; now only the content is kept).
date to the raw table of speeches
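the merging and cleaning rules above could be sketched roughly as follows. this is an illustration under assumptions, not the PR's actual code: `clean_text`, `looks_like_question` and `merge_rows` are hypothetical names, rows are assumed to be (speaker, text) pairs, tabs are assumed to be replaced by a space, and a question-like row is assumed to always start a new row:

```python
import re

# A row "looks like a question" if its text starts with "asked" or "to ask".
QUESTION_PREFIX = re.compile(r"^\s*(asked|to ask)\b", re.IGNORECASE)
# Simple tag matcher; enough for a sketch, not a full HTML parser.
TAG = re.compile(r"<[^>]+>")


def clean_text(text):
    """Drop HTML tags, replace tabs with spaces, collapse whitespace."""
    text = TAG.sub("", text)
    text = text.replace("\t", " ")
    return re.sub(r"\s+", " ", text).strip()


def looks_like_question(text):
    return bool(QUESTION_PREFIX.match(text))


def merge_rows(rows):
    """Merge successive rows by the same speaker, except question-like rows."""
    merged = []
    for speaker, raw in rows:
        text = clean_text(raw)
        same_speaker = bool(merged) and merged[-1][0] == speaker
        if same_speaker and not looks_like_question(text):
            # Continuation of the previous row: append to it.
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            # New speaker, or a new question by the same speaker.
            merged.append((speaker, text))
    return merged


# Placeholder input for illustration only.
rows = [
    ("Patrick Tay", "<p>asked the Minister\tfor A</p>"),
    ("Patrick Tay", "<p>asked the Minister for B</p>"),
    ("Minister", "<b>Reply</b> part one"),
    ("Minister", "part two"),
]
result = merge_rows(rows)
```

with this rule, the two consecutive questions stay as separate rows while the two-part reply is still combined.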