iangow / se_core

Core code for StreetEvents data

Sort out `Transcript` begins the Q&A cases #16

Open Yvonne-Han opened 4 years ago

Yvonne-Han commented 4 years ago

I think `1000242_T.xml` might be the only one we would want to address by changing the parsing code. We could in principle do this by changing the following line:

https://github.com/iangow/se_core/blob/0f4c5b73eeaa941e1405b2787c9854eb047108d5/import_speaker_data.R#L103

but the problem is that this line

https://github.com/iangow/se_core/blob/0f4c5b73eeaa941e1405b2787c9854eb047108d5/import_speaker_data.R#L91

implies that I found cases where `Transcript` provided a demarcation between the XML preamble and the presentation part of the call. So we'd probably need some ugly code that says "if there are `Presentation` and `Transcript`, but no Q&A, then assume that `Transcript` begins the Q&A". It might make sense to build up a sample of files with `====\nTranscript` in them so that we can test this before implementing. Perhaps make a new issue for that, but don't do it just yet.

_Originally posted by @iangow in https://github.com/iangow/se_core/issues/15#issuecomment-619048982_
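
To make the proposed rule concrete, here is a minimal sketch of the "`Presentation` and `Transcript`, but no Q&A" check, plus a helper for collecting the test sample. It is written in Python purely for illustration and is not the R code in `import_speaker_data.R`. Several things are assumptions rather than facts about the data: that a section header is a line of `=` characters followed by a line holding the section name, that the Q&A header reads `Questions and Answers`, and the function names `find_sections`, `qa_start_header` and `transcript_sample`, which are hypothetical.

```python
import re

# Assumed section labels. "Questions and Answers" in particular is a guess
# at how the Q&A header is spelled in the transcript text.
QA_NAME = "Questions and Answers"
SECTION_NAMES = ("Presentation", "Transcript", QA_NAME)

# Assumed header layout: a line of "=" characters, then the section name
# alone on the next line (this is the "====\nTranscript" pattern).
HEADER_RE = re.compile(
    r"^=+[ \t]*\r?\n[ \t]*("
    + "|".join(re.escape(name) for name in SECTION_NAMES)
    + r")[ \t]*\r?$",
    re.MULTILINE,
)


def find_sections(text):
    """Return the section names found in text, in order of appearance."""
    return [match.group(1) for match in HEADER_RE.finditer(text)]


def qa_start_header(text):
    """Return the header treated as the start of the Q&A, or None."""
    sections = find_sections(text)
    if QA_NAME in sections:
        # Normal case: an explicit Q&A header exists.
        return QA_NAME
    if "Presentation" in sections and "Transcript" in sections:
        # Proposed fallback: no Q&A header, so Transcript begins the Q&A.
        return "Transcript"
    # Transcript on its own is left alone, since it may only separate the
    # preamble from the presentation.
    return None


def transcript_sample(docs):
    """Given {file name: text}, list the files that have a Transcript header."""
    return sorted(
        name for name, text in docs.items()
        if "Transcript" in find_sections(text)
    )
```

Running `transcript_sample` over the transcript text of each file would give the list of candidates to inspect by hand, and `qa_start_header` could then be checked against that list before any change is made to the real parsing code.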