swerik-project / riksdagen-records

Strange linebreaks in the speech #50

Open MansMeg opened 4 days ago

MansMeg commented 4 days ago

I was looking at this speech, and it had really strange line breaks in the middle when viewed in the SWEDEB interface.

{"id":"prot-1909--ak--024_026","gender":"Man","party":"S","year":1909,"speaker":"Karl Starbäck"}

Start of speech:

Herr Starbäck:
Herr talman! Jag hade verkligen icke tänkt yttra mig i den här frågan, ty jag hade trott, att det skulle bli ett meningsutbyte mellan juristerna här i kammaren om hvilken form för ändring af lagen eller hvilken form för. 

We should probably try to fix similar issues in multiple documents.

BobBorges commented 2 days ago

When reporting an error (in general, but especially from SWEDEB), we should include the ID attribute and/or line number of the XML element in question. That will make it easier to locate the problem example.
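As a rough illustration of why the ID helps, here is a minimal sketch of looking up an element by its `xml:id` with Python's standard library. The sample document, the element names and the ID values are hypothetical and only stand in for a real protocol file; the helper name `find_by_xml_id` is made up for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:id attribute under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_by_xml_id(root, xml_id):
    """Return the first element whose xml:id equals xml_id, or None."""
    for elem in root.iter():
        if elem.get(XML_ID) == xml_id:
            return elem
    return None


# Hypothetical miniature TEI-like document (not taken from the corpus).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <u xml:id="u-1"><seg xml:id="seg-1">Herr talman!</seg></u>
</TEI>"""

root = ET.fromstring(SAMPLE)
hit = find_by_xml_id(root, "seg-1")
print(hit.text if hit is not None else "not found")
```

With an ID in the report, a lookup like this goes straight to the element instead of searching for the speech text. The standard-library parser does not keep source line numbers, so this sketch covers only the ID case.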

MansMeg commented 2 days ago

I agree! Ping @fredrik1984

BobBorges commented 1 day ago

Do we know how SWEDEB handles line breaks? It's possibly because of this:

[image]

It's not a problem in terms of TEI, but it makes the text difficult to render nicely. I think @ninpnin has talked about how to join these in a reasonable way (right?).

MansMeg commented 1 day ago

Yes, I think this is the reason, i.e. it's an example of a segmentation error.
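For the rendering side, one possible approach is sketched below. It assumes the stray breaks are literal newlines and indentation inside the element text, which this thread does not confirm. The function name `join_lines` and the sample string are hypothetical, and this is not necessarily how SWEDEB or the corpus tooling handles it.

```python
import re


def join_lines(text):
    """Collapse every run of whitespace (newlines, tabs, spaces) into one space."""
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical raw element text with hard-wrapped, indented lines.
raw = """
        Herr talman! Jag hade verkligen icke
        tänkt yttra mig i den här frågan
"""

print(join_lines(raw))
```

This only normalizes whitespace within a single piece of text. Deciding which neighbouring segments belong to the same paragraph is a separate segmentation question that this sketch does not address.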