welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Protocol sections to \<div> #391

Closed BobBorges closed 10 months ago

BobBorges commented 10 months ago

Here I use existing code (scripts/split_into_sections.py) to divide up the unicameral period protocols into sections (based on the § character), delimited by \<div> elements.
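For readers who want a feel for the approach, here is a minimal sketch of §-based sectioning. It is not the code in scripts/split_into_sections.py: the function name, the heading pattern, and the rule that a section containing an utterance becomes a debateSection are all assumptions made for illustration, and TEI namespaces are left out for brevity.

```python
import re
import xml.etree.ElementTree as ET

# Assumed pattern covering both heading styles in the samples: "§ 12 ..." and "12 § ...".
SECTION_HEAD = re.compile(r"\s*(§\W*\d+|\d+\s*§)")


def split_into_sections(div):
    """Return new <div> elements, starting a fresh one at each § heading note."""
    sections = [ET.Element("div")]
    for child in list(div):
        text = (child.text or "") if child.tag == "note" else ""
        # Open a new section at a § heading, unless the current one is still empty.
        if SECTION_HEAD.match(text) and len(sections[-1]) > 0:
            sections.append(ET.Element("div"))
        sections[-1].append(child)
    for section in sections:
        # Assumed heuristic (not confirmed by the PR): utterances mark a debate.
        has_speech = section.find("u") is not None
        section.set("type", "debateSection" if has_speech else "commentSection")
    return sections
```

Notes that merely mention a paragraph sign later in the text, such as "(Beslut fattades efter 37 §.)", do not open a new section because the pattern is only matched at the start of the note.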

I also sneak in a script (scripts/git-add_diff-sample.py) that should work in tandem with @ninpnin's sample-git-diffs to quickly git add the files that were sampled from the diff.
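A rough sketch of what such a helper could look like, assuming the sample lists files as corpus/protocols/... paths like the ones shown in this thread. The function names and the dry_run flag are hypothetical and do not describe the actual scripts/git-add_diff-sample.py.

```python
import re
import subprocess

# Assumed path shape for unicameral protocols, e.g. corpus/protocols/1972/prot-1972--87.xml
PATH_RE = re.compile(r"corpus/protocols/\d+/prot-\d+--\d+\.xml")


def sampled_paths(sample_text):
    """Return the unique protocol paths mentioned in a sample, in order of appearance."""
    return list(dict.fromkeys(PATH_RE.findall(sample_text)))


def git_add(paths, dry_run=True):
    """Stage the given files; with dry_run=True only build and return the command."""
    cmd = ["git", "add", "--", *paths]
    if not dry_run and paths:
        subprocess.run(cmd, check=True)
    return cmd
```

Keeping the dry run as the default makes it possible to inspect the command before anything is staged.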

Sample for quality assessment to follow.

BobBorges commented 10 months ago

Sampled changes

corpus/protocols/1972/prot-1972--87.xml

Diff starting from line 8485

@@ -8459,6 +8485,8 @@
           <note xml:id="i-TXZbZyBcyHLKuS4xePmiYu">
             Punkterna C-E Kammaren biföll vad utskottet i dessa punkter hemställt.
           </note>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-TrSoYcWDyDqUdbqbHzSBf1">
             §&amp; 14 En analys av alkoholoch narkotikamissbrukets utveckling
             på längre sikt, m. m.
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/1973/prot-1973--139.xml

Diff starting from line 1519

@@ -1501,6 +1519,8 @@
               att utredningsresultatet etappvis ställs under offentlig debatt?
             </seg>
           </u>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-4PniShYu7qXhJyrEdCaTca">
             § 12 Kammaren åtskildes kl. 15.36. In fidem
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/1973/prot-1973--142.xml

Diff starting from line 7874

@@ -7824,6 +7874,8 @@
           <note xml:id="i-8Tj5erygqPSFiYsXLf3dk2">
             88
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-KYKKYDDMaFNjhAYyPKBC8K">
             § 26 Meddelande ang. enkla frågor
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/1973/prot-1973--92.xml

Diff starting from line 62

@@ -62,7 +62,7 @@
         </div>
       </front>
       <body>
-        <div>
+        <div type="commentSection">
           <pb facs="https://betalab.kb.se/prot-1973--92/prot_1973__92-000.jp2/_view"/>
           <note xml:id="i-NXg8ZER2Jey9QmrKtcQo58">
             Riksdagens protokoll
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/1974/prot-1974--116.xml

Diff starting from line 8059

@@ -8035,6 +8059,8 @@
           <note xml:id="i-NY2FwSyhACJE6KBvA4rCpM">
             Överläggningen var härmed slutad. Utskottets hemställan bifölls.
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-NdcTEsAiB7ncDe43epvNH5">
             § 13 Föredrog justitieutskottets betänkande nr 25 i anledning
             av motioner angående straffsanktionerad tystnadsplikt för tolk.
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/1974/prot-1974--67.xml

Diff starting from line 6677

@@ -6659,6 +6677,8 @@
             Punkten 6 Utskottets hemställan bifölls.
           </note>
           <pb facs="https://betalab.kb.se/prot-1974--67/prot_1974__67-079.jp2/_view"/>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-454hatukVvxY976DTRv5tj">
             § 13 Krigsmaterielindustrin m. m.
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/1975/prot-1975--99.xml

Diff starting from line 1502

@@ -1494,6 +1502,8 @@
             Punkterna 2-5 Kammaren biföll vad utskottet i dessa punkter hemställt.
           </note>
           <pb facs="https://betalab.kb.se/prot-1975--99/prot_1975__99-018.jp2/_view"/>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-3AAVwaUK52B5MtJPSuUYoC">
             § 5 Föredrogs Nr 99
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/197576/prot-197576--138.xml

Diff starting from line 11389

@@ -11359,12 +11389,16 @@
           <note xml:id="i-KshmEUYZVqJvnqouza5jW7">
             Överläggningen var härmed slutad.
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-A7baHPogjv4bSra7v7PAuj">
             § 16 Fru tredje vice talmannen meddelade att på föredragningslistan
             för kammarens nästkommande sammanträde skulle näringsutskottets
             betänkanden nr 73, 53 och 54 i nu angiven ordning uppföras främst
             bland två gånger bordlagda ärenden.
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-F9zYVTGxwmMetRGdf8CaXf">
             § 17 Anmäldes och bordlades Utbildningsutskottets betänkande
             1975/76:33 med anledning av propositionen 1975/76:118 om hemspråksundervisning
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/197879/prot-197879--82.xml

Diff starting from line 64

@@ -64,7 +64,7 @@
         </div>
       </front>
       <body>
-        <div>
+        <div type="commentSection">
           <pb facs="https://betalab.kb.se/prot-197879--82/prot_197879__82-000.jp2/_view"/>
           <note xml:id="i-FtUFNmoGAkAzMpuPnHq2aP">
             Riksdagens protokoll
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/198384/prot-198384--102.xml

Diff starting from line 65

@@ -65,7 +65,7 @@
         </div>
       </front>
       <body>
-        <div>
+        <div type="commentSection">
           <pb facs="https://betalab.kb.se/prot-198384--102/prot_198384__102-000.jp2/_view"/>
           <note xml:id="i-MgQBEcPCv4MTGoso5TxYhz">
             Riksdagens protokoll
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/198384/prot-198384--132.xml

Diff starting from line 4692

@@ -4688,6 +4692,8 @@
           <note xml:id="i-DgaYQ2ShACZmQVMiiaxHAZ">
             Mom. 6 Utskottets hemställan bifölls.
           </note>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-63iN4txNokFDqzGjLBw7VX">
             6 § Avgifter inom äldreomsorgen
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/198586/prot-198586--104.xml

Diff starting from line 4141

@@ -4135,6 +4141,8 @@
           <note xml:id="i-qbYNbkSNfyxiMwer9xWq7">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-E2hSvAcUGjwHqVutakNsYu">
             23 § Svar på interpellation 1985/86:141 om tillämpningen av sekretesslagen
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/198788/prot-198788--64.xml

Diff starting from line 511

@@ -509,6 +511,8 @@
             Överläggningen var härmed avslutad.
           </note>
           <pb facs="https://betalab.kb.se/prot-198788--64/prot_198788__64-007.jp2/_view"/>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-UJA1GZLonjzrwd3z9u3Hm">
             3 §&amp; Svar på interpellation 1987/88:163 om företaget ScanDusts
             återvinning av miljöfarligt avfall
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/198990/prot-198990--18.xml

Diff starting from line 62

@@ -62,7 +62,7 @@
         </div>
       </front>
       <body>
-        <div>
+        <div type="commentSection">
           <pb facs="https://betalab.kb.se/prot-198990--18/prot_198990__18-000.jp2/_view"/>
           <note xml:id="i-CDgn3oB1eSuZTMhoRPowH1">
             Riksdagens protokoll ÅS 1989/90:18 ST
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/199091/prot-199091--115.xml

Diff starting from line 6619

@@ -6573,6 +6619,8 @@
           <note xml:id="i-WVUeXmFt4QvUStXmSZSpQA">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-GoCyWVye5JJ1gDHo83SeQz">
             24 § Svar på fråga 1990/91:640 om tullen på miljövänliga sjukvårdsprodukter
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/199192/prot-199192--28.xml

Diff starting from line 6025

@@ -5983,6 +6025,8 @@
           <note xml:id="i-2NuFg8D9LGpCWGVxzHmijZ">
             (Beslut fattades efter 37 §.)
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-9LgGJHhv6k1UbT6E4YATEA">
             23 § Internationella överenskommelser rörande räddningstjänst
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/199394/prot-199394--24.xml

Diff starting from line 446

@@ -442,6 +446,8 @@
           <note xml:id="i-RdS8yDoF5PhKuVPLdDg2DM">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-Wc5yg7pvWV7raEyZVmuSet">
             4 § Svar på frågorna 1993/94:22 och 146 om inrikes strider i
             Turkiet m.m.
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/199394/prot-199394--54.xml

Diff starting from line 6308

@@ -6292,6 +6308,8 @@
             Bil. 19 Beredskapsbudget för totalförsvarets civila del till
             försvarsutskottet
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-JfMJzGeyL52RyEpWVewhrn">
             10 § Hänvisning av ärenden till utskott
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/199394/prot-199394--71.xml

Diff starting from line 80

@@ -78,6 +80,8 @@
               i fredsbevarande insatser.
             </seg>
           </u>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-9c5GH7h1QUTMzkopcVnG45">
             3 § Hänvisning av ärenden till utskott
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/199495/prot-199495--35.xml

Diff starting from line 784

@@ -774,6 +784,8 @@
               tisdagen den 13 december.
             </seg>
           </u>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-TpCfAT74jG1bGpcBD9V3ny">
             10 § Kammaren åtskildes kl. 12.23.
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/199495/prot-199495--9.xml

Diff starting from line 97

@@ -93,6 +97,8 @@
           <note xml:id="i-3sb8so8qfHQbVM1ZeU5LzV">
             av riksdagens myndigheter och organ.
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-3M7T114JTVRN4L7j6Xy8Qn">
             3 § Hänvisning av ärenden till utskott
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/199697/prot-199697--97.xml

Diff starting from line 4602

@@ -4578,6 +4602,8 @@
           <note xml:id="i-5GM28CHqsPRbeCZKaw95VH">
             1996/97:122 Finsk, rödmärkt olja i motordrivna fordon
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-PT6aKqrhZWEryqvF7zxFwL">
             13 § Anmälan om interpellationer
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/199899/prot-199899--30.xml

Diff starting from line 5572

@@ -5534,18 +5572,26 @@
           <note xml:id="i-77v63fKBq3TWoie2JsJTtp">
             Innehållsförteckning
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-FZH4jmg8HrvGQqtuXQYXMR">
             1 § Justering av protokoll 1
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-2MVHhE7vom7Fz2dwhFy1mX">
             2 § Meddelande om riksdagens planering för tiden januari 1999-
           </note>
           <note xml:id="i-V4MtwXTQRgrmhCY9D8M5J7">
             december 2000 1
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-8Rt7h6UgupVM8Fx97ywJzV">
             3 § Meddelande om plenum torsdagen den 10 december 1
           </note>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-RiQyjWFMq8sdvb1PMb336K">
             4 § Svar på interpellation 1998/99:26 om Stockholmsbörsen 1
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/199899/prot-199899--42.xml

Diff starting from line 5188

@@ -5184,6 +5188,8 @@
           <note xml:id="i-QGbH1GGPyV2LMqoyGmah57">
             Rossana Valeria (v)
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-2LWZo4JJ68TZSNsn7t9Smb">
             3 § Anmälan om inkomna faktapromemorior om förslag från Europeiska
             kommissionen
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/199899/prot-199899--5.xml

Diff starting from line 2186

@@ -2168,6 +2186,8 @@
           <note xml:id="i-GD4UYynoQsUrLPFERFLDJm">
             Skrivelsen lades till handlingarna.
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-Nj2Ag2FmDaQQYLaoM9z8KW">
             10 § Meddelande om frågestund
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/19992000/prot-19992000--126.xml

Diff starting from line 1329

@@ -1313,6 +1329,8 @@
           <note xml:id="i-QLFoDUZCGHzxLzZTo6yA7D">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-NtLxsEiMFT7upi3QKXLYjX">
             9 § Ett informationssamhälle för alla
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/200102/prot-200102--110.xml

Diff starting from line 195

@@ -183,6 +195,8 @@
           <note xml:id="i-XvZrU37eSL3uARr1MHuhAj">
             KrU22
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-VMV9JHxBRURkhKBzZFeuzd">
             7 § Beslut om ärende som slutdebatterats den
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/200304/prot-200304--42.xml

Diff starting from line 10604

@@ -10586,6 +10604,8 @@
           <note xml:id="i-CK7d1SSDAJPhT3q3E3DrpJ">
             Övriga punkter
           </note>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-VA98Bqns7M2dpe1vDKTFDp">
             10 § (forts. från 8 §) Invandrare och flyktingar (forts. SfU2)
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/200506/prot-200506--106.xml

Diff starting from line 1847

@@ -1831,6 +1847,8 @@
           <note xml:id="i-4rnaanCb97RXhUQyKaA7H1">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-THDsX2ztYwF2P8DDZDDfuh">
             9 § Svar på interpellation 2005/06:330 om översyn av reglerna
             för barnbidrag och studiehjälp
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/200506/prot-200506--114.xml

Diff starting from line 4562

@@ -4528,6 +4562,8 @@
               tisdagen den 9 maj.
             </seg>
           </u>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-Hz3fUKhiyBPHGQEBzrExh5">
             18 § Anmälan om skriftligt svar på fråga
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/200607/prot-200607--79.xml

Diff starting from line 112

@@ -108,6 +112,8 @@
           <note xml:id="i-XZqAWzNmNjaWXCPmRpTbYt">
             /Kerstin Siverby
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-Y8Gu8CXRGciFpM4JDZuc2j">
             3 § Meddelande om inställd votering
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/200708/prot-200708--9.xml

Diff starting from line 94

@@ -90,6 +94,8 @@
           <note xml:id="i-8DpeZft1BEoawuCAN7D9Wf">
             Kammaren medgav denna utökning.
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-TNn5Ah4NAkTwdaUdFPbNrM">
             3 § Val av extra suppleant i konstitutionsutskottet
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/200809/prot-200809--101.xml

Diff starting from line 6718

@@ -6692,6 +6718,8 @@
           <note xml:id="i-V2SfTS8tpFd5ppgjXijiR9">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-U1EwUTURNNvH7y45Q9iT8x">
             14 § Bordläggning och beslut om förlängd motionstid
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201011/prot-201011--127.xml

Diff starting from line 1881

@@ -1861,6 +1881,8 @@
               torsdagen den 15 september.
             </seg>
           </u>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-7tsMVJHpumMWFwEECsFKqe">
             11 § Kammaren åtskildes kl. 14.08.
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201011/prot-201011--5.xml

Diff starting from line 382

@@ -348,6 +382,8 @@
             från valet till dess nytt val förrättats under början av nästa
             valperiod:
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-GqbUmgnrDRkBfMQQob3JdE">
             18 § Justering av protokoll
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201112/prot-201112--126.xml

Diff starting from line 3394

@@ -3374,6 +3394,8 @@
           <note xml:id="i-TruLXrkpAjvN1Nhy2yeGkG">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-6TV9bpvzrXGNSepzubA7ZE">
             11 § Bordläggning och beslut om förlängd motionstid
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201112/prot-201112--16.xml

Diff starting from line 1941

@@ -1927,6 +1941,8 @@
             för tisdagen den 11 oktober i ärende om subsidiaritetsprövning
             av EU-förslag inkommit från konstitutionsutskottet.
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-YDQoWL6d7NKrJ5rg8sjvFv">
             8 § Hänvisning av ärenden till utskott
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201213/prot-201213--73.xml

Diff starting from line 18943

@@ -18905,6 +18943,8 @@
               tisdagen den 12 mars.
             </seg>
           </u>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-XtKUGkp74kP8RBcYSahffk">
             20 § Kammaren åtskildes kl. 22.59.
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201314/prot-201314--21.xml

Diff starting from line 2642

@@ -2624,6 +2642,8 @@
           <note xml:id="i-AXYYTzXfNMCvujrFGEueDp">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-T6fMG6cQmMwwb9o1XFdRH3">
             10 § Svar på interpellation 2013/14:51 om Ojnare myr som Natura
             2000-område
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201314/prot-201314--90.xml

Diff starting from line 8232

@@ -8214,6 +8232,8 @@
           <note xml:id="i-M3PszUedTLupJPf5jgYYeL">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-LnuowqQBvHSGBQZGYjgvR1">
             10 § Planering och byggande
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201415/prot-201415--57.xml

Diff starting from line 11625

@@ -11551,6 +11625,8 @@
           <note xml:id="i-KDJnyQvxyHwsbgvDYewVAC" type="speaker">
             Anf. 132 Arbetsmarknadsminister YLVA JOHANSSON (S)
           </note>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-SdW1vRMLhqsR6LbuSBVNqE">
             § 15 Svar på interpellation 2014/15:236 om anställningar med
             statligt stöd
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201516/prot-201516--86.xml

Diff starting from line 88

@@ -84,6 +88,8 @@
           <note xml:id="i-PafaChyHGFaJCZxV62F2ZH">
             Ka mmaren biföll dessa avsägelser.
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-BCKDyCs7BwPoFTbtuRUaoi">
             § 3 Anmälan om kompletteringsval
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201617/prot-201617--42.xml

Diff starting from line 10092

@@ -10072,6 +10092,8 @@
           <note xml:id="i-V87AuSteRBchV2vbZnfpxq">
             ( Beslut skulle fattas den 13 december.)
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-XCMuFvvQWBeYwxnkKGePxF">
             § 11 Anmälan om interpellationer
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201617/prot-201617--53.xml

Diff starting from line 59

@@ -59,14 +59,18 @@
         </div>
       </front>
       <body>
-        <div>
+        <div type="commentSection">
           <pb facs="http://data.riksdagen.se/fil/2EC58E71-C88A-4863-AA48-572F4902F9D7#page=1"/>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-DjTcrc63BC1asBTT4SFwc7">
             § 1 Justering av protokoll
           </note>
           <note xml:id="i-2MEB6N2edMuVjywn4XCqEC">
             Protokollen för den 12, 13 och 14 december justerades.
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-V8wshu45zUvYTxr7eJqx8e">
             § 2 Anmälan om fördröjda svar på interpellationer
           </note>
  • [ ] Correct
  • [x] Incorrect

corpus/protocols/201617/prot-201617--6.xml

Diff starting from line 322

@@ -310,6 +322,8 @@
           <note xml:id="i-B1j1NXfG72WTjSQQSLK6fd">
             till statsrådet Sven-Erik Bucht (S)
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-KBqKXFG9HCdqvTxypRvY6v">
             § 7 Kammaren åtskildes kl. 9.01.
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201718/prot-201718--112.xml

Diff starting from line 11597

@@ -11491,18 +11597,28 @@
             AU17 Subsidiaritetsprövning av kommissionens förslag till förordning
             om inrättande av Europeiska arbetsmyndigheten
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-41TDvVGZxjwvGaVHoo9839">
             § 25 Bordläggning
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-BnqaRUGqGxVT8FBC553ZNa">
             § 26 Anmälan om interpellationer
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-826fGeKJmqNNWiUf53RAAV">
             § 27 Anmälan om frågor för skriftliga svar
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-LVrNJoSzzNpqf5P64NiAb7">
             § 28 Anmälan om skriftliga svar på frågor
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-6BYrw6SXrg9qnDwPqmPt5N">
             § 29 Kammaren åtskildes kl. 16.20.
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201819/prot-201819--12.xml

Diff starting from line 145

@@ -131,6 +145,8 @@
             2018/19:FPM9 Gemensamt meddelande om förbindelserna mellan Europa
             och Asien – byggstenar för en EU-strategi JOIN(2018) 31 till trafikutskottet
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-GFTtu2wnzSNkbP7VPWxUYa">
             § 8 Anmälan om granskningsrapport
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201819/prot-201819--37.xml

Diff starting from line 251

@@ -247,6 +251,8 @@
           <note xml:id="i-3u8SvEd7bd5NDGdMTRCbV5">
             Kammaren biföll dessa avsägelser.
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-SzR2yGcjvnoz3nuM5mgSco">
             § 3 Anmälan om kompletteringsval
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/201819/prot-201819--81.xml

Diff starting from line 11653

@@ -11619,12 +11653,18 @@
           <note xml:id="i-XuSFTRvW3vfo8j7MvCYNHn">
             Innehållsförteckning
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-X6eQiQ96aFBcwZgntBEyxB">
             § 1 Justering av protokoll
           </note>
+        </div>
+        <div type="commentSection">
           <note xml:id="i-AGiVMadh8btydpYG9mtVhc">
             § 2 Anmälan om fördröjt svar på interpellation
           </note>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-UEWe6Kz5QcnuZBuX7UH2ye">
             § 3 Interparlamentariska unionen (IPU)
           </note>
  • [ ] Correct
  • [ ] Incorrect

corpus/protocols/202122/prot-202122--70.xml

Diff starting from line 7313

@@ -7241,6 +7313,8 @@
           <note xml:id="i-9yzgVCyLV5Md3um8MSVQjW">
             (Beslut skulle fattas den 23 februari.)
           </note>
+        </div>
+        <div type="debateSection">
           <note xml:id="i-K3Fid1jwQi92pU28ysPUev">
             § 12 Svar på interpellation 2021/22:321 om inrättandet av en
             kriskommission om LSS
  • [ ] Correct
  • [ ] Incorrect

MansMeg commented 10 months ago

The unit tests are failing?

BobBorges commented 10 months ago

It's the schema test. Some of the 202122 protocols are empty. I found it just before I went home, so I'm not really sure what the cause is yet.

MansMeg commented 10 months ago

Seems like it captures page divs: corpus/protocols/201617/prot-201617--53.xml This should be easy to fix, I think.
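One possible fix, sketched under the assumption that a \<div> containing nothing but \<pb/> elements should simply be folded into the section that follows it. The function name is hypothetical, and the thread does not say which fix was actually applied.

```python
import xml.etree.ElementTree as ET


def merge_page_only_divs(body):
    """Fold any <div> that holds nothing but <pb/> elements into the next <div>."""
    divs = [child for child in body if child.tag == "div"]
    for current, following in zip(divs, divs[1:]):
        if len(current) > 0 and all(child.tag == "pb" for child in current):
            # Move the page breaks to the front of the following section.
            for position, pb in enumerate(list(current)):
                following.insert(position, pb)
            body.remove(current)
    return body
```

Applied to the prot-201617--53 sample above, this would leave the page break as the first child of the "§ 1 Justering av protokoll" section instead of in a section of its own.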

MansMeg commented 10 months ago

Also, commentSection does not really make sense semantically. I would go with debateSection and otherSection for now.

Edit: I saw that this is the standard in ParlaMint, but it hurts my eyes. I would create our own section types here anyway, simply because I think we will want more elaborate sectioning further down the line.

MansMeg commented 10 months ago

Also, ParlaMint states that the first note after \<div> should be a header, so maybe add that as well?

ninpnin commented 10 months ago

ParlaMint is the more restrictive version of the two, a strict subset of ParlaClarin. I think we should use it as a suggestion.

In practice: sometimes the header is not available in our data, so I think we shouldn't put too much effort into following that rule.

BobBorges commented 10 months ago

I think we should decide on a preliminary idea of how to adjust the divs now and I can implement it before we commit changes to the whole unicameral period. My thoughts:

  • first \<div> element under \<body> should probably not be tagged as a debate section
  • debate section divs have a type attrib with debate_ as a general value, and we can specify further as we go, e.g., debate_interpellationDebate and debate_interpellationQuestion
  • commentSection should probably be other or something generic for the time being to signal !debate

BobBorges commented 10 months ago

I just talked it over with @ninpnin -- we'll leave the commentSection/debateSection for now. It's easy enough to change later. ParlaClarin specifies a subtype attribute, so that solves my main issue about classifying types of debates.

I see one check mark on an incorrect \<div> -- who should check the rest so we can get on with this?

MansMeg commented 10 months ago

Fair enough. Long term we probably want this information in tables anyway. Hence we should add IDs to the div tags, just as we have for the notes and utterances.

I suggest we just use uuid there as well.

BobBorges commented 10 months ago

That's reasonable -- do you want to check the divs are correct enough first? I think it's a short script to add an id to the div tags -- we have a uuid generator function in the pyriksdagen module.
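As an illustration of how short such a script can be, here is a sketch that uses only the standard library. It does not use the pyriksdagen generator mentioned above. The "i-" prefix plus base58 encoding is an assumption based on how the existing ids in the samples look, and new_id and add_div_ids are hypothetical names.

```python
import uuid
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def new_id():
    """Return 'i-' plus a random UUID written in base58 (assumed id format)."""
    number = uuid.uuid4().int
    digits = ""
    while number:
        number, remainder = divmod(number, 58)
        digits = BASE58[remainder] + digits
    return "i-" + digits


def add_div_ids(root):
    """Give every <div> without an xml:id a fresh one; return how many were added."""
    added = 0
    for div in root.iter("div"):
        if XML_ID not in div.attrib:
            div.set(XML_ID, new_id())
            added += 1
    return added
```

Because divs that already carry an xml:id are skipped, running the function twice on the same tree adds nothing the second time.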

BobBorges commented 10 months ago

the unit test fails because of a couple protocols in 2021/22 with no body. They're on the riksdag open data, will fix this in a separate PR.

MansMeg commented 10 months ago

Having thought about this a little longer: if we removed type from the tags later, we would actually be changing the API. So we should try to avoid that and fix this right away. I also think MetaSolutions was quite clear that the data should just include IDs to simplify linking and adding metadata.

Hence, we should do this right away. I don't think it's much work. This would mean:

  1. Create a CSV file (called record_divisors.csv?) with the columns div_id and type. I'm not sure in what folder we should store this.
  2. Add an id to every div
  3. Move the "type" attribute to the CSV file

Does this make sense?
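To make the proposal concrete, the three steps could look roughly like the sketch below. It only illustrates the idea under discussion: the function names are hypothetical, the divs are assumed to carry an xml:id already, and only the column names div_id and type come from the list above.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def move_types_to_rows(root):
    """Strip 'type' from every <div> that has an xml:id and return (div_id, type) rows."""
    rows = []
    for div in root.iter("div"):
        div_id = div.get(XML_ID)
        if div_id is not None and "type" in div.attrib:
            rows.append((div_id, div.attrib.pop("type")))
    return rows


def rows_to_csv(rows):
    """Serialise the rows using the div_id and type columns named in the proposal."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["div_id", "type"])
    writer.writerows(rows)
    return buffer.getvalue()
```

After this step the XML keeps only the ids, and the section type lives solely in the table, which is exactly the trade-off the proposal raises.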

ninpnin commented 10 months ago

I think this is a fundamentally different approach than what we have done so far.

So far, we have had a lot of annotations in the XML files. That's what ParlaClarin is for. Otherwise we would use tabular data, e.g. CSVs, for text too.

My current gut feeling is that our current approach works better with git.

Either way, I don't think we should add a new CSV now. Either we continue with our current approach, or change to a tabular structure later after more planning.

MansMeg commented 10 months ago

That is true. I think we get some conflicting best practices here: ParlaClarin as a format, and MetaSolutions' recommendations regarding IDs and linked data.

I agree with MetaSolutions long term, but you are right. Let's keep this as small as possible. We do need to add an id to all elements anyway, though, since we are going to need to take samples of sections.

MansMeg commented 10 months ago

> the unit test fails because of a couple protocols in 2021/22 with no body. They're on the riksdag open data, will fix this in a separate PR.

I'm hesitant to merge a PR that doesn't pass the tests, so we should try to fix that ASAP.

BobBorges commented 10 months ago

Here comes a new sample with id attribs in the divs and the 'empty' protocols from 202122 curated. Let's hope the unit tests pass :D

BobBorges commented 10 months ago

Sampled changes

corpus/protocols/1972/prot-1972--24.xml

Diff starting from line 3172

@@ -3150,6 +3172,8 @@
           <note xml:id="i-HHDhpAANZJrmYUDqzPhmCK">
             Denna anhållan bordlades.
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-X33R3qea3RbqeNGwrLd1mh">
           <note xml:id="i-KzgEJ9DWhqzRBZbWpYnuwj">
             § 12 Anmäldes och bordlades Kungl. Maj:ts propositioner:
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/1973/prot-1973--120.xml

Diff starting from line 65

@@ -65,7 +65,7 @@
         </div>
       </front>
       <body>
-        <div>
+        <div type="commentSection" xml:id="i-7Hg1De5Po567941hoEN5Eb">
           <pb facs="https://betalab.kb.se/prot-1973--120/prot_1973__120-000.jp2/_view"/>
           <note xml:id="i-6MnUiXYLfspTq7tqvktbyu">
             Riksdagens protokoll
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/197879/prot-197879--79.xml

Diff starting from line 62

@@ -62,7 +62,7 @@
         </div>
       </front>
       <body>
-        <div>
+        <div type="commentSection" xml:id="i-WhnS8hWbaziWUxiyVsjRu">
           <pb facs="https://betalab.kb.se/prot-197879--79/prot_197879__79-000.jp2/_view"/>
           <note xml:id="i-PGiGmFUjFqxeozFogDjSPY">
             Riksdagens protokoll
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/197879/prot-197879--90.xml

Diff starting from line 3430

@@ -3400,18 +3430,26 @@
             betänkande 1978/79:14 Jordbruksutskottets betänkande 1978/79:17
             Näringsutskottets betänkanden 1978/79:19-21
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-GDV2NxHAuiYua1uC4LgXNa">
           <note xml:id="i-BUGefzQkzxYYkaXkrsZx3X">
             § 19 Föredrogs och bifölls Interpellationsframställning 1978/79:149
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-QYro5141WoAx8z7uYXVSpa">
           <note xml:id="i-WQtoWEzf5SCNFmFSB7tkmg">
             § 20 Talmannen meddelade att på föredragningslistan för morgondagens
             sammanträde skulle finansutskottets betänkande nr 20 och skatteutskottets
             betänkande nr 29 uppföras främst bland två gånger bordlagda ärenden.
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-NbesPokTdQ24SVVR7fSbwU">
           <note xml:id="i-UoJDppyqfyMhSS23nZhNgg">
             § 21 Anmäldes och bordlades Proposition 1978/79:89 om lokalhyra
           </note>
           <pb facs="https://betalab.kb.se/prot-197879--90/prot_197879__90-041.jp2/_view"/>
+        </div>
+        <div type="debateSection" xml:id="i-G7rCj1kjqCHQDawYn99ejp">
           <note xml:id="i-N97S5BHKP6kAXoaYbEo9ea">
             § 22 Anmälan av interpellation
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/197980/prot-197980--41.xml

Diff starting from line 5419

@@ -5389,6 +5419,8 @@
               av Lysekilsbanan kan genomföras utan dröjsmål?
             </seg>
           </u>
+        </div>
+        <div type="commentSection" xml:id="i-V3hXj8qo6S2Pzzt3L2yZpA">
           <note xml:id="i-9YeCE5sNnJRu34tVFknkgb">
             § 17 Kammaren åtskildes kl. 15.01. In fidem
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/197980/prot-197980--56.xml

Diff starting from line 8043

@@ -8023,6 +8043,8 @@
           <note xml:id="i-K4E5su54KdtTu5pacFps41">
             Mom. 2-7 Kammaren biföll vad utskottet i dessa moment hemställt.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-UNjYVpvV3BHw9yDy8yT9Dg">
           <note xml:id="i-8EHo2t5WvyzppWZyb8gyLb">
             § 12 Invandrarundervisning m. m.
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/198182/prot-198182--31.xml

Diff starting from line 64

@@ -64,7 +64,7 @@
         </div>
       </front>
       <body>
-        <div>
+        <div type="commentSection" xml:id="i-YKJQ9t6vq2g91Se14ztFjn">
           <pb facs="https://betalab.kb.se/prot-198182--31/prot_198182__31-000.jp2/_view"/>
           <note xml:id="i-5fKzSMmN1WerpNSyAhwcXV">
             Riksdagens protokoll
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/198283/prot-198283--111.xml

Diff starting from line 353

@@ -353,6 +353,8 @@
           <note xml:id="i-PnUNJn84bxmRD9K6GUAbZf">
             suppleant i utbildningsutskottet Sonia Thomasson (vpk)
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-S1pessFPXLM6rWYzEH1QjS">
           <note xml:id="i-PBCqXtv8qLCcE4P18gCuVF">
             3§ Talmannen meddelade att Ingemar Konradsson (s) denna dag återtagit
             sin plats i riksdagen, varigenom Ulla-Britt Carlssons uppdrag
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/198384/prot-198384--100.xml

Diff starting from line 3183

@@ -3175,15 +3183,21 @@
           <note xml:id="i-9YZ5JzboDwZ4z63771xjK3">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-BsCQM62oikNZ3ioKPCXuVk">
           <note xml:id="i-DbjQrUu8GsGVNnAbuZjLbi">
             11 § På förslag av talmannen beslöt kammaren kl. 11.10 att ajournera
             sina förhandlingar till kl. 14.00, då de till dagens bordläggning
             anmälda utskottsbetänkandena väntades föreligga.
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-BdgMumfDWQE9NA242JctvF">
           <note xml:id="i-BGwYLbyW36NMRKdpzngTAC">
             12 § Förhandlingarna återupptogs kl. 14.00 under ledning av förste
             vice talmannen.
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-NDrQvVL2ZQjE8AcX4cttKZ">
           <note xml:id="i-7upkPfaSsBkcRFxuFV6S8a">
             13 § Anmäldes och bordlades Proposition 1983/84:128 Förslag till
             lag om företagshypotek m. m.
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/198384/prot-198384--155.xml

Diff starting from line 3523

@@ -3519,6 +3523,8 @@
           <note xml:id="i-FSkMitL3nfGSkXpQC11GRs">
             Övriga moment Utskottets hemställan bifölls.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-CFNt3WaMCeUbAjPuAUL5o2">
           <note xml:id="i-A4tx6KGBJRvjLq9Hk5NkQ9">
             5 §&amp; Arbetsmiljöfrågor, m. m.
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/198586/prot-198586--110.xml

Diff starting from line 819

@@ -817,6 +819,8 @@
           <note xml:id="i-HVqcDkZHt748vre4KMfoYj">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-Vgx1LYaDsrZqDRSozjEjtA">
           <note xml:id="i-Vsy74coahMz5bjoq4eZm48">
             3 § Svar på interpellation 1985/86:146 om åtgärder mot radioaktiva
             utsläpp från engelsk upparbetningsanläggning
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/198687/prot-198687--73.xml

Diff starting from line 366

@@ -366,6 +366,8 @@
           <note xml:id="i-7TqiRkjb2yAuoPYePo7jTv">
             18 Justerades protokollet för den 9 innevarande månad.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-Cc9TJgYWfzZsaGybgG1n5W">
           <note xml:id="i-AmpTwAcF3WPs17Dhk2iYVZ">
             2 § Svar på interpellation 1986/87:96 om åtgärder för att förenkla
             och effektivisera socialförsäkringen
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/199091/prot-199091--78.xml

Diff starting from line 59

@@ -59,13 +59,15 @@
         </div>
       </front>
       <body>
-        <div>
+        <div type="commentSection" xml:id="i-Fkv3bwf9PRu2aakA4TSb2n">
           <note xml:id="i-H8Se86iLdeznpxoeEpkSnk">
             1 § Justering av protokoll
           </note>
           <note xml:id="i-Ab5VjJsL9Tzm9HS2Cmrk8y">
             Justerades protokollet för den 8 mars.
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-R3U6NZvuQwJefao1SVMMji">
           <note xml:id="i-rTHHpwPubrfwA13NDtzyZ">
             2 § Bordläggning
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/199192/prot-199192--121.xml

Diff starting from line 10376

@@ -10328,6 +10376,8 @@
             Kammaren beslöt att ärendebehandlingen skulle fortsättas vid
             arbetsplenum måndagen den 1 juni.
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-WhuAoTJaCdoaqvPzkqS64Z">
           <note xml:id="i-QhYGThrMHD1PiKTY1Rfbi7">
             26 § Bordläggning
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/199293/prot-199293--71.xml

Diff starting from line 6983

@@ -6939,6 +6983,8 @@
           <note xml:id="i-23akbUZ546t1WCuf495VAR">
             1992/93:AU7, AU9 och AU15
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-4QcG6NWqhcZT5x5JbpkprZ">
           <note xml:id="i-AFZCMQVmoGJBNbdzg9Exyu">
             24 § Bordläggning
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/199394/prot-199394--124.xml

Diff starting from line 10470

@@ -10456,6 +10470,8 @@
           <note xml:id="i-6PvWsiN7QtC2TzTu8BVf8V">
             Förhandlingarna återupptogs kl. 15.00.
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-34i1ky7Vu8UVS6qhcYpwRQ">
           <note xml:id="i-Xf9w3uZ9NXgfNKVLbXEFHp">
             9 § Avsägelse
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/199495/prot-199495--40.xml

Diff starting from line 518

@@ -504,6 +518,8 @@
           <note xml:id="i-Y7rzZ3AgF24sFWMunRbJqS">
             (Beslut skulle fattas den 14 december.)
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-LMooFNh7qWfKgy2ZXrbKj3">
           <note xml:id="i-EW8VcGYMQSdCvE1iB7TWzZ">
             8 § Oskäliga avtalsvillkor m.m.
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/199495/prot-199495--76.xml

Diff starting from line 69

@@ -69,6 +69,8 @@
               ________________________________________________________________________
             </seg>
           </u>
+        </div>
+        <div type="commentSection" xml:id="i-3guoHd4Xv1BTuGxzwzcJK8">
           <note xml:id="i-DtVyPH8Joqh7Nki1jYXfPp">
             1 § Avsägelse
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/199899/prot-199899--17.xml

Diff starting from line 4845

@@ -4825,6 +4845,8 @@
             Interpellationerna redovisas i bilaga som fogas till riksdagens
             snabbprotokoll tisdagen den 24 november.
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-DhjpCNZzuFzXNSkuENgUJ3">
           <note xml:id="i-TJo1UeQVntvrUtBpFVTXJZ">
             11 § Anmälan om fråga för skriftligt svar
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/199899/prot-199899--38.xml

Diff starting from line 336

@@ -320,6 +336,8 @@
             AU1 samt näringsutskottets betänkanden NU1, NU2 och NU3 skulle
             avgöras i ett sammanhang efter avslutad debatt.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-3F8a7BPWWUveovCAFDHV9G">
           <note xml:id="i-VURidbF1UszbSTjSQCsGf6">
             9 § Ekonomisk trygghet vid arbetslöshet samt arbetsmarknad och
             arbetsliv
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/19992000/prot-19992000--112.xml

Diff starting from line 3395

@@ -3379,6 +3395,8 @@
           <note xml:id="i-9u95apYeYffGQ4b6dy4Tx2">
             (Beslut fattades under 11 §.)
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-ETQD59HnRUTg3rFxjeuGda">
           <note xml:id="i-PWc4tbZvVhN9ySwwtvfgtV">
             9 § Tillträde till internationella instrument mot penningförfalskning
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200001/prot-200001--35.xml

Diff starting from line 188

@@ -180,6 +188,8 @@
           <note xml:id="i-NXmeydYpGAxgtrgtxNxWGo">
             Ingegerd Wärnersson
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-kajQ6vJPDaaWnAuVDJ2mF">
           <note xml:id="i-QWTK52XQPGju38N82HDtuS">
             5 § Svar på interpellation 2000/01:97 om verksamheten vid Lunds
             universitets historiska museum
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200001/prot-200001--56.xml

Diff starting from line 15540

@@ -15490,6 +15540,8 @@
           <note xml:id="i-R8atM2aknihnLFwqQbxcyn">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-KKR4LSNNrvaH2uD441gxzR">
           <note xml:id="i-QeJXhHMkjXxLrtGf9BSpCd">
             26 § Svar på interpellation 2000/01:188 om tomträtter
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200001/prot-200001--64.xml

Diff starting from line 59

@@ -59,7 +59,7 @@
         </div>
       </front>
       <body>
-        <div>
+        <div type="commentSection" xml:id="i-Up1rjAzqTpCVu3qiGxjv5w">
           <pb facs="http://data.riksdagen.se/fil/EAEC16F1-80A8-4F8B-AAC0-1C8AE4993D01#page=1"/>
           <note xml:id="i-T2gmCNM3HbFu4pDN2mUgFP">
             Det justerade protokollet beräknas utkomma om 3 veckor
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200102/prot-200102--65.xml

Diff starting from line 70

@@ -70,12 +70,16 @@
           <note xml:id="i-K8LC1ZsLfRNnC4XHiXkvKP">
             -------------------------------------------------------------------
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-WWSVeFS8UUeZxUawjjzX4z">
           <note xml:id="i-PHZWq5QuLYvuQUWAARAvJ5">
             1 § Justering av protokoll
           </note>
           <note xml:id="i-YLarRBtfkqisttE3Xm3QLh">
             Justerades protokollet för den 1 februari.
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-XggoQqtkUGi3Mb2DdxJX5T">
           <note xml:id="i-TtPsCZTzEyDsSRKGgV28nc">
             2 § Meddelande om utrikespolitisk debatt
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200102/prot-200102--79.xml

Diff starting from line 4816

@@ -4798,6 +4816,8 @@
           <note xml:id="i-D2WnFNisRbsJ1fv78yRRpb">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-LVVXvDFV784HLHitman5w1">
           <note xml:id="i-721gyy9MBKdF8MueZJeuyS">
             10 § Svar på interpellation 2001/02:243 om
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200304/prot-200304--25.xml

Diff starting from line 9060

@@ -9018,6 +9060,8 @@
               tisdagen den 18 november.
             </seg>
           </u>
+        </div>
+        <div type="commentSection" xml:id="i-LDfA3KtpJomEeCCu9ZhQ6u">
           <note xml:id="i-4amEiRs2H6QD57J1pBz3tV">
             22 § Kammaren åtskildes kl. 21.51.
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200405/prot-200405--101.xml

Diff starting from line 1840

@@ -1828,6 +1840,8 @@
           <note xml:id="i-WfUyr6bDix8pQydRBvYZnc">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-AfTQ1twiztLp9zz5FWX2xC">
           <note xml:id="i-3rc7DmGdCMBNkKsQzR6Moo">
             7 § Kommunal demokrati och kompetens
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200405/prot-200405--49.xml

Diff starting from line 12097

@@ -12081,6 +12097,8 @@
           <note xml:id="i-WPQNZ4ZexSQjgLEEmFjrKz">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-2wFWqZkSPkaRicGQUS4QkA">
           <note xml:id="i-7Exh5dsrheozxMigKJXvWH">
             9 § Jord- och skogsbruk, fiske med anslutande näringar
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200607/prot-200607--105.xml

Diff starting from line 6734

@@ -6704,6 +6734,8 @@
               tisdagen den 15 maj.
             </seg>
           </u>
+        </div>
+        <div type="commentSection" xml:id="i-7XoJcB6wGnkbkZnvzWbZKS">
           <note xml:id="i-E7Md5uEf3X81TWwqgJcmDs">
             16 § Kammaren åtskildes kl. 13.37.
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200607/prot-200607--111.xml

Diff starting from line 8544

@@ -8514,6 +8544,8 @@
           <note xml:id="i-DJApmBGuyE7ZCZvgJpKzXq">
             Förste vice talmannen konstaterade att ingen talare var anmäld.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-AcokcFEjeByJPJ5v8MqBM1">
           <note xml:id="i-WWx9WS9xnkDb7MA6Usg6Jq">
             16 § Avskaffande av åldersgräns
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200708/prot-200708--112.xml

Diff starting from line 11085

@@ -11061,6 +11085,8 @@
           <note xml:id="i-5MtwaUpEnbdqsHgpMgfiWA">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-5yLTXSMvfTtW3QKsgJ2xE9">
           <note xml:id="i-Pc8RzFZwcmC7gf6GuvGYTy">
             13 § Ny instansordning för arbetsmiljöärenden
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200708/prot-200708--138.xml

Diff starting from line 538

@@ -530,6 +538,8 @@
           <note xml:id="i-EBL7QFRPax7LF4C5dLpdD2">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-27dDYET3WdPSjHCEj5wq8W">
           <note xml:id="i-Vny6bvLYnMyxjjBur38FNC">
             5 § Svar på interpellation 2007/08:837 om kommunernas ekonomi
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200809/prot-200809--46.xml

Diff starting from line 561

@@ -545,6 +561,8 @@
           <note xml:id="i-PizfC89y4Rb3dBCzerUGoH">
             Punkterna 37
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-3cJLnXtTKXxgthXAvtP9et">
           <note xml:id="i-BG5doLJXVjx69Mq5s9PmTf">
             9 § Beslut om ärenden som slutdebatterats den 8 december
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200910/prot-200910--11.xml

Diff starting from line 239

@@ -225,6 +239,8 @@
           <note xml:id="i-EYKHweT77ozmbLM9zB1ZRr">
             Anmäldes och bordlades
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-Qe4pDnCh4V6Et9tfDMbjHt">
           <note xml:id="i-HrBXgaTLy5sdWanYYzqpmJ">
             8 § Anmälan om interpellationer
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/200910/prot-200910--145.xml

Diff starting from line 14647

@@ -14601,6 +14647,8 @@
           <note xml:id="i-Ef7v3QnAjmwK3YPfyo8VjZ">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-8oP5aKFZKeWhhwcqaG6ZsT">
           <note xml:id="i-Brxy4sUCXTSQckPAnmZZSK">
             24 § Svar på interpellation 2009/10:451 om en allmän och solidarisk
             a-kassa
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/201213/prot-201213--110.xml

Diff starting from line 2565

@@ -2547,6 +2565,8 @@
           <note xml:id="i-2dFGfFENRyQ6MWqrsy7X91">
             Förhandlingarna återupptogs kl. 14.00.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-YSpr6xrU1VxqY1v3b4WudA">
           <note xml:id="i-WypqQ9kBen4z8vFrur8ntj">
             10 § Statsministerns frågestund
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/201314/prot-201314--106.xml

Diff starting from line 9972

@@ -9936,6 +9972,8 @@
           <note xml:id="i-3ZF5gNg6DqSkKLoAZJxXWq">
             Hans Hoff
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-NtkHTW2aNp1shs4gWNa1Xe">
           <note xml:id="i-RZmLCJhytDFJw8X2PxTMN">
             19 § Anmälan om skriftliga svar på frågor
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/201314/prot-201314--92.xml

Diff starting from line 1773

@@ -1759,6 +1773,8 @@
           <note xml:id="i-3hzU8zPGcz1a6wyV3CXm4U">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-9xbf4UDfdTmJGeWN1QcHwN">
           <note xml:id="i-A8vb8aPdq5xaRWjGShVR8f">
             8 § Svar på interpellation 2013/14:265 om nedsättningen av arbetsgivaravgiften
             för unga
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/201415/prot-201415--121.xml

Diff starting from line 6277

@@ -6221,51 +6277,83 @@
           <note xml:id="i-HYL11c4ALwi7G75cDWejwC">
             Innehållsförteckning
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-LF6ocLKPvDi2aGX9xim9m1">
           <note xml:id="i-BENKRLk9gwSs8AfMNGX1FZ">
             § 1 Justering av protokoll
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-C3AeoTJtidce8KFFhihRE6">
           <note xml:id="i-CyvwfEQE78ZviPMQBDnUJy">
             § 2 Anmälan om interpellationer
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-H7WqmxQ1Rx4Rue67FjRGFB">
           <note xml:id="i-qwqsUrvwJ43QJtE3tGHZ7">
             § 3 Anmälan om skriftliga frågor och svar
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-17NvSH25pNM9mtfwF6wPyE">
           <note xml:id="i-5L8gCSf1Xvu1Vb1oByo2Jz">
             § 4 Anmälan om ny riksdagsledamot
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-Cuhqodpmrr5niztLQwofHm">
           <note xml:id="i-KHjec4Capk2BqQs3yxgvfi">
             § 5 Anmälan om återtagande av plats i riksdagen
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-Ft99LcFwpf2HuKzgG5fFDu">
           <note xml:id="i-22p2sBTuZ4Hd9TLRZktw1t">
             § 6 Avsägelser
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-TKV2MKo6JrMzT3tR3JP96o">
           <note xml:id="i-K4NXZNhjqedhLT2mgspPYb">
             § 7 Anmälan om ersättare
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-37TeapwriAtaYsUehZSTaF">
           <note xml:id="i-pjaWvS52ctLq5NctfmAyj">
             § 8 Anmälan om ersättare för statsråd
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-SXudBx9J8gZeiRQxAwAboC">
           <note xml:id="i-TCZELaX43MvEdJr9MeM7ZQ">
             § 9 Anmälan om ersättare för talman
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-AGTuEeZLnF9p6AKLFNYmx7">
           <note xml:id="i-CGRLxVnpW4fHNcQRKciDct">
             § 10 Anmälan om kompletteringsval
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-XbHjeFUgBGmAeSP9R78yqU">
           <note xml:id="i-9RbJy9TKZEqKnirLvvSQat">
             § 11 Anmälan om ny ledamot i Europaparlamentet
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-H5CkXN1RYHysvhbC6XCt4s">
           <note xml:id="i-16z4Yxx2H5ufGG7x8pxVkH">
             § 12 Anmälan om fördröjda svar på interpellationer
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-SiPoBr8ZPyL4sSnNG4Yybz">
           <note xml:id="i-MqQLhzqUdZQShQcqj4L4U1">
             § 13 Anmälan om faktapromemorior
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-8tFw1bSgPtUBXxKRj6kfzL">
           <note xml:id="i-HqAj21UMDMwhq3nyv7wfBU">
             § 14 Anmälan om granskningsrapporter
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-NCDvCP3GQ5xEbC7goubgA9">
           <note xml:id="i-L9H5AtAJ1pNtv6HE8Xp4wV">
             § 15 Anmälan och omedelbar hänvisning av ärenden till utskott
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-XRMzZeUZqBvbSsCGZn8dbc">
           <note xml:id="i-Sjmy9DsGJ9KmfAZivikbrh">
             § 16 Svar på interpellation 2014/15:629 om Öresundssamarbete
           </note>
  • [ ] Correct
  • [x] Incorrect

corpus/protocols/201516/prot-201516--102.xml

Diff starting from line 8679

@@ -8595,6 +8679,8 @@
           <note xml:id="i-Lg4UUSYZEaM7zEpP2DMg34" type="speaker">
             Anf. 80 Utbildningsminister GUSTAV FRIDOLIN (MP)
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-5ZQs9Uyf91eac6x2jvDbrP">
           <note xml:id="i-FtDtadi4A1CgxwPy42nFz4">
             § 17 Svar på interpellation 2015/16:597 om digitala verktyg till
             nyanlända elever
  • [ ] Correct
  • [x] Incorrect

corpus/protocols/201516/prot-201516--118.xml

Diff starting from line 7422

@@ -7400,6 +7422,8 @@
           <note xml:id="i-8ZWcNpAkEtWdvnMdExgNR2">
             (Beslut skulle fattas den 15 juni.)
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-LuTkQsGxeFj6DaaX6y3SEB">
           <note xml:id="i-TRLzdicc649cdM9zHBbU6U">
             § 4 Övergångsstyre och utjämning vid ändrad kommun- och landstingsindelning
           </note>
  • [ ] Correct
  • [x] Incorrect

corpus/protocols/201617/prot-201617--132.xml

Diff starting from line 5507

@@ -5413,6 +5507,8 @@
           <note xml:id="i-DsfKwnirY3ayry8TLiQZLk" type="speaker">
             Anf. 70 Statsrådet ANNA EKSTRÖM (S)
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-WHuUJvLk4MBvaTWbKZDEXz">
           <note xml:id="i-NuKiNa3oRbRx1UwtC5JFyq">
             § 22 Svar på interpellation 2016/17:567 om psykisk ohälsa i gymnasiet
           </note>
  • [ ] Correct
  • [x] Incorrect

corpus/protocols/201617/prot-201617--26.xml

Diff starting from line 2246

@@ -2230,6 +2246,8 @@
           <note xml:id="i-KN2TqXyaGbYBWfTMo8dmtb">
             Överläggningen var härmed avslutad.
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-UTP5wAXm5cv23Ug2tC7mQW">
           <note xml:id="i-4WbDPFnVBDUWCixCXgUJ93">
             § 9 Svar på interpellation 2016/17:68 om svenska kommuners skatteintäkter
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/201617/prot-201617--29.xml

Diff starting from line 6744

@@ -6706,6 +6744,8 @@
             investeringsprodukter för icke-professionella investerare (Priip-produkter)
             vad gäller förordningens tillämpningsdag
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-VzJ1qH8cMTdndKv4wkVYBC">
           <note xml:id="i-SWLCKs9UiqCqTLYQTNZxLC">
             § 20 Anmälan om interpellationer
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/201617/prot-201617--71.xml

Diff starting from line 59

@@ -59,14 +59,18 @@
         </div>
       </front>
       <body>
-        <div>
+        <div type="commentSection" xml:id="i-YFtCuWMiDDWu6Uop9FZMJt">
           <pb facs="http://data.riksdagen.se/fil/FEB488CD-C695-4AC6-BF50-03E8DD992394#page=1"/>
+        </div>
+        <div type="commentSection" xml:id="i-BMV2VyzC6umMTWsPP1uyaw">
           <note xml:id="i-RHzFCz3FUyw7P9THXX4PPd">
             § 1 Justering av protokoll
           </note>
           <note xml:id="i-HkQCKHagHXAHV4tTTmBQLY">
             Protokollet för den 31 januari justerades.
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-MnPga3aEKRTDK14nvituAw">
           <note xml:id="i-XeK4DbvNUKiAio9JknQBBb">
             § 2 Anmälan om ny riksdagsledamot
           </note>
  • [ ] Correct
  • [x] Incorrect

corpus/protocols/201718/prot-201718--16.xml

Diff starting from line 2097

@@ -2087,6 +2097,8 @@
           <note xml:id="i-VprfTiJcobS5BY1aFLWkUu">
             till statsrådet Tomas Eneroth (S)
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-9AJtcwMQdjThYeF4js2jhp">
           <note xml:id="i-42sh1hHRcpx3F8E3zerAEG">
             § 6 Anmälan om frågor för skriftliga svar
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/201819/prot-201819--29.xml

Diff starting from line 896

@@ -882,6 +896,8 @@
             RiR 2018:32 Förvaltningen av premiepensionssystemet – kostnadseffektivitet
             för spararnas bästa?
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-URPXaDxaHtMWPbrpGJo7Nn">
           <note xml:id="i-AamRW2k5tVJHzPJHHx3khK">
             § 8 Ärende för hänvisning till utskott
           </note>
  • [x] Correct
  • [ ] Incorrect

corpus/protocols/201819/prot-201819--81.xml

Diff starting from line 12005

@@ -12005,7 +12005,7 @@
             Anf. 74 Statsminister STEFAN LÖFVEN (S)
           </note>
         </div>
-        <div type="debateSection">
+        <div type="debateSection" xml:id="i-9SAZJyKhY4oW94uYaAuTm6">
           <note xml:id="i-7PskJ8BQ6tG5KGB9E3NL4M">
             § 8 (forts. från § 6) Kriminalvårdsfrågor (forts. JuU13)
           </note>
  • [ ] Correct
  • [x] Incorrect

corpus/protocols/202021/prot-202021--12.xml

Diff starting from line 930

@@ -908,21 +930,33 @@
           <note xml:id="i-4zgVw8CEi2V7WRGwEPDeUT">
             Innehållsförteckning
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-2yzxffrPR29NXGwzQyf65k">
           <note xml:id="i-TFmWCfnWAdd55cQ87ESsaG">
             § 1 Avsägelser
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-3X51zpqMQnPRasbrdG4qEZ">
           <note xml:id="i-V2SEZX2HhFhFB6rg2iR2Jq">
             § 2 Anmälan om kompletteringsval
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-KzbhxJEVuoCbBNVf2jZWU9">
           <note xml:id="i-DUfcrF46i39FucKjtYMUNQ">
             § 3 Anmälan om subsidiaritetsprövning
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-Xvxh559FpJiAFQb7HF7ygT">
           <note xml:id="i-Q2an59QFjwVGMYVhqT2Qm9">
             § 4 Anmälan om fördröjt svar på interpellation
           </note>
+        </div>
+        <div type="commentSection" xml:id="i-7BDA4TFYeNEFgy8PCrZJDQ">
           <note xml:id="i-LZDTcAcrZpcLuqjtAxcXyf">
             § 5 Ärenden för hänvisning till utskott
           </note>
+        </div>
+        <div type="debateSection" xml:id="i-SVdbsBgArW7NeiGihZY8HX">
           <note xml:id="i-NRBVBusRwHf7TJjewfHLwx">
             § 6 Svar på interpellation 2019/20:451 om bistånd till stater
             som inte respekterar mänskliga rättigheter
  • [ ] Correct
  • [x] Incorrect
MansMeg commented 10 months ago

Any ideas how we formally know if it is correct or not?

MansMeg commented 10 months ago

There is still the problem that a lone \<pb> tag becomes its own section. This should be easy to fix?

Also, an innehållsförteckning (table of contents) seems to incorrectly end up split into a large number of sections. Is this easy to fix?

BobBorges commented 10 months ago

Any ideas how we formally know if it is correct or not?

I guess if the div is not empty, doesn't contain multiple sections, and has the type+id attribs.
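A minimal sketch of how those three checks could be automated. This is not code from `scripts/split_into_sections.py`: the function name `check_div`, the `SECTION_HEAD` pattern (guessed from the sampled heads such as `§ 12`, `12 §`, `3§` and `§& 14`), and treating a `<pb>`-only div as empty are all assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Key under which ElementTree stores the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumed pattern for a section head at the start of a note:
# "§ 12", "§& 14", "12 §" or "3§" (taken from the sampled diffs).
SECTION_HEAD = re.compile(r"^\s*(?:§\s*&?\s*\d+|\d+\s*§)")


def local(tag):
    """Strip a namespace prefix such as the TEI namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def check_div(div):
    """Return a list of problems for one section div (empty list = passes)."""
    problems = []
    if div.get("type") is None or div.get(XML_ID) is None:
        problems.append("missing type or xml:id")
    # Page breaks alone do not count as content (assumption).
    children = [c for c in div if local(c.tag) != "pb"]
    if not children:
        problems.append("empty or pb-only")
    heads = [
        c for c in children
        if local(c.tag) == "note" and SECTION_HEAD.match(c.text or "")
    ]
    if len(heads) > 1:
        problems.append("multiple section heads")
    return problems
```

Running `check_div` over every `div` under `body` in a sampled protocol would give a first, mechanical pass before the manual Correct/Incorrect marking.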

BobBorges commented 10 months ago

There is still the problem that a lone \<pb> tag becomes its own section. This should be easy to fix?

I don't follow.

Also, an innehållsförteckning (table of contents) seems to incorrectly end up split into a large number of sections. Is this easy to fix?

After merging this it's what I wanted to do first after taking a first crack at identifying the interpellation debates. I don't think it would be too difficult, but you never know until you actually start doing it.

BobBorges commented 10 months ago

At this stage -- given it's the first kind of attempt at creating sections -- unless there is something really bad, i.e. that worsens the quality of the data/work we've already done (which I don't see in the sample or in other edits), then we should accept this round of div additions.

I see many things that could be better, but I don't think we will get it all right at once. Some incorrect section delimitation is an improvement over no section delimitation.

  • moving solo \<pb> elems (that's what I didn't get before) into adjacent divs
  • joining stray solo sections under a unified table of contents
  • finding additional section delimitation
  • --- by missing nrs in the sequence, and / or
  • --- finding the end of a real section before the end of the div, e.g.: [image]

...can all be done in steps (minimal PR!), but if we sit on this for too long it blocks me from categorizing the debates.
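As an illustration of the "missing nrs in the sequence" idea, here is a rough sketch that reports skipped § numbers from a list of section-head strings. The helper names `section_number` and `missing_numbers` and the regular expression are assumptions, not existing code in the repository.

```python
import re

# Assumed head formats, based on the sampled notes: "§ 12", "§& 14", "12 §", "3§".
SECTION_NR = re.compile(r"^\s*(?:§\s*&?\s*(\d+)|(\d+)\s*§)")


def section_number(text):
    """Return the § number at the start of a note, or None if there is none."""
    m = SECTION_NR.match(text or "")
    if m is None:
        return None
    return int(m.group(1) or m.group(2))


def missing_numbers(heads):
    """Given head strings in document order, list the § numbers that were skipped."""
    numbers = [n for n in map(section_number, heads) if n is not None]
    missing = []
    for prev, cur in zip(numbers, numbers[1:]):
        # A jump of more than one suggests an undetected section in between.
        if cur > prev + 1:
            missing.extend(range(prev + 1, cur))
    return missing
```

A restart of the numbering (for example where a table of contents repeats § 1) is not reported as a gap, so such cases would still need separate handling.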

MansMeg commented 10 months ago

I fully agree. 1) We should do minimal PRs. That said, fixing the \<pb> tag seems so small that it is just a quick fix (a couple of lines of code), so we might as well just fix it, right? The other issues seem to need some additional work. 2) The revision control: we need to check that these divs are correct, which includes both the debateSection and the commentSection. I guess we only check that a debateSection contains a real debate (or a section of a debate), right?

BobBorges commented 10 months ago

1.

No, it's not that much work to fix stray \<pb> elems in a section, but...

2.

2.1. We wanted to do this quality control before committing edits to the whole set of protocols for reasons of economy. So either we approve what's here and I can commit it, then fix the pb thing with another commit (before merging the PR), or I can fix it now in the already modified files, but then we conflate 'types' of edits in one piece of the revision history.

2.2. Debate sections have intros, comment sections don't -- it seems like a reasonable criterion for evaluation. Should I check that in the sample? I'd like to be able to take this a step or two forward today.

MansMeg commented 10 months ago
  1. I was thinking of fixing all \<pb> elems, not just the stray ones. An estimate is that roughly 2% of all edits are due to this problem?

2.a. I'm not sure I followed. I just checked for obvious errors and found those. If we fix those, we can get a new sample that we can assess. That should not conflate anything or be problematic?

2.b. Great. I just wanted to know. Then it seems good to just check the debates based on this definition, and check that the commentSections are not incorrect and that no incorrect divs are introduced.

But this raises the issue that we need to start defining divs in a better way, because this sits somewhere between an analytic decision and a data-authentic one. And we want to be as close to the latter as possible.

BobBorges commented 10 months ago

I've gone through them now: mostly they're ok. Marked correct if:

  • div elem has id
  • schematically correct
  • debates have intros/comment sections have no intros

It looks like 6 are incorrect by those criteria, and the incorrect ones are due to lone \<pb> elems in a div or the content of the table of contents section getting tagged as section heads and intros. I'll commit the rest of the protocols, then let's merge and I'll open issues for these two problems.
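For the first of those two problems, a possible shape of the follow-up fix is sketched below: a div that contains nothing but `<pb>` elements is folded into the div that follows it. The function name `merge_pb_only_divs` is hypothetical, and merging forward (rather than into the preceding div) is an assumption based on the prot-201617--71 sample, where the lone `<pb>` sits before § 1.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a namespace prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def merge_pb_only_divs(body):
    """Fold each pb-only div into the next sibling div, in place.

    Returns the number of divs removed. A pb-only div with no following
    div is left untouched.
    """
    divs = [el for el in body if local(el.tag) == "div"]
    removed = 0
    for div, nxt in zip(divs, divs[1:]):
        children = list(div)
        if children and all(local(c.tag) == "pb" for c in children):
            # Keep the page breaks in order at the top of the next div.
            for i, pb in enumerate(children):
                div.remove(pb)
                nxt.insert(i, pb)
            body.remove(div)
            removed += 1
    return removed
```

Keeping this as its own commit, separate from the div insertion, would match the concern above about not conflating types of edits in the revision history.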

MansMeg commented 10 months ago

Great! Will you open an issue?

BobBorges commented 10 months ago

405

BobBorges commented 10 months ago

@MansMeg will you merge when the tests pass?