welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Marginal notes are tagged with date #335

Closed MansMeg closed 6 months ago

MansMeg commented 1 year ago

Marginal notes are currently tagged as dates (type="date"). See the example below. Should we keep this as a date or call it "marginal note"?

           </note>
           <pb n="35" facs="https://betalab.kb.se/prot-1900--fk--11/prot_1900__fk__11-035.jp2/_view"/>
           <note xml:id="i-9LEfCGZtUBd4EDB7bd1CCH" type="date">
             N:o II. 36 Tisdagen den 20 Februari..
           </note>

ninpnin commented 10 months ago

All of these notes do contain the date of the protocol, and type="date" is the Parla-CLARIN way of denoting that.

MansMeg commented 10 months ago

This seems like a strange way to annotate dates. This marginal note contains both the date and the page number, for example.
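
To make the point concrete, here is a minimal sketch (an illustration only, not code from the corpus pipeline) that pulls the page number and the date parts out of the note text quoted above. The pattern HEADER and the function split_marginal_note are hypothetical names, and the snippet assumes the note sits in no XML namespace, which may not hold for the real files. The year is not in the note, so the result is not a full date.

```python
import re
import xml.etree.ElementTree as ET

# The example from this issue, wrapped in a <div> so it parses on its own.
SNIPPET = """
<div>
  <pb n="35" facs="https://betalab.kb.se/prot-1900--fk--11/prot_1900__fk__11-035.jp2/_view"/>
  <note xml:id="i-9LEfCGZtUBd4EDB7bd1CCH" type="date">
    N:o II. 36 Tisdagen den 20 Februari..
  </note>
</div>
"""

# Hypothetical pattern: optional page number, a Swedish weekday ending in
# "dagen", the word "den", a day of the month, and a month name.
HEADER = re.compile(
    r"(?:(?P<page>\d+)\s+)?"
    r"(?P<weekday>\w+dagen)\s+den\s+"
    r"(?P<day>\d{1,2})\s+"
    r"(?P<month>[A-Za-zåäöÅÄÖ]+)"
)


def split_marginal_note(text):
    """Return the page number and date parts found in a marginal note, or None."""
    normalized = " ".join(text.split())  # collapse line breaks and indentation
    match = HEADER.search(normalized)
    if match is None:
        return None
    page = match.group("page")
    return {
        "page": int(page) if page is not None else None,
        "weekday": match.group("weekday"),
        "day": int(match.group("day")),
        "month": match.group("month"),
    }


root = ET.fromstring(SNIPPET)
for note in root.iter("note"):
    if note.get("type") == "date":
        print(split_marginal_note(note.text))
```

For this note, the sketch separates the page number (36) from the weekday, day, and month, which shows that the element carries more than a date.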

ninpnin commented 10 months ago

I think the other options are a lot worse:

  1. Do not annotate dates. Then we have no idea where the date metadata comes from.
  2. Spend a lot of resources on splitting these into multiple paragraphs with high accuracy.

MansMeg commented 10 months ago

I think neither option is the final one we want to go with. The current semantics are kind of bad: the note itself is not a date, it only contains the date. I guess we can find a better way, although this is far down the priority ladder.

ninpnin commented 6 months ago

To me, this is a segmentation error (#250).