openva / va-decoded

The Virginia implementation of The State Decoded.
https://vacode.org/

Some XML section numbers are invalid #29

waldoj closed 8 years ago

waldoj commented 8 years ago

We're seeing section numbers like § 59.1-10through59 and 38.2-204,38.2-20. This is surely from Lexis' XML. These likely are placeholders, indicating either where space is being held for future sections or where sections have been removed. They're not a problem, but they're not great.

waldoj commented 8 years ago

I speculate that the solution is to use the anchor ID instead of the designator. That is:

<anchor id="_64.1-01" />
<heading>
   <title>Repealed</title>
   <desig>§§ 64.1-01 through 64.1-206.8.</desig>
</heading>

I used desig as the source of the section identifier, but we can see here why that's a bad idea. Perhaps the id attribute of anchor is a better source? I'm not confident that it's going to use the proper characters, but a quick review of some tricky section numbers (e.g., 46.2-749.28:2) might provide an answer.
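A minimal sketch of reading the identifier from the anchor rather than from desig, in Python for illustration only (it is not the importer's actual code). The `<wrapper>` root element and the `section_identifier` function name are assumptions added so the fragment parses on its own; the real Lexis XML will nest these elements differently.

```python
import xml.etree.ElementTree as ET

# The fragment from this comment, wrapped in a hypothetical <wrapper> root
# so that it is a well-formed document.
SAMPLE = """<wrapper>
<anchor id="_64.1-01" />
<heading>
   <title>Repealed</title>
   <desig>§§ 64.1-01 through 64.1-206.8.</desig>
</heading>
</wrapper>"""


def section_identifier(xml_text):
    """Return the raw id attribute of the first <anchor>, or None if absent."""
    root = ET.fromstring(xml_text)
    anchor = root.find(".//anchor")
    if anchor is None:
        return None
    return anchor.get("id")


print(section_identifier(SAMPLE))
```

The value still carries the leading underscore, so it would need normalizing before being used as a section number.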

waldoj commented 8 years ago

Well, that was quick. As I worried, : is a reserved character, so § 46.2-749.28:2 is represented as _46.2-749.28_2. But if I can know for sure that _ will always represent : (save as the leading character), then that's solvable.
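A rough sketch of that mapping, in Python for illustration. It assumes exactly what is stated above: the leading _ is padding, and every other _ stands in for a colon. The function name is hypothetical, and if an underscore ever appears in an id for another reason, this will produce a wrong section number.

```python
def anchor_id_to_section_number(anchor_id: str) -> str:
    """Convert an anchor id such as "_46.2-749.28_2" to "46.2-749.28:2".

    Assumption: the first character, if it is "_", is padding, and any
    remaining "_" represents ":".
    """
    if anchor_id.startswith("_"):
        anchor_id = anchor_id[1:]
    return anchor_id.replace("_", ":")


print(anchor_id_to_section_number("_46.2-749.28_2"))
print(anchor_id_to_section_number("_64.1-01"))
```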

waldoj commented 8 years ago

I think this is going to work. I'll test it out.

waldoj commented 8 years ago

Worked great.

krusynth commented 7 years ago

Where did you end up fixing this? The import is dying for me on 8.01-341.1 which has

<section prefix="1. through 3"> [Repealed.]</section>

waldoj commented 7 years ago

That's odd—I don't remember encountering that problem. Maybe a change made in The State Decoded in the interim prevents this import from working now? (Justifiably—1. through 3 is not an actual prefix.)

krusynth commented 7 years ago

Well, the current situation is that vacode is incorrectly reporting that only the first section is repealed, when 1-3 actually are repealed: https://vacode.org/8.01-341.1/ So just lopping off the "through 3" part isn't good enough.

I assume the best thing to do here is leave this as-is in the XSLT, display the inline text as 1 through 3 and change the internal identifier (which isn't displayed) to 1-3. I'll need to increase the size of the identifier field here to handle this.
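Roughly what I have in mind, sketched in Python for illustration rather than in the importer itself. The function name and the regular expression are assumptions, and I have only checked the pattern against the one prefix quoted above; prefixes that are not ranges pass through unchanged.

```python
import re

# Matches range prefixes of the form "1. through 3" (trailing periods optional).
RANGE_PREFIX = re.compile(r"^\s*(\w+)\.?\s+through\s+(\w+)\.?\s*$")


def split_range_prefix(prefix: str):
    """Return (display_text, internal_identifier) for a subsection prefix."""
    match = RANGE_PREFIX.match(prefix)
    if match is None:
        # Not a range: display text and identifier are the prefix as given.
        return prefix, prefix
    first, last = match.groups()
    return f"{first} through {last}", f"{first}-{last}"


print(split_range_prefix("1. through 3"))
```

The identifier for a range is longer than a single prefix, which is why the identifier field needs to grow.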

waldoj commented 7 years ago

Ugh. I'm glad you caught that. :-/