Open tuanpham96 opened 7 months ago
Thanks for raising this. This is an issue on the Plural side, since Open States doesn't have bill version text available like that. I'm pretty sure the cause is that the TX PDFs use white 'A' characters as a substitute for spacing, so we had to try stripping the excess As, and that may have removed some that weren't meant to be removed. I'll take a look and see whether TX still spaces its bills that way and whether we still need that removal logic.
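To illustrate how this kind of cleanup can over-strip, here is a minimal sketch. It is not Plural's actual removal logic; the function names, both regex rules, and the sample string are hypothetical, chosen only to show how a rule that treats any standalone capital A as spacing filler would also blank out a legitimate `(A)` or `"A"`, while a rule limited to runs of two or more As would leave them alone.

```python
import re

# Hypothetical rule 1: any run of capital As not touching a lowercase letter
# is assumed to be spacing filler and is replaced with a single space.
NAIVE_FILLER = re.compile(r"(?<![a-z])A+(?![a-z])")

# Hypothetical rule 2: same idea, but only runs of two or more As count as
# filler, so a lone "A" used as a label or quoted value is kept.
CONSERVATIVE_FILLER = re.compile(r"(?<![a-z])A{2,}(?![a-z])")


def strip_filler_naive(text: str) -> str:
    """Replace every standalone run of 'A' with a space (over-strips)."""
    return NAIVE_FILLER.sub(" ", text)


def strip_filler_conservative(text: str) -> str:
    """Replace only runs of two or more 'A' with a space."""
    return CONSERVATIVE_FILLER.sub(" ", text)


# Made-up sample, not taken from a real bill.
sample = '(b)AAIn this section, (A) means "A" grade.'

print(strip_filler_naive(sample))
print(strip_filler_conservative(sample))
```

The trade-off in this sketch is that the conservative rule would leave behind any filler that really is a single A, so it is only a starting point for deciding whether the removal logic is still needed.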
Thanks for the explanation!
May I ask how Plural and OpenStates differ in the way they source the text?
When doing text analysis, which one should I rely on more, not only for TX but other states as well?
Open States provides the links to Plural; Plural then does a separate text extraction and processing step that gets the bill text for each version. Open States only processes text for search purposes and doesn't save each bill's version text.
I'm reporting issues I'm seeing on PluralPolicy. If this is not the right place, please let me know where to direct this.
Example URL:
Browser: Both Firefox and Chrome have this issue
Issue: The A characters are missing in the bill text section; not all the time, but usually when they appear between ( ) or inside actual double quotes " ". Turning markup on or off doesn't matter. I haven't checked which other bills or states may have this problem, but I spotted two examples of it. Below are the screenshots, comparing what's on the PluralPolicy side with what's in the source PDFs.
Note: I also double-checked the bulk json on OpenStates and this problem does not seem to appear in the bulk json.
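For anyone wanting to repeat this kind of check on other bills, here is a rough sketch of how two copies of the same passage could be compared to count which characters went missing. It uses only Python's standard `difflib`; the function name and both sample strings are made up for illustration, and loading the text from the bulk JSON or from PluralPolicy is left out because it depends on how you obtain each copy.

```python
from collections import Counter
from difflib import SequenceMatcher


def dropped_characters(reference: str, extracted: str) -> Counter:
    """Count characters present in `reference` but lost or replaced in `extracted`."""
    dropped = Counter()
    matcher = SequenceMatcher(None, reference, extracted)
    for tag, ref_start, ref_end, _, _ in matcher.get_opcodes():
        # 'delete' and 'replace' spans are reference text that did not survive.
        if tag in ("delete", "replace"):
            dropped.update(reference[ref_start:ref_end])
    return dropped


# Made-up passages standing in for the source text and the displayed text.
reference_text = 'means (A) the "A" list'
displayed_text = 'means ( ) the " " list'

print(dropped_characters(reference_text, displayed_text))
```

A count dominated by a single letter, as in this made-up example, would point to a targeted removal rule rather than general extraction noise.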