usgpo / bulk-data

User Guides for XML on the govinfo Bulk Data Repository. For information about Bill Status XML Bulk Data, see https://github.com/usgpo/bill-status.
https://www.govinfo.gov/bulkdata

Request: Congressional Record meta data: identify headings and subheadings #155

Open TimTCM opened 6 months ago

TimTCM commented 6 months ago

Would it be possible to add markup tags or metadata to the Congressional Record HTML files to identify headings and subheadings?

Identifying them is currently hard, because the only cue is the variable leading whitespace used to center text within a 70-character monospaced line. It is especially hard when a subheading extends beyond a single line.

Here are a couple examples of pages with a subheading:
https://www.govinfo.gov/content/pkg/CREC-2024-03-12/html/CREC-2024-03-12-pt1-PgH1097-4.htm
https://www.govinfo.gov/content/pkg/CREC-2024-03-12/html/CREC-2024-03-12-pt1-PgH1098-2.htm
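To illustrate the whitespace-based approach described above, here is a minimal Python sketch of the kind of heuristic it implies. It is not GPO's method or any govinfo API. The function names, the `tolerance` and `min_indent` parameters, and the rule that consecutive centered lines form one heading are all assumptions made for illustration; only the 70-character line width comes from the description above.

```python
from typing import List

# Assumed width of the monospaced text line, taken from the description above.
LINE_WIDTH = 70


def is_centered(line: str, width: int = LINE_WIDTH,
                tolerance: int = 1, min_indent: int = 3) -> bool:
    """Guess whether a line's text is centered within `width` columns.

    `tolerance` and `min_indent` are hypothetical tuning knobs, not values
    documented anywhere for the Congressional Record.
    """
    text = line.rstrip()
    stripped = text.lstrip()
    if not stripped:
        return False  # blank lines are never headings
    left = len(text) - len(stripped)   # leading spaces
    right = width - len(text)          # implied trailing padding
    # Require some indentation so ordinary paragraph lines are not matched,
    # then check that left and right padding are roughly equal.
    return left >= min_indent and abs(left - right) <= tolerance


def find_headings(lines: List[str]) -> List[str]:
    """Join each run of consecutive centered lines into one heading string."""
    headings: List[str] = []
    current: List[str] = []
    for line in lines:
        if is_centered(line):
            current.append(line.strip())
        elif current:
            headings.append(" ".join(current))
            current = []
    if current:
        headings.append(" ".join(current))
    return headings
```

Calling `find_headings` on the lines of a page's text would return the candidate headings, with a two-line subheading merged into one string. A heuristic like this can misfire, for example on an indented paragraph line whose padding happens to balance, which is why explicit markup would be more reliable.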

jonquandt commented 6 months ago

Thanks for providing this feedback. Since GovInfo does not control the text or format of the content itself, which we rely on to generate the metadata, I will pass this along to folks who may be able to look at this upstream for the future.

Changes to the HTML display of the Congressional Record are on the roadmap for development of GPO's XPub system, but we don't have a specific timeframe at the present time. For more information on XPub, as well as sample responsive HTML files for Congressional Bills and Public Laws, please see https://github.com/usgpo/xpub.

The same applies to #153 and #154.

TimTCM commented 6 months ago

Thank you for the reply, Jon.

Are the upstream folks the Reporters of Congressional Debate?