snikproject / ontology

Public SNIK Ontology. An ontology of information management in hospitals.
https://snikproject.github.io/ontology/

Extract missing pages and chapters from blue book #310

Closed: KonradHoeffner closed this issue 5 years ago.

KonradHoeffner commented 5 years ago

See also: https://github.com/IMISE/snik-ontology/issues/296, https://github.com/IMISE/snik-ontology/issues/309.

Despite efforts to restore missing chapters, several hundred classes in the blue book still have no page, and some of them have no chapter either. All classes that have a page now also have a chapter, because we generated the chapters automatically (see IMISE/snik-cytoscape.js#214).

However, for the classes that lack a page (and possibly also a chapter), we should extract this information from the blue book. Using the digital version and its search function, this should be doable semi-automatically, and it would be a huge help for the chapter search feature used in teaching.

ThomasPause commented 5 years ago

Missing pages were extracted and output as RDF triples by a Python script, which is kept in a private repository due to license issues: https://github.com/ThomasPause/MissingPageFinder
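The script itself is private, so the following is only a minimal sketch of the output step, assuming the results are held as a mapping from class local name to page numbers. The namespace `CLASS_NS` and the predicate `PAGE_PREDICATE` are hypothetical placeholders, not the URIs actually used in SNIK, and writing pages as plain string literals is also an assumption.

```python
# Minimal sketch of writing page numbers as N-Triples.
# CLASS_NS and PAGE_PREDICATE are hypothetical placeholders,
# not the URIs used by the private MissingPageFinder script.
CLASS_NS = "http://example.org/bb/"
PAGE_PREDICATE = "http://example.org/meta/page"


def page_triples(pages_by_class):
    """Yield one N-Triples statement per (class, page) pair.

    pages_by_class maps a class local name to an iterable of page numbers.
    Duplicate pages are dropped and pages are emitted in ascending order.
    """
    for local_name in sorted(pages_by_class):
        for page in sorted(set(pages_by_class[local_name])):
            # Page numbers are written as plain string literals (an assumption).
            yield f'<{CLASS_NS}{local_name}> <{PAGE_PREDICATE}> "{page}" .'


if __name__ == "__main__":
    for line in page_triples({"PatientAdmission": [12, 5, 12]}):
        print(line)
    # <http://example.org/bb/PatientAdmission> <http://example.org/meta/page> "5" .
    # <http://example.org/bb/PatientAdmission> <http://example.org/meta/page> "12" .
```

N-Triples puts one complete statement on each line, which keeps the output easy to generate line by line and to load into a SPARQL endpoint.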

ThomasPause commented 5 years ago

The following problems still need to be solved:

ThomasPause commented 5 years ago

I think the main difference is that the algorithm finds every occurrence of a given word or word group, as specified by the regex, whereas the manual extraction may be somewhat inconsistent. The algorithm cannot judge whether a word fits the context of the text, but it is definitely much faster. Words like "hospital", "operation" or "communication" occur very often throughout the book (maybe we should exclude them in the script).
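To illustrate the difference described above, here is a minimal sketch of regex-based label search, assuming the book text is available as a mapping from page number to extracted text. The helpers `label_pattern` and `find_pages` are hypothetical and not taken from the actual script. Because the pattern matches whole words and ignores context, it reports every mention of a common word such as "hospital", yet it misses inflected forms such as "hospitals".

```python
import re


def label_pattern(label):
    """Compile a case-insensitive regex for a class label (hypothetical helper).

    Each word must match as a whole word (\\b boundaries), and any run of
    whitespace, including line breaks left by PDF extraction, may separate
    the words of a multi-word label.
    """
    words = (re.escape(w) for w in label.split())
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def find_pages(labels, pages):
    """Map each label to the sorted page numbers on which it occurs.

    pages maps a page number to the extracted text of that page.
    """
    result = {}
    for label in labels:
        pattern = label_pattern(label)
        result[label] = sorted(n for n, text in pages.items() if pattern.search(text))
    return result


if __name__ == "__main__":
    pages = {
        1: "The hospital information\nsystem is central.",
        2: "Hospitals differ.",
        3: "No match here.",
    }
    print(find_pages(["Hospital Information System", "Hospital"], pages))
    # {'Hospital Information System': [1], 'Hospital': [1]}
```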

ThomasPause commented 5 years ago

This shows that some buzzwords occur very often, which raises the question of whether it is useful to have an array of about 250 page numbers on which the term "hospital" appears in the book. So I think automated extraction has its limits. Maybe we can skip the most frequent words and let the manual analysis decide which page number(s) are worth keeping.

KonradHoeffner commented 5 years ago

Attached: classeswithpage.tsv.txt, bbclasses.tsv.txt

ThomasPause commented 5 years ago

Distribution after manual extraction:

# Label
337 Information
239 System
218 Patient
215 Data
196 Management
169 Information System
115 Information Management
112 Quality
105 Order
95 Information Processing
93 Application Component
85 Planning
74 Integration
70 Documentation
68 Logical Tool Layer
63 Model
63 Case
60 Hospital Information System
58 Strategic Information Management
53 Service
50 Monitoring
47 Knowledge
43 Enterprise Function
41 Patient Administration
36 Transfer
34 Server
34 Hospital Function
33 Patient Record
33 Information Management in Hospitals
33 Evaluation
32 Physical Tool Layer
32 Patient Admission
28 Operational Information Management
28 Metamodel
28 Health Information System
28 Diagnosis
27 Patient Administration System
27 Order Entry
26 Directing
25 Strategic Information Management Plan
25 Integrity
24 Reference Model
23 Hospital Management
22 Project
21 Functionality
20 Patient Identification
20 Controlling
19 Room
19 Procedure
19 Message
19 Health Care Network
18 Archiving
17 Usability
17 Entity Type
17 Efficiency
16 Transcription
16 Appointment
15 Tactical Information Management
15 Quality Management
15 Electronic Patient Record
15 Drug
15 Data Integration
15 Consistency
15 Bed
14 Reliability
14 Quality of Processes
14 Health Care Professional
14 Functional Redundancy
14 Clinical Information System
13 Medical Documentation System
13 Medical Admission
13 Business Process
12 insurance
12 IHE
12 Confidentiality
12 Client
12 Administrative Admission
12 Activity
11 Quality of Structures
11 Patient History
11 Electronic Health Record
11 acceptance
10 Transinstitutional Health Information System
10 Strategic Alignment
10 Outpatient Management System
10 Nursing Management and Documentation System
10 Finding
9 Stability
9 Software Product
9 Semantic Integration
9 Quality of Outcome
9 Physical Data Processing System
9 Material
9 Interoperability
9 Hospital Administration
9 Financial Accounting
9 Facility Management
9 extent
9 Contextual Integration
9 Communication Server
9 Clinical Research
9 Clinical Documentation
9 Certification
9 Architectural Style
8 Subsystem
8 Referential Integrity
8 Radiology Information System
8 Quality of Data
8 Process Integration
8 Middleware
8 Completeness
8 Communication Standard
8 Access Integration
7 Strategic HIS Planning
7 Specialization
7 Presentation Integration
7 Patient Identification Number
7 Patient Data Management System
7 Object Identity
7 Master Patient Index
7 Laboratory Information System
7 Classification
7 CDA
7 Benchmarking
6 Synchronous Communication
6 Strategic HIS Monitoring
6 SOA
6 Sample
6 Object Class
6 Multiple Usability of Data
6 IT Service Management
6 HIS Quality
6 HIS Benchmarking
6 Functional Integration
6 Feature
6 Data Model
6 COBIT
6 CCHIT
5 Virtualization
5 Star
5 Physical Integration
5 Patient Record System
5 Organizational Unit
5 Operation Management System
5 Message Type
5 Medical Procedure
5 Media Crack
5 location
5 Leanness of Information Processing Tools
5 Laundry
5 Informed Consent
5 Functional Leanness
5 First Study Design
5 Enterprise Resource Planning System
5 duration
5 Document Archiving System
5 Chief Information Officer
5 CCOW
5 Business Strategy
5 Asynchronous Communication
5 Accuracy
4 UML Class
4 Transaction Management
4 Terminal
4 Sustainability
4 Study Exploration
4 Portability
4 Picture Archiving and Communication System
4 Patient Transport
4 Patient Record Archive
4 Nursing Procedure
4 Nursing Admission
4 Notification
4 Master Application Component
4 Information Management Board
4 Human Resource
4 HIS Infrastructure
4 Food
4 Efficiency of Information Logistics
4 EDIFACT
4 Decomposition
4 DBn Style
4 Data Warehouse System
4 Computer System
4 Communication Interface
4 Archiving of Patient Information
4 ADT
4 3LGM
3 UML Class Diagram
3 Transparency
3 Teleradiology
3 Strategic HIS Directing
3 Service Catalog
3 Saturation
3 Relevancy
3 Project Plan
3 Project Management Board
3 Portfolio Management
3 Pharmacy Information System
3 Personal Health Record
3 Patient Chart System
3 Pathology Information System
3 Organizational Model
3 Medical Device
3 Medical and Nursing Knowledge
3 Mainframe Architecture
3 Linear
3 Key Performance Indicator
3 ITIL
3 is needed for
3 Discharge Summary
3 Dialysis Information System
3 Decision Support System
3 Database Server
3 Cost Unit
3 Cost Accounting
3 Controllability
3 Clinical Trial
3 Clinical Pathway
3 Chief Executive Officer
3 Business Process Model
3 Bill
3 Balance of Functional Leanness and Functional Redundancy
3 Balance of Documentation Quality and Documentation Efforts
3 Architecture of an Information System
3 Adaptability
2 Utility Analysis
2 User Satisfaction
2 Tree
2 Technical Model
2 SWOT Analysis
2 Suitability for Learning
2 Suitability for Individualization
2 Software Quality
2 Software Ergonomics
2 Ring
2 R01
2 Quality Report
2 Quality of HIS Architecture
2 Procedure Class
2 ORU
2 Organizational System
2 Nursing Anamnesis
2 NANDA
2 Medical Anamnesis
2 Means of Transport
2 Maturity
2 Long Term Archiving
2 ISO 9001
2 Integrity of Data
2 Integrated HIS
2 Information System Model
2 HIS Certification
2 Health Care Regulation
2 Functional Redundancy Rate
2 Functional Model
2 Federated Database System
2 Facility and Area
2 External Finding
2 Error Tolerance
2 Diagnosis Class
2 DB1 Style
2 Converging Technologies
2 Controlling Report
2 Conformity with User Expectations
2 ComputerBased Application Component
2 Communication Link
2 Classification of Diagnoses
2 Chief Financial Officer
2 Cardiovascular Information System
2 Bus
2 Bed Occupation
2 BAR
2 Balance of Data Security and Working Processes
2 Availability of Data
2 Application Server
2 Ad Hoc Monitoring
1 Virtual Private Network
1 User Survey
1 Usability Study
1 Terminal Server
1 Teleradiology System
1 Suitability for the Task
1 Storage Area Network
1 Server Cluster
1 Remote Function Call
1 Quality of Tactical Information Management
1 Quality of IT Training
1 Quality of IT Support
1 Qualitative Observation
1 Project Planning
1 Orthopedics Information System
1 ORM
1 OpenEHR
1 Oncology Information System
1 NOC
1 NIC
1 Mixed Approach
1 Mesh
1 ISO TR 22221
1 Isolation
1 ISO 9241
1 Internal Quality Management
1 Integration Technology
1 Integrated Health Care Delivery System
1 Infrastructure of an Information System
1 Information System Metamodel
1 Information Processing Tool
1 Information and Knowledge Logistics
1 Inductive Approach
1 Imaging Modality
1 IHE Patient Demographics Query
1 Hospital Budget
1 Homogeneous Architecture
1 Homogeneity of the HIS Architecture
1 HL7 Version 3
1 Healthcare Services Specification Project
1 Effectiveness Study
1 Durability
1 DICOM Standard
1 Delphi Survey
1 Deductive Approach
1 deadline
1 Data Transmission Connection
1 Data Metamodel
1 Correct Information
1 Controlled Transcription of Data
1 Controlled Redundancy of Data
1 ComputerBased Information System
1 Component Alignment Model
1 Classification of Procedures
1 Business Process Metamodel
1 Blood Bank Management System
1 Atomicity
1 ACID
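The thread does not state exactly what the counts measure. Assuming each count is the number of pages on which a label was found, a listing in this "count label" format could be produced along the following lines. The function `distribution` is hypothetical, and its tie-breaking (alphabetical, case-insensitive) is an arbitrary choice that need not match the order of equal counts in the list above.

```python
def distribution(pages_by_label):
    """Return 'count label' lines, most frequent label first (hypothetical helper).

    pages_by_label maps a label to the list of pages it was found on.
    Ties are broken alphabetically, ignoring case.
    """
    rows = sorted(pages_by_label.items(), key=lambda kv: (-len(kv[1]), kv[0].lower()))
    return [f"{len(pages)} {label}" for label, pages in rows]


if __name__ == "__main__":
    for line in distribution({"Patient": [1, 2, 3], "Data": [4, 5, 6], "Quality": [7]}):
        print(line)
    # 3 Data
    # 3 Patient
    # 1 Quality
```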
ThomasPause commented 5 years ago

Distribution after automated extraction:

# Label
249 Hospital
99 Communication
68 Organization
58 Patient Care
56 Operation
54 Network
49 Physician
40 Ward
36 Standard
33 Outpatient
32 Laboratory
27 Resource
26 Security
22 Book
21 Medication
20 Nurse
16 Study
16 Education
14 Health Care Institution
14 Event
13 Nursing Management
13 Information and Communication Technology
12 Vendor
12 Inpatient
11 Supply and Disposal Management
11 Nursing Care Planning
11 HIS Architecture
11 Framework
10 Report Writing
10 Quality of Patient Care
10 Outpatient Unit
10 Management Department
10 Database System
10 Computing Center
10 Communication Network
10 Coding of Diagnoses and Procedures
10 Coding of Diagnoses
9 Scheduling and Resource Allocation
9 Appointment Scheduling
9 Administrative Staff
8 University Medical Center
8 Radiology Department
8 Medical and Nursing Care Planning
8 Library
8 Data Security
7 Systematic Information Processing
7 Research and Education
7 Joint Commission
7 Information Management in Health Care Networks
7 Data Protection
6 XML
6 Workstation
6 Study Design
6 Project Portfolio
6 Permanent Monitoring
6 Law
6 Execution of Diagnostic and Therapeutic Procedures
6 DRG
6 Administrative Discharge and Billing
5 Visitor and Information Service
5 Top Management
5 Pharmacy
5 Personal Computer
5 Pathology
5 Nursing Report
5 Medical Knowledge
5 ICD
5 Economic Analysis
5 Computer Network
5 Administrative Data
4 Switch
4 Senior Physician
4 Report and Publication of Study
4 Quality Assessment
4 Patient Transfer
4 Medical Report
4 General Practitioner
4 Execution of Nursing Procedures
4 EuroRec
4 Diet
4 Computed Tomography
4 Cluster
3 Surgeon
3 Study Plan
3 Reference Model for the Domain Layer of Hospital Information Systems
3 Radiological Examination
3 Preparation of an Order
3 Patient Discharge and Transfer to Other Institutions
3 Pager
3 Operation Room
3 Operationalization of Methods and Detailed Study Plan
3 Network Management
3 Medical Discharge and Medical Report Writing
3 LOINC
3 Information Management Staff
3 Disease Management
3 Communication Pattern
3 Class Diagram
3 Access Control
2 User Training
2 UML Activity Diagram
2 Trouble Shooting
2 TISS
2 The Strategy of Independent Health Banks
2 Therapeutic Intervention Scoring System
2 Technical Staff
2 System of Concepts
2 Staff Controlling
2 Software Development
2 Risk Management
2 Radiotherapy
2 Radiology Report
2 Project Manager
2 Pencil
2 Patient Identification and Checking for Recurrent
2 Orientation
2 Organization of Health Care
2 Ordering of Drugs
2 Nutrition
2 Nursing Discharge and Nursing Report Writing
2 Mobile Phone
2 Mobile Computer
2 Migration Path
2 Medical Service
2 Medical Care Planning
2 Material Controlling
2 Management of User Accounts
2 Management of Medical Devices
2 Malcolm Baldrige National Quality Award
2 Laptop
2 IT Support
2 ISO Standard
2 ICD10
2 Hospital Staff
2 HL7 Reference Information Model
2 HIS Operation
2 Health Insurance Company
2 General Practice
2 Fulfillment of the Expectations of Different Stakeholders
2 Financial Controlling
2 Federated Database Schema
2 Execution of Lab Examinations
2 Execution of Clinical Trials and Experiments
2 Execution of Clinical Trials
2 Ethernet
2 Digital Dictation System
2 Decision Making and Patient Information
2 Customizing
2 CT
2 Blood Bank
2 Behavioral Perspective
2 Backbone
2 Asset Accounting
2 Application Portfolio
2 Administrative Patient Data
1 Written Agreement on the Study Outline
1 Wound Treatment
1 Wound Care
1 Work Sampling
1 Work Organization and Time Management
1 Working List
1 Virtual Server
1 User Requirements Analysis
1 UMLS
1 Ultrasound
1 Transmitting Medium
1 Token Ring
1 Time Measurement
1 TIFF
1 Therapeutic Service
1 Telemicroscopy
1 Tagged Image File Format
1 Systems Review
1 System Analysis
1 Study Report
1 Study Outline
1 Structural Quality Assessment
1 Statistical Evaluation of Patient Data
1 Star Architecture
1 Stability of Application Components
1 Software Vendor
1 Social History
1 SNOMED
1 Skin Care
1 Simplified Acute Physiology Score
1 ShortTerm HIS Planning
1 Sheet of Paper
1 Service Transition
1 Service Strategy
1 Service Operation
1 Service Design
1 Security of Data
1 Scheduling and Resource Planning with the Medical Service Unit
1 Robot System
1 Research Management
1 Rehabilitation Center
1 Recruitment of Patients
1 Quality of Information Management
1 Qualitative Content Analysis
1 Publishing and Presentation
1 Problem Solving
1 Problem Management
1 Printer Server
1 Plan and Organize
1 Personal Hygiene
1 Personal Digital Assistant
1 Performance of Legal Notification Requirements
1 Payroll Accounting
1 Patient State
1 Patient Billing
1 Patient Administration Department
1 Organizational Perspective
1 Organizational Infrastructure
1 Oral and Dental Care
1 Optical Fiber
1 Operation Report
1 Operational Management Concept
1 Nursing History
1 Nursing Classification
1 Nursing Care Plan
1 Nuclear Medicine Imaging
1 Nomenclature
1 Network Topology
1 Network Monitoring
1 MPEG
1 Monitor and Evaluate
1 Modeling Information Systems
1 Memory Stick
1 Material and Medication Management
1 Magnetic Resonance Imaging
1 Logical Observation Identifier Names and Codes
1 Laundry Management
1 Knowledge Retrieval and Literature Management
1 KHStatV
1 KHG
1 JPEG
1 Joint Photographic Experts Group
1 IuKDG
1 IT Evaluation Study
1 ISO 14721
1 International DICOM Committee
1 Internal Reporting System
1 Integrated Care
1 Information Desk
1 Incident Taking
1 Incident Management
1 Incident Analysis
1 IHE Technical Framework
1 ICPM
1 Human Resources Department
1 Home of the Patient
1 HL7 Version 2
1 HL7 RIM
1 HIS Component
1 HIPAA
1 Hair und Nail Care
1 German Law
1 General Administration Department
1 Functional Perspective
1 Fork Lift
1 Financing of Health Care
1 Field Study
1 Family History
1 Failure Management
1 eXtensible Markup Language
1 Execution of Prophylaxis
1 Execution of Irradiation
1 Excretion
1 Evaluation Method
1 Establish and Promote the Strategic Information Management Plan
1 Encryption Software
1 Drug Service
1 Diagnostic Service
1 Department of Quality Management
1 Department of Facility Management
1 Deliver and Support
1 Debtor Accounting
1 Data Reference Model
1 Copper Cable
1 Continual Service Improvement
1 Consultant
1 Consensus Method
1 ComputerBased Information Processing
1 Complaint
1 Communication Protocol
1 Communication Ability
1 Coding of Procedures
1 Clinical Department
1 Clinical Data Warehouse
1 Chemotherapeutic Treatment
1 Change Management
1 CEN
1 Catering
1 Business Intelligence System
1 Body Washing
1 ATC
1 ASCII
1 American Standard Code for Information Interchange
1 Admission Diagnosis
1 Administration of Human Resource Master Data
1 Administration of Business Trips and Further Training
1 Administration Management
1 Administration and Allocation of Patient Records
1 Acquire and Implement
1 Access Control System
ThomasPause commented 5 years ago

The manual extraction yields at most 2 pages per class, whereas the automated extraction finds some classes with several hundred occurrences. So the question is whether it makes sense to set a cutoff, and if so, where.

ThomasPause commented 5 years ago

The cutoff is now at 3, which means that classes that appear more than 3 times are not written to the file. The last remaining problem is that the offset between the PDF manuscript and the 2nd edition of the book varies, so the page numbers are not correct yet. We need a PDF of the 2nd edition to extract them correctly.
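A minimal sketch of the cutoff, assuming "appear more than 3 times" refers to the number of pages found for a class. The name `apply_cutoff` is hypothetical; only the threshold of 3 comes from the comment above.

```python
CUTOFF = 3  # threshold from the thread: classes found more than 3 times are dropped


def apply_cutoff(pages_by_class, cutoff=CUTOFF):
    """Keep only classes with at most `cutoff` pages (hypothetical helper).

    Very frequent labels such as "Hospital" would otherwise receive hundreds
    of page references, which are of little use for chapter search.
    """
    return {cls: pages for cls, pages in pages_by_class.items() if len(pages) <= cutoff}


if __name__ == "__main__":
    found = {
        "Hospital": list(range(1, 250)),  # 249 pages: dropped
        "Laundry": [120, 121],            # 2 pages: kept
        "Pager": [1, 2, 3, 4],            # 4 pages: dropped
    }
    print(apply_cutoff(found))
    # {'Laundry': [120, 121]}
```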

ThomasPause commented 5 years ago

The algorithm now normalizes Unicode. The sorted results are attached as an .nt file. @KonradHoeffner, please load them into the SPARQL endpoint. Attached: outputsorted.nt.txt
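The thread does not say which Unicode normalization form is used. As a sketch, assuming NFKC: it folds compatibility characters that PDF text extraction often produces, such as the ligature U+FB01 ("ﬁ"), into plain letters, so that a label like "Classification" can match the extracted text. The helpers `normalize` and `sorted_ntriples` are hypothetical.

```python
import unicodedata


def normalize(text):
    """Apply NFKC normalization (assumed form; the thread does not name one).

    For example, the ligature U+FB01 becomes the two letters "fi".
    """
    return unicodedata.normalize("NFKC", text)


def sorted_ntriples(lines):
    """Deduplicate and sort N-Triples statements, one per line."""
    return sorted(set(lines))


if __name__ == "__main__":
    print(normalize("Classi\ufb01cation"))
    # Classification
```

Sorting the statements makes the output stable across runs, so two versions of the .nt file can be compared with an ordinary line-based diff.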

KonradHoeffner commented 5 years ago

Upload done