Closed by KonradHoeffner 5 years ago
The missing pages were extracted and output as RDF triples by a Python script, which lives in a private repo due to license issues: https://github.com/ThomasPause/MissingPageFinder
solve the following problems:
I think the main difference is that the algorithm finds every occurrence of a given word or word group, as specified by the regex, whereas the human extraction may be a little inconsistent. The algorithm cannot judge whether a word fits the context of the text, but it is definitely much faster. Words like "hospital", "operation" or "communication" are found very often throughout the book (maybe exclude them in the script).
This shows that some buzzwords occur very often, and the question is whether it is useful to have an array of about 250 page numbers for every place the term "hospital" appears in the book. So I think there is a limit to automated extraction. Maybe we can skip the most frequently occurring words and let the manual analysis decide which page numbers are worth keeping.
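For readers without access to the private script, the matching described above can be sketched roughly as follows. This is not the actual MissingPageFinder code, only a minimal illustration. The function name `find_pages` and the input format (a dict from page number to page text) are assumptions made for this example.

```python
import re
from collections import defaultdict


def find_pages(pages, labels):
    """Map each label to the sorted page numbers on which it occurs.

    pages:  dict of page number -> extracted page text (assumed input format)
    labels: iterable of class labels such as "Patient Record"
    """
    hits = defaultdict(list)
    for label in labels:
        # Whole-word, case-insensitive match; any whitespace (including
        # line breaks) is allowed between the words of a multi-word label.
        words = [re.escape(word) for word in label.split()]
        pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
        for number in sorted(pages):
            if pattern.search(pages[number]):
                hits[label].append(number)
    return dict(hits)


if __name__ == "__main__":
    demo_pages = {
        1: "The hospital\ninformation system supports patient care.",
        2: "Every hospital has a ward.",
        3: "Unrelated text.",
    }
    print(find_pages(demo_pages, ["Hospital", "Hospital Information System"]))
```

A pure text match like this cannot tell a defining occurrence from a passing mention, which is why generic labels collect so many pages.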
Pages found | Label |
---|---|
337 | Information |
239 | System |
218 | Patient |
215 | Data |
196 | Management |
169 | Information System |
115 | Information Management |
112 | Quality |
105 | Order |
95 | Information Processing |
93 | Application Component |
85 | Planning |
74 | Integration |
70 | Documentation |
68 | Logical Tool Layer |
63 | Model |
63 | Case |
60 | Hospital Information System |
58 | Strategic Information Management |
53 | Service |
50 | Monitoring |
47 | Knowledge |
43 | Enterprise Function |
41 | Patient Administration |
36 | Transfer |
34 | Server |
34 | Hospital Function |
33 | Patient Record |
33 | Information Management in Hospitals |
33 | Evaluation |
32 | Physical Tool Layer |
32 | Patient Admission |
28 | Operational Information Management |
28 | Metamodel |
28 | Health Information System |
28 | Diagnosis |
27 | Patient Administration System |
27 | Order Entry |
26 | Directing |
25 | Strategic Information Management Plan |
25 | Integrity |
24 | Reference Model |
23 | Hospital Management |
22 | Project |
21 | Functionality |
20 | Patient Identification |
20 | Controlling |
19 | Room |
19 | Procedure |
19 | Message |
19 | Health Care Network |
18 | Archiving |
17 | Usability |
17 | Entity Type |
17 | Efficiency |
16 | Transcription |
16 | Appointment |
15 | Tactical Information Management |
15 | Quality Management |
15 | Electronic Patient Record |
15 | Drug |
15 | Data Integration |
15 | Consistency |
15 | Bed |
14 | Reliability |
14 | Quality of Processes |
14 | Health Care Professional |
14 | Functional Redundancy |
14 | Clinical Information System |
13 | Medical Documentation System |
13 | Medical Admission |
13 | Business Process |
12 | insurance |
12 | IHE |
12 | Confidentiality |
12 | Client |
12 | Administrative Admission |
12 | Activity |
11 | Quality of Structures |
11 | Patient History |
11 | Electronic Health Record |
11 | acceptance |
10 | Transinstitutional Health Information System |
10 | Strategic Alignment |
10 | Outpatient Management System |
10 | Nursing Management and Documentation System |
10 | Finding |
9 | Stability |
9 | Software Product |
9 | Semantic Integration |
9 | Quality of Outcome |
9 | Physical Data Processing System |
9 | Material |
9 | Interoperability |
9 | Hospital Administration |
9 | Financial Accounting |
9 | Facility Management |
9 | extent |
9 | Contextual Integration |
9 | Communication Server |
9 | Clinical Research |
9 | Clinical Documentation |
9 | Certification |
9 | Architectural Style |
8 | Subsystem |
8 | Referential Integrity |
8 | Radiology Information System |
8 | Quality of Data |
8 | Process Integration |
8 | Middleware |
8 | Completeness |
8 | Communication Standard |
8 | Access Integration |
7 | Strategic HIS Planning |
7 | Specialization |
7 | Presentation Integration |
7 | Patient Identification Number |
7 | Patient Data Management System |
7 | Object Identity |
7 | Master Patient Index |
7 | Laboratory Information System |
7 | Classification |
7 | CDA |
7 | Benchmarking |
6 | Synchronous Communication |
6 | Strategic HIS Monitoring |
6 | SOA |
6 | Sample |
6 | Object Class |
6 | Multiple Usability of Data |
6 | IT Service Management |
6 | HIS Quality |
6 | HIS Benchmarking |
6 | Functional Integration |
6 | Feature |
6 | Data Model |
6 | COBIT |
6 | CCHIT |
5 | Virtualization |
5 | Star |
5 | Physical Integration |
5 | Patient Record System |
5 | Organizational Unit |
5 | Operation Management System |
5 | Message Type |
5 | Medical Procedure |
5 | Media Crack |
5 | location |
5 | Leanness of Information Processing Tools |
5 | Laundry |
5 | Informed Consent |
5 | Functional Leanness |
5 | First Study Design |
5 | Enterprise Resource Planning System |
5 | duration |
5 | Document Archiving System |
5 | Chief Information Officer |
5 | CCOW |
5 | Business Strategy |
5 | Asynchronous Communication |
5 | Accuracy |
4 | UML Class |
4 | Transaction Management |
4 | Terminal |
4 | Sustainability |
4 | Study Exploration |
4 | Portability |
4 | Picture Archiving and Communication System |
4 | Patient Transport |
4 | Patient Record Archive |
4 | Nursing Procedure |
4 | Nursing Admission |
4 | Notification |
4 | Master Application Component |
4 | Information Management Board |
4 | Human Resource |
4 | HIS Infrastructure |
4 | Food |
4 | Efficiency of Information Logistics |
4 | EDIFACT |
4 | Decomposition |
4 | DBn Style |
4 | Data Warehouse System |
4 | Computer System |
4 | Communication Interface |
4 | Archiving of Patient Information |
4 | ADT |
4 | 3LGM |
3 | UML Class Diagram |
3 | Transparency |
3 | Teleradiology |
3 | Strategic HIS Directing |
3 | Service Catalog |
3 | Saturation |
3 | Relevancy |
3 | Project Plan |
3 | Project Management Board |
3 | Portfolio Management |
3 | Pharmacy Information System |
3 | Personal Health Record |
3 | Patient Chart System |
3 | Pathology Information System |
3 | Organizational Model |
3 | Medical Device |
3 | Medical and Nursing Knowledge |
3 | Mainframe Architecture |
3 | Linear |
3 | Key Performance Indicator |
3 | ITIL |
3 | is needed for |
3 | Discharge Summary |
3 | Dialysis Information System |
3 | Decision Support System |
3 | Database Server |
3 | Cost Unit |
3 | Cost Accounting |
3 | Controllability |
3 | Clinical Trial |
3 | Clinical Pathway |
3 | Chief Executive Officer |
3 | Business Process Model |
3 | Bill |
3 | Balance of Functional Leanness and Functional Redundancy |
3 | Balance of Documentation Quality and Documentation Efforts |
3 | Architecture of an Information System |
3 | Adaptability |
2 | Utility Analysis |
2 | User Satisfaction |
2 | Tree |
2 | Technical Model |
2 | SWOT Analysis |
2 | Suitability for Learning |
2 | Suitability for Individualization |
2 | Software Quality |
2 | Software Ergonomics |
2 | Ring |
2 | R01 |
2 | Quality Report |
2 | Quality of HIS Architecture |
2 | Procedure Class |
2 | ORU |
2 | Organizational System |
2 | Nursing Anamnesis |
2 | NANDA |
2 | Medical Anamnesis |
2 | Means of Transport |
2 | Maturity |
2 | Long Term Archiving |
2 | ISO 9001 |
2 | Integrity of Data |
2 | Integrated HIS |
2 | Information System Model |
2 | HIS Certification |
2 | Health Care Regulation |
2 | Functional Redundancy Rate |
2 | Functional Model |
2 | Federated Database System |
2 | Facility and Area |
2 | External Finding |
2 | Error Tolerance |
2 | Diagnosis Class |
2 | DB1 Style |
2 | Converging Technologies |
2 | Controlling Report |
2 | Conformity with User Expectations |
2 | ComputerBased Application Component |
2 | Communication Link |
2 | Classification of Diagnoses |
2 | Chief Financial Officer |
2 | Cardiovascular Information System |
2 | Bus |
2 | Bed Occupation |
2 | BAR |
2 | Balance of Data Security and Working Processes |
2 | Availability of Data |
2 | Application Server |
2 | Ad Hoc Monitoring |
1 | Virtual Private Network |
1 | User Survey |
1 | Usability Study |
1 | Terminal Server |
1 | Teleradiology System |
1 | Suitability for the Task |
1 | Storage Area Network |
1 | Server Cluster |
1 | Remote Function Call |
1 | Quality of Tactical Information Management |
1 | Quality of IT Training |
1 | Quality of IT Support |
1 | Qualitative Observation |
1 | Project Planning |
1 | Orthopedics Information System |
1 | ORM |
1 | OpenEHR |
1 | Oncology Information System |
1 | NOC |
1 | NIC |
1 | Mixed Approach |
1 | Mesh |
1 | ISO TR 22221 |
1 | Isolation |
1 | ISO 9241 |
1 | Internal Quality Management |
1 | Integration Technology |
1 | Integrated Health Care Delivery System |
1 | Infrastructure of an Information System |
1 | Information System Metamodel |
1 | Information Processing Tool |
1 | Information and Knowledge Logistics |
1 | Inductive Approach |
1 | Imaging Modality |
1 | IHE Patient Demographics Query |
1 | Hospital Budget |
1 | Homogeneous Architecture |
1 | Homogeneity of the HIS Architecture |
1 | HL7 Version 3 |
1 | Healthcare Services Specification Project |
1 | Effectiveness Study |
1 | Durability |
1 | DICOM Standard |
1 | Delphi Survey |
1 | Deductive Approach |
1 | deadline |
1 | Data Transmission Connection |
1 | Data Metamodel |
1 | Correct Information |
1 | Controlled Transcription of Data |
1 | Controlled Redundancy of Data |
1 | ComputerBased Information System |
1 | Component Alignment Model |
1 | Classification of Procedures |
1 | Business Process Metamodel |
1 | Blood Bank Management System |
1 | Atomicity |
1 | ACID |
Pages found | Label |
---|---|
249 | Hospital |
99 | Communication |
68 | Organization |
58 | Patient Care |
56 | Operation |
54 | Network |
49 | Physician |
40 | Ward |
36 | Standard |
33 | Outpatient |
32 | Laboratory |
27 | Resource |
26 | Security |
22 | Book |
21 | Medication |
20 | Nurse |
16 | Study |
16 | Education |
14 | Health Care Institution |
14 | Event |
13 | Nursing Management |
13 | Information and Communication Technology |
12 | Vendor |
12 | Inpatient |
11 | Supply and Disposal Management |
11 | Nursing Care Planning |
11 | HIS Architecture |
11 | Framework |
10 | Report Writing |
10 | Quality of Patient Care |
10 | Outpatient Unit |
10 | Management Department |
10 | Database System |
10 | Computing Center |
10 | Communication Network |
10 | Coding of Diagnoses and Procedures |
10 | Coding of Diagnoses |
9 | Scheduling and Resource Allocation |
9 | Appointment Scheduling |
9 | Administrative Staff |
8 | University Medical Center |
8 | Radiology Department |
8 | Medical and Nursing Care Planning |
8 | Library |
8 | Data Security |
7 | Systematic Information Processing |
7 | Research and Education |
7 | Joint Commission |
7 | Information Management in Health Care Networks |
7 | Data Protection |
6 | XML |
6 | Workstation |
6 | Study Design |
6 | Project Portfolio |
6 | Permanent Monitoring |
6 | Law |
6 | Execution of Diagnostic and Therapeutic Procedures |
6 | DRG |
6 | Administrative Discharge and Billing |
5 | Visitor and Information Service |
5 | Top Management |
5 | Pharmacy |
5 | Personal Computer |
5 | Pathology |
5 | Nursing Report |
5 | Medical Knowledge |
5 | ICD |
5 | Economic Analysis |
5 | Computer Network |
5 | Administrative Data |
4 | Switch |
4 | Senior Physician |
4 | Report and Publication of Study |
4 | Quality Assessment |
4 | Patient Transfer |
4 | Medical Report |
4 | General Practitioner |
4 | Execution of Nursing Procedures |
4 | EuroRec |
4 | Diet |
4 | Computed Tomography |
4 | Cluster |
3 | Surgeon |
3 | Study Plan |
3 | Reference Model for the Domain Layer of Hospital Information Systems |
3 | Radiological Examination |
3 | Preparation of an Order |
3 | Patient Discharge and Transfer to Other Institutions |
3 | Pager |
3 | Operation Room |
3 | Operationalization of Methods and Detailed Study Plan |
3 | Network Management |
3 | Medical Discharge and Medical Report Writing |
3 | LOINC |
3 | Information Management Staff |
3 | Disease Management |
3 | Communication Pattern |
3 | Class Diagram |
3 | Access Control |
2 | User Training |
2 | UML Activity Diagram |
2 | Trouble Shooting |
2 | TISS |
2 | The Strategy of Independent Health Banks |
2 | Therapeutic Intervention Scoring System |
2 | Technical Staff |
2 | System of Concepts |
2 | Staff Controlling |
2 | Software Development |
2 | Risk Management |
2 | Radiotherapy |
2 | Radiology Report |
2 | Project Manager |
2 | Pencil |
2 | Patient Identification and Checking for Recurrent |
2 | Orientation |
2 | Organization of Health Care |
2 | Ordering of Drugs |
2 | Nutrition |
2 | Nursing Discharge and Nursing Report Writing |
2 | Mobile Phone |
2 | Mobile Computer |
2 | Migration Path |
2 | Medical Service |
2 | Medical Care Planning |
2 | Material Controlling |
2 | Management of User Accounts |
2 | Management of Medical Devices |
2 | Malcolm Baldrige National Quality Award |
2 | Laptop |
2 | IT Support |
2 | ISO Standard |
2 | ICD10 |
2 | Hospital Staff |
2 | HL7 Reference Information Model |
2 | HIS Operation |
2 | Health Insurance Company |
2 | General Practice |
2 | Fulfillment of the Expectations of Different Stakeholders |
2 | Financial Controlling |
2 | Federated Database Schema |
2 | Execution of Lab Examinations |
2 | Execution of Clinical Trials and Experiments |
2 | Execution of Clinical Trials |
2 | Ethernet |
2 | Digital Dictation System |
2 | Decision Making and Patient Information |
2 | Customizing |
2 | CT |
2 | Blood Bank |
2 | Behavioral Perspective |
2 | Backbone |
2 | Asset Accounting |
2 | Application Portfolio |
2 | Administrative Patient Data |
1 | Written Agreement on the Study Outline |
1 | Wound Treatment |
1 | Wound Care |
1 | Work Sampling |
1 | Work Organization and Time Management |
1 | Working List |
1 | Virtual Server |
1 | User Requirements Analysis |
1 | UMLS |
1 | Ultrasound |
1 | Transmitting Medium |
1 | Token Ring |
1 | Time Measurement |
1 | TIFF |
1 | Therapeutic Service |
1 | Telemicroscopy |
1 | Tagged Image File Format |
1 | Systems Review |
1 | System Analysis |
1 | Study Report |
1 | Study Outline |
1 | Structural Quality Assessment |
1 | Statistical Evaluation of Patient Data |
1 | Star Architecture |
1 | Stability of Application Components |
1 | Software Vendor |
1 | Social History |
1 | SNOMED |
1 | Skin Care |
1 | Simplified Acute Physiology Score |
1 | ShortTerm HIS Planning |
1 | Sheet of Paper |
1 | Service Transition |
1 | Service Strategy |
1 | Service Operation |
1 | Service Design |
1 | Security of Data |
1 | Scheduling and Resource Planning with the Medical Service Unit |
1 | Robot System |
1 | Research Management |
1 | Rehabilitation Center |
1 | Recruitment of Patients |
1 | Quality of Information Management |
1 | Qualitative Content Analysis |
1 | Publishing and Presentation |
1 | Problem Solving |
1 | Problem Management |
1 | Printer Server |
1 | Plan and Organize |
1 | Personal Hygiene |
1 | Personal Digital Assistant |
1 | Performance of Legal Notification Requirements |
1 | Payroll Accounting |
1 | Patient State |
1 | Patient Billing |
1 | Patient Administration Department |
1 | Organizational Perspective |
1 | Organizational Infrastructure |
1 | Oral and Dental Care |
1 | Optical Fiber |
1 | Operation Report |
1 | Operational Management Concept |
1 | Nursing History |
1 | Nursing Classification |
1 | Nursing Care Plan |
1 | Nuclear Medicine Imaging |
1 | Nomenclature |
1 | Network Topology |
1 | Network Monitoring |
1 | MPEG |
1 | Monitor and Evaluate |
1 | Modeling Information Systems |
1 | Memory Stick |
1 | Material and Medication Management |
1 | Magnetic Resonance Imaging |
1 | Logical Observation Identifier Names and Codes |
1 | Laundry Management |
1 | Knowledge Retrieval and Literature Management |
1 | KHStatV |
1 | KHG |
1 | JPEG |
1 | Joint Photographic Experts Group |
1 | IuKDG |
1 | IT Evaluation Study |
1 | ISO 14721 |
1 | International DICOM Committee |
1 | Internal Reporting System |
1 | Integrated Care |
1 | Information Desk |
1 | Incident Taking |
1 | Incident Management |
1 | Incident Analysis |
1 | IHE Technical Framework |
1 | ICPM |
1 | Human Resources Department |
1 | Home of the Patient |
1 | HL7 Version 2 |
1 | HL7 RIM |
1 | HIS Component |
1 | HIPAA |
1 | Hair und Nail Care |
1 | German Law |
1 | General Administration Department |
1 | Functional Perspective |
1 | Fork Lift |
1 | Financing of Health Care |
1 | Field Study |
1 | Family History |
1 | Failure Management |
1 | eXtensible Markup Language |
1 | Execution of Prophylaxis |
1 | Execution of Irradiation |
1 | Excretion |
1 | Evaluation Method |
1 | Establish and Promote the Strategic Information Management Plan |
1 | Encryption Software |
1 | Drug Service |
1 | Diagnostic Service |
1 | Department of Quality Management |
1 | Department of Facility Management |
1 | Deliver and Support |
1 | Debtor Accounting |
1 | Data Reference Model |
1 | Copper Cable |
1 | Continual Service Improvement |
1 | Consultant |
1 | Consensus Method |
1 | ComputerBased Information Processing |
1 | Complaint |
1 | Communication Protocol |
1 | Communication Ability |
1 | Coding of Procedures |
1 | Clinical Department |
1 | Clinical Data Warehouse |
1 | Chemotherapeutic Treatment |
1 | Change Management |
1 | CEN |
1 | Catering |
1 | Business Intelligence System |
1 | Body Washing |
1 | ATC |
1 | ASCII |
1 | American Standard Code for Information Interchange |
1 | Admission Diagnosis |
1 | Administration of Human Resource Master Data |
1 | Administration of Business Trips and Further Training |
1 | Administration Management |
1 | Administration and Allocation of Patient Records |
1 | Acquire and Implement |
1 | Access Control System |
The manual extraction yields at most 2 pages per class, whereas the automated extraction produces classes with several hundred occurrences. So the question is whether it makes sense to set a cutoff, and if so, where.
The cutoff is now 3, which means that classes that appear more than 3 times are not written to the file. The last remaining problem is that the offset between the PDF manuscript and the 2nd edition book varies, so the page numbers are not correct yet. We need a PDF of the 2nd edition book to extract the pages correctly.
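As an illustration of the cutoff rule only (the real script is private, so the names `apply_cutoff`, `kept` and `skipped` are assumptions for this sketch), the filter could look like this. It also returns the skipped classes so they can be handed to manual review instead of being lost.

```python
def apply_cutoff(hits, cutoff=3):
    """Split label -> page list into kept and skipped entries.

    A label is kept if it was found on at most `cutoff` pages and
    skipped if it was found on more than `cutoff` pages.
    """
    kept = {}
    skipped = {}
    for label, pages in hits.items():
        if len(pages) > cutoff:
            skipped[label] = pages  # too frequent, candidate for manual review
        else:
            kept[label] = pages
    return kept, skipped


if __name__ == "__main__":
    demo_hits = {
        "Hospital": list(range(1, 250)),  # found on 249 pages
        "Teleradiology": [101, 102, 240],
        "ACID": [77],
    }
    kept, skipped = apply_cutoff(demo_hits)
    print(sorted(kept))     # labels written to the output file
    print(sorted(skipped))  # labels left for manual analysis
```

With a cutoff of 3, a label found on exactly 3 pages is still written, while one found on 4 or more pages is dropped.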
The algorithm now converts the text to normalized Unicode. The sorted results are attached here as an .nt file: outputsorted.nt.txt. @KonradHoeffner, please load them into the SPARQL endpoint.
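A minimal sketch of the two steps mentioned here, normalizing the extracted text and writing sorted N-Triples, might look as follows. The normalization form (NFKC, which also folds PDF ligatures such as "ﬁ" into "fi"), the `http://example.org/` URIs and the function names are all assumptions for illustration. The comment above does not say which form or which SNIK properties the actual script uses.

```python
import unicodedata
from urllib.parse import quote

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def normalize(text, form="NFKC"):
    """Return `text` in the given Unicode normalization form (NFKC is an assumption)."""
    return unicodedata.normalize(form, text)


def to_ntriples(hits,
                base="http://example.org/class/",        # hypothetical namespace
                predicate="http://example.org/page"):    # hypothetical property
    """Turn label -> page list into a sorted list of N-Triples lines."""
    lines = []
    for label, pages in hits.items():
        # Build a local name by dropping spaces, then percent-encode the rest.
        local_name = quote(normalize(label).replace(" ", ""))
        for page in pages:
            lines.append(
                f'<{base}{local_name}> <{predicate}> "{page}"^^<{XSD_INTEGER}> .'
            )
    return sorted(lines)


if __name__ == "__main__":
    for line in to_ntriples({"Patient Record": [33, 12]}):
        print(line)
```

Sorting the lines makes the output stable between runs, so two result files can be compared with a plain diff.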
Upload done
See also: https://github.com/IMISE/snik-ontology/issues/296, https://github.com/IMISE/snik-ontology/issues/309.
Despite efforts to restore missing chapters, several hundred classes in the blue book still have no page, and some of those have no chapter either. All classes with a page now have a chapter because we generated those automatically, see IMISE/snik-cytoscape.js#214.
However, for the classes that have no page, and possibly no chapter either, we should extract both from the blue book. With the digital version and its search function this should be doable semi-automatically, and it would be a huge help for the chapter search feature for teaching.