petermr / openDiagram

Extraction of semantic data from diagrams in scientific and other technical/business documents
Apache License 2.0

search_lib issue for plantpart corpus #14

Open dheerajdhingani opened 3 years ago

dheerajdhingani commented 3 years ago

Hello sir, [image: Screenshot (15)]

I tried running search_lib on the plantpart corpus. It produces a plot, but then fails with the error below.

C:\Users\Mark\openDiagram\physchem\python>python search_lib.py --dict plant_part --sect introduction method --proj plantpart
running search main
project files not available for  C:\Users\Mark\openDiagram\python\diagrams\satish\cct
project files not available for  C:\Users\Mark\openDiagram\python\diagrams\rahul\diffprotexp
project files not available for  C:\Users\Mark\worcester\synthesis
project files not available for  C:\Users\Mark\worcester\explosion
Failed to read dictionary C:\Users\Mark\CEVOpen\dictionary\eoCompound\plant_compound.xml Start tag expected, '<' not found, line 1, column 1 (file:/C:/Users/Mark/CEVOpen/dictionary/eoCompound/plant_compound.xml, line 1)
Failed to read dictionary C:\Users\Mark\CEVOpen\dictionary\eoPlant\Plant.xml Opening and ending tag mismatch: entry line 91 and dictionary, line 2399, column 14 (file:/C:/Users/Mark/CEVOpen/dictionary/eoPlant/Plant.xml, line 2399)
Failed to read dictionary C:\Users\Mark\CEVOpen\dictionary\eoCompound\plant_compound.xml Start tag expected, '<' not found, line 1, column 1 (file:/C:/Users/Mark/CEVOpen/dictionary/eoCompound/plant_compound.xml, line 1)
core dicts dict_keys(['activity', 'country', 'disease', 'plant_genus', 'organization', 'plant_part', 'invasive_plant'])
commandline args
dicts ['plant_part'] <class 'list'>
sects ['introduction', 'method'] <class 'list'>
projs ['plantpart'] <class 'list'>
patterns None <class 'NoneType'>
args> Namespace(dict=['plant_part'], sect=['introduction', 'method'], proj=['plantpart'], patt=None, demo=None, loglevel='foo', plot=True, nosearch=False, maxbars=25, languages=['en'])
name plant_part
***** project C:\Users\Mark\CEVOpen\minicorpora\plantpart
_DESC <class 'str'> introduction or background; looks for these words anywhere in file titles
PROJ <class 'str'> C:\Users\Mark\CEVOpen\minicorpora\plantpart
TREE <class 'str'> *
SECTS <class 'str'> **
SUBSECT <class 'str'> *introduction*
SUBSUB <class 'str'> **
FILE <class 'str'> *
SUFFIX <class 'str'> xml
glob C:\Users\Mark\CEVOpen\minicorpora\plantpart/*/**/*introduction*/**/*.xml
_DESC <class 'str'> introduction or background; looks for these words anywhere in file titles
PROJ <class 'str'> C:\Users\Mark\CEVOpen\minicorpora\plantpart
TREE <class 'str'> *
SECTS <class 'str'> **
SUBSECT <class 'str'> *background*
SUBSUB <class 'str'> **
FILE <class 'str'> *
SUFFIX <class 'str'> xml
glob C:\Users\Mark\CEVOpen\minicorpora\plantpart/*/**/*background*/**/*.xml
files 1216
***** section_files introduction 1216
file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\0_1__introduction\0_title.xml
findfont: Font family ['Helvetica'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Helvetica'] not found. Falling back to DejaVu Sans.
lang: en
 [('seed', 82), ('fruit', 69), ('herb', 30), ('leaf', 30), ('root', 22), ('flowering', 19), ('rhizome', 19), ('heartwood', 17), ('peel', 17), ('flower', 12), ('berry', 9), ('bark', 9), ('wood', 8), ('resin', 6), ('shoot', 5), ('trichomes', 5), ('xylem', 4), ('grass', 4), ('panicle', 4), ('calyx', 4), ('epidermis', 4), ('ovary', 3), ('petiole', 2), ('stigma', 2), ('corona', 2), ('placenta', 2), ('foliage', 1), ('rosette', 1), ('branch', 1), ('pericarp', 1), ('pistil', 1), ('androecium', 1), ('stamen', 1), ('pollen', 1), ('cone', 1), ('tuber', 1)]
_DESC <class 'str'> methods and/or materials; looks for these words anywhere in file titles
PROJ <class 'str'> C:\Users\Mark\CEVOpen\minicorpora\plantpart
TREE <class 'str'> *
SECTS <class 'str'> **
SUBSECT <class 'str'> *method*
SUBSUB <class 'str'> **
FILE <class 'str'> *p
SUFFIX <class 'str'> xml
glob C:\Users\Mark\CEVOpen\minicorpora\plantpart/*/**/*method*/**/*p.xml
_DESC <class 'str'> methods and/or materials; looks for these words anywhere in file titles
PROJ <class 'str'> C:\Users\Mark\CEVOpen\minicorpora\plantpart
TREE <class 'str'> *
SECTS <class 'str'> **
SUBSECT <class 'str'> *material*
SUBSUB <class 'str'> **
FILE <class 'str'> *p
SUFFIX <class 'str'> xml
glob C:\Users\Mark\CEVOpen\minicorpora\plantpart/*/**/*material*/**/*p.xml
files 2999
***** section_files method 2999
file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\1_3_1__intact_plants\1_p.xml
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\1_3_1__intact_plants\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\2_3_2__in_vitro_shoot_cultu\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\2_3_2__in_vitro_shoot_cultu\2_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\3_3_3__isolation_and_the_an\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\3_3_3__isolation_and_the_an\2_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\3_3_3__isolation_and_the_an\3_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\3_3_3__isolation_and_the_an\4_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\4_3_4__the_antimicrobial_as\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\5_3_5__preparing_of_the_ess\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\6_3_6__microorganisms\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\6_3_6__microorganisms\2_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\6_3_6__microorganisms\3_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\6_3_6__microorganisms\4_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\7_3_7__evaluation_of_minima\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7238414\sections\1_body\2_3__materials_and_methods_\7_3_7__evaluation_of_minima\2_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7353542\sections\1_body\1_2__materials_and_methods\1_2_1__plant_materials_and_\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7353542\sections\1_body\1_2__materials_and_methods\2_2_2__essential_oil_charac\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7353542\sections\1_body\1_2__materials_and_methods\3_2_3__insects\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7353542\sections\1_body\1_2__materials_and_methods\3_2_3__insects\2_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7353542\sections\1_body\1_2__materials_and_methods\4_2_4__fumigant_toxicity\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7353542\sections\1_body\1_2__materials_and_methods\5_2_5__contact_toxicity\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7353542\sections\1_body\1_2__materials_and_methods\6_2_6__data_analysis\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7353542\sections\1_body\1_2__materials_and_methods\6_2_6__data_analysis\2_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7360168\sections\1_body\1_2__materials_and_methods\1_2_1__plant_material\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7360168\sections\1_body\1_2__materials_and_methods\2_2_2__extraction_of_essent\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7360168\sections\1_body\1_2__materials_and_methods\3_2_3__gc_ms_analysis\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7360168\sections\1_body\1_2__materials_and_methods\3_2_3__gc_ms_analysis\2_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7360168\sections\1_body\1_2__materials_and_methods\4_2_4__antioxidant_activity\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7360168\sections\1_body\1_2__materials_and_methods\4_2_4__antioxidant_activity\2_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7360168\sections\1_body\1_2__materials_and_methods\4_2_4__antioxidant_activity\3_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7360168\sections\1_body\1_2__materials_and_methods\4_2_4__antioxidant_activity\4_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7360168\sections\1_body\1_2__materials_and_methods\5_2_5__antibacterial_activi\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7360168\sections\1_body\1_2__materials_and_methods\5_2_5__antibacterial_activi\2_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7360168\sections\1_body\1_2__materials_and_methods\5_2_5__antibacterial_activi\3_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7360168\sections\1_body\1_2__materials_and_methods\5_2_5__antibacterial_activi\4_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7404404\sections\1_body\1_2__materials_and_methods\1_2_1__plant_materials\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7404404\sections\1_body\1_2__materials_and_methods\2_2_2__gas_chromatographic–\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7404404\sections\1_body\1_2__materials_and_methods\3_2_3__insecticidal_activit\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7413255\sections\1_body\3_materials_and_methods\1_insect_colony\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7413255\sections\1_body\3_materials_and_methods\2_plant_materials\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7413255\sections\1_body\3_materials_and_methods\3_essential_oil_extraction\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7413255\sections\1_body\3_materials_and_methods\4_fumigation_bioassay\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7413255\sections\1_body\3_materials_and_methods\5_the_embryo_exposure_to_es\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7413255\sections\1_body\3_materials_and_methods\6_copulation_test\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7413255\sections\1_body\3_materials_and_methods\7_statistical_analysis\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7438546\sections\1_body\1_materials_and_methods\1_study_site\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7438546\sections\1_body\1_materials_and_methods\2_experimental_design\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7438546\sections\1_body\1_materials_and_methods\3_field_sampling_and_measur\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7438546\sections\1_body\1_materials_and_methods\3_field_sampling_and_measur\2_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7438546\sections\1_body\1_materials_and_methods\3_field_sampling_and_measur\3_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7438546\sections\1_body\1_materials_and_methods\4_laboratory_determination\1_nscs_and_lipids\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7438546\sections\1_body\1_materials_and_methods\4_laboratory_determination\2_relative_moisture_content\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7438546\sections\1_body\1_materials_and_methods\4_laboratory_determination\3_extraction_and_component_\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7438546\sections\1_body\1_materials_and_methods\4_laboratory_determination\3_extraction_and_component_\2_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7438546\sections\1_body\1_materials_and_methods\5_statistical_analysis\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7445301\sections\1_body\3_methodology\10_histopathology_alteration\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7445301\sections\1_body\3_methodology\11_statistical_analysis\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7445301\sections\1_body\3_methodology\1_chemical_composition_of_c\1_p.txt
wrote sentence file C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7445301\sections\1_body\3_methodology\2_preparation_and_encapsula\1_p.txt
error reading:  C:\Users\Mark\CEVOpen\minicorpora\plantpart\PMC7445301\sections\1_body\3_methodology\2_preparation_and_encapsula\2_p.xml 'charmap' codec can't decode byte 0x90 in position 1160: character maps to <undefined>
Traceback (most recent call last):
  File "C:\Users\Mark\openDiagram\physchem\python\search_lib.py", line 1017, in <module>
    main()
  File "C:\Users\Mark\openDiagram\physchem\python\search_lib.py", line 954, in main
    ami_search.run_search()
  File "C:\Users\Mark\openDiagram\physchem\python\search_lib.py", line 357, in run_search
    self.find_files_search_plot(proj, section_type)
  File "C:\Users\Mark\openDiagram\physchem\python\search_lib.py", line 364, in find_files_search_plot
    counter_dict, pattern_dict = self.search_and_count(section_files)
  File "C:\Users\Mark\openDiagram\physchem\python\search_lib.py", line 311, in search_and_count
    matches_by_amidict, matches_by_pattern = self.search(target_file)
  File "C:\Users\Mark\openDiagram\physchem\python\search_lib.py", line 213, in search
    words = TextUtil.get_words_in_section(file)
  File "C:\Users\Mark\openDiagram\physchem\python\text_lib.py", line 447, in get_words_in_section
    section.read_file(file)
  File "C:\Users\Mark\openDiagram\physchem\python\text_lib.py", line 249, in read_file
    raise ex
  File "C:\Users\Mark\openDiagram\physchem\python\text_lib.py", line 246, in read_file
    self.xml = f.read()
  File "C:\Users\Mark\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 1160: character maps to <undefined>
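
The traceback ends in `encodings\cp1252.py`, which suggests that the `open()` call behind `read_file` in `text_lib.py` is falling back to the Windows default encoding (cp1252), where byte 0x90 has no mapping. One possible fix is to pass the encoding explicitly when opening the section files. The sketch below assumes the section XML files are UTF-8 encoded, which this thread does not confirm. `read_section_text` and `demo` are hypothetical names for illustration, not functions from `text_lib.py`.

```python
import tempfile
from pathlib import Path


def read_section_text(path, encoding="utf-8"):
    """Read a file with an explicit encoding instead of the platform default.

    Hypothetical helper; assumes the section files are UTF-8 encoded.
    """
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def demo():
    """Reproduce the failure on a temporary file, then read it successfully."""
    # U+00D0 is encoded in UTF-8 as the bytes C3 90; 0x90 is undefined in cp1252.
    sample = "<p>\u00d0</p>"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "2_p.xml"
        path.write_bytes(sample.encode("utf-8"))
        try:
            # Mimics what happens when Windows picks cp1252 by default.
            path.read_text(encoding="cp1252")
            cp1252_failed = False
        except UnicodeDecodeError:
            cp1252_failed = True
        text = read_section_text(path)
    return cp1252_failed, text


if __name__ == "__main__":
    print(demo())
```

If the files turn out to use mixed encodings, passing `errors="replace"` to `open()` would let the search continue at the cost of losing the undecodable characters. Running Python in UTF-8 mode (`python -X utf8 search_lib.py ...`, available from Python 3.7) is another option that needs no code change.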