(Open) ShweataNHegde opened this issue 4 years ago
Hello, I tried running text.ipynb (https://github.com/petermr/ami3/blob/master/src/ipynb/text.ipynb) on Windows 10 Home, but the file paths that get printed contain double backslashes instead of single ones. Details below.
text.ipynb
When I run the following cell, the printed file paths contain double backslashes:
```python
project = 'C:/Users/shweata/ami3/src/test/resources/org/contentmine/ami/zika10'
# os.chdir(project)
file_glob = 'PMC*'
files = get_globbed_files(project, file_glob)
print("number of " + file_glob + " files: " + str(len(files)) + "\n " + str(files))
print("file type " + str(type(files)))
abstract_files = get_globbed_files(project, 'PMC*/sections/abstract/*.xml')
print("abstracts " + str(abstract_files))
text_files = get_globbed_files(project, 'PMC*/sections/**/*.xml', recursive=False)
print("number of xml text files: " + str(len(text_files)) +"\n" + str(text_files))
figure_files = get_globbed_files(project, 'PMC*/sections/**/*figure*.xml', recursive=False)
# print("number of figure files: " + str(len(figure_files)) +"\n" + str(figure_files))
```
The output:
```
abstracts ['PMC3113902\\sections\\abstract\\elem_0.xml', 'PMC320490\\sections\\abstract\\background__4_0.xml', 'PMC3289602\\sections\\abstract\\author_summary_1.xml', 'PMC3289602\\sections\\abstract\\background__3_0.xml', 'PMC3310194\\sections\\abstract\\elem_0.xml', 'PMC3310457\\sections\\abstract\\elem_0.xml', 'PMC3310457\\sections\\abstract\\elem_1.xml', 'PMC3310660\\sections\\abstract\\elem_0.xml', 'PMC3321795\\sections\\abstract\\elem_0.xml', 'PMC3321797\\sections\\abstract\\elem_0.xml']
number of xml text files: 141
['PMC3113902\\sections\\2_back\\0_ack.xml', 'PMC3113902\\sections\\abstract\\elem_0.xml', 'PMC3113902\\sections\\article\\elem_0.xml', 'PMC3113902\\sections\\figures\\figure_1.xml', 'PMC3113902\\sections\\figures\\figure_2.xml', 'PMC320490\\sections\\2_back\\0_ack.xml', 'PMC320490\\sections\\3_floats-group\\0_figure_1.xml', 'PMC320490\\sections\\3_floats-group\\1_figure_2.xml', 'PMC320490\\sections\\3_floats-group\\2_table_1.xml', 'PMC320490\\sections\\3_floats-group\\3_table_2.xml', 'PMC320490\\sections\\3_floats-group\\4_figure_3.xml', 'PMC320490\\sections\\3_floats-group\\5_figure_4.xml', 'PMC320490\\sections\\3_floats-group\\6_figure_5.xml', 'PMC320490\\sections\\abstract\\background__4_0.xml', 'PMC320490\\sections\\article\\elem_0.xml', 'PMC320490\\sections\\figures\\figure_1.xml', 'PMC320490\\sections\\figures\\figure_2.xml', 'PMC320490\\sections\\figures\\figure_3.xml', 'PMC320490\\sections\\figures\\figure_4.xml', 'PMC320490\\sections\\figures\\figure_5.xml', 'PMC320490\\sections\\tables\\table_1.xml', 'PMC320490\\sections\\tables\\table_2.xml', 'PMC3289602\\sections\\0_introduction\\0_title.xml', 'PMC3289602\\sections\\0_introduction\\1_p.xml', 'PMC3289602\\sections\\0_introduction\\2_p.xml', 'PMC3289602\\sections\\0_introduction\\3_p.xml', 'PMC3289602\\sections\\0_introduction\\4_p.xml', 'PMC3289602\\sections\\0_introduction\\5_p.xml', 'PMC3289602\\sections\\1_methods\\0_title.xml', 'PMC3289602\\sections\\2_back\\0_fn-group.xml',
'PMC3289602\\sections\\2_results\\0_title.xml', 'PMC3289602\\sections\\3_discussion\\0_title.xml', 'PMC3289602\\sections\\3_floats-group\\0_table_1.xml', 'PMC3289602\\sections\\3_floats-group\\1_table_2.xml', 'PMC3289602\\sections\\3_floats-group\\2_figure_1.xml', 'PMC3289602\\sections\\3_floats-group\\3_table_3.xml', 'PMC3289602\\sections\\3_floats-group\\4_figure_2.xml', 'PMC3289602\\sections\\4_floats-group\\0_table_1.xml', 'PMC3289602\\sections\\4_floats-group\\1_table_2.xml', 'PMC3289602\\sections\\4_floats-group\\2_figure_1.xml', 'PMC3289602\\sections\\4_floats-group\\3_table_3.xml', 'PMC3289602\\sections\\4_floats-group\\4_figure_2.xml', 'PMC3289602\\sections\\abstract\\author_summary_1.xml', 'PMC3289602\\sections\\abstract\\background__3_0.xml', 'PMC3289602\\sections\\acknowledge\\elem_0.xml', 'PMC3289602\\sections\\article\\elem_0.xml', 'PMC3289602\\sections\\figures\\figure_1.xml', 'PMC3289602\\sections\\figures\\figure_2.xml', 'PMC3289602\\sections\\methods\\methods__4_0.xml', 'PMC3289602\\sections\\tables\\table_1.xml', 'PMC3289602\\sections\\tables\\table_2.xml', 'PMC3289602\\sections\\tables\\table_3.xml', 'PMC3310194\\sections\\2_back\\0_ack.xml', 'PMC3310194\\sections\\2_back\\2_app-group.xml', 'PMC3310194\\sections\\3_floats-group\\0_table-wrap.xml', 'PMC3310194\\sections\\3_floats-group\\10_figure_10_.xml', 'PMC3310194\\sections\\3_floats-group\\11_figure_11_.xml', 'PMC3310194\\sections\\3_floats-group\\12_figure_12_.xml', 'PMC3310194\\sections\\3_floats-group\\13_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\14_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\15_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\16_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\17_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\18_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\19_supplementary-material.xml',
'PMC3310194\\sections\\3_floats-group\\1_figure_1_.xml', 'PMC3310194\\sections\\3_floats-group\\20_supplementary-material.xml', 'PMC3310194\\sections\\3_floats-group\\21_appendix_figure_1_.xml', 'PMC3310194\\sections\\3_floats-group\\22_appendix_figure_2_.xml', 'PMC3310194\\sections\\3_floats-group\\23_appendix_figure_3_.xml', 'PMC3310194\\sections\\3_floats-group\\2_figure_2_.xml', 'PMC3310194\\sections\\3_floats-group\\3_figure_3_.xml', 'PMC3310194\\sections\\3_floats-group\\4_figure_4_.xml', 'PMC3310194\\sections\\3_floats-group\\5_figure_5_.xml', 'PMC3310194\\sections\\3_floats-group\\6_figure_6_.xml', 'PMC3310194\\sections\\3_floats-group\\7_figure_7_.xml', 'PMC3310194\\sections\\3_floats-group\\8_figure_8_.xml', 'PMC3310194\\sections\\3_floats-group\\9_figure_9_.xml',
```
(Truncated)
When I try running the subsequent cell,
```python
text_contents = []
for text_file in text_files:
    text_filex = open(text_file,mode='r')
    text = text_filex.read()
    text_filex.close()
    text_contents.append(text)
len(text_contents)
# text_contents
```
I get the following error.
```
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-4-43f8fb121a67> in <module>
      1 text_contents = []
      2 for text_file in text_files:
----> 3     text_filex = open(text_file,mode='r')
      4     text = text_filex.read()
      5     text_filex.close()

FileNotFoundError: [Errno 2] No such file or directory: 'PMC3113902\\sections\\2_back\\0_ack.xml'
```
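The path in this error message, 'PMC3113902\\sections\\2_back\\0_ack.xml', is relative: it does not start with the project directory, so open() resolves it against the notebook's current working directory rather than against project. One plausible cause is that os.chdir(project) is commented out in the first cell. A minimal sketch of a workaround, assuming get_globbed_files returns paths relative to project (its definition is not shown here), is to join each path onto project before opening it. The helper name read_relative_files, the utf-8 encoding, and the demo file are assumptions for illustration only.

```python
import os
import tempfile


def read_relative_files(project, relative_paths):
    """Read each file, resolving its relative path against `project`
    instead of the current working directory (hypothetical helper)."""
    contents = []
    for rel_path in relative_paths:
        full_path = os.path.join(project, rel_path)
        # utf-8 is an assumption about the XML files; without it, Windows
        # falls back to the locale encoding (often cp1252).
        with open(full_path, mode='r', encoding='utf-8') as fh:
            contents.append(fh.read())
    return contents


# Demo with a throwaway directory standing in for the zika10 project.
with tempfile.TemporaryDirectory() as demo_project:
    rel = os.path.join('PMC0000000', 'sections', 'abstract', 'elem_0.xml')
    os.makedirs(os.path.join(demo_project, os.path.dirname(rel)))
    with open(os.path.join(demo_project, rel), mode='w', encoding='utf-8') as fh:
        fh.write('<abstract/>')
    demo_texts = read_relative_files(demo_project, [rel])

print(len(demo_texts))  # 1
```

In the notebook this would be used as text_contents = read_relative_files(project, text_files). Joining explicitly keeps the reading cell independent of whichever directory Jupyter was started from.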
I looked online for help and found this article (https://lerner.co.il/2018/07/24/avoiding-windows-backslash-problems-with-pythons-raw-strings/ ). I tried the solutions it suggests, but they didn't help. I have very little programming experience, so any help would be appreciated.
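The doubled backslashes are most likely a display effect rather than part of the paths: printing a list (or calling str() on it) shows each element's repr(), which escapes every backslash, and the FileNotFoundError message displays the filename the same way. Raw strings, as described in the linked article, only change how a literal is typed in source code, so they cannot alter strings returned by a globbing function. A small demonstration, using one path from the output above:

```python
# One Windows-style path; each '\\' in this literal is a single backslash.
path = 'PMC3113902\\sections\\2_back\\0_ack.xml'

# The raw-string spelling of the same path produces an identical string.
raw_path = r'PMC3113902\sections\2_back\0_ack.xml'
print(path == raw_path)    # True

print(path)                # PMC3113902\sections\2_back\0_ack.xml
print([path])              # ['PMC3113902\\sections\\2_back\\0_ack.xml']
print(path.count('\\'))    # 3
```

So the strings in text_files almost certainly contain single backslashes; the real question is which directory they are relative to.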
I will ask on Shuttleworth Slack.
I think I should be using Path... Just a guess at present.
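Following the Path suggestion, here is a minimal sketch assuming the standard library's pathlib.Path is what is meant. Path(project).glob(...) yields paths that already include the project directory, so they can be opened from any working directory, and pathlib uses the correct separator on each OS. Note that in Path.glob, ** always matches subdirectories recursively, unlike glob.glob, where it only does so with recursive=True. The function name read_xml_texts and the demo tree are hypothetical, not part of ami3.

```python
import tempfile
from pathlib import Path


def read_xml_texts(project, pattern='PMC*/sections/**/*.xml'):
    """Map each matching file's path (relative to project, with forward
    slashes) to its text. Hypothetical helper, not part of ami3."""
    root = Path(project)
    texts = {}
    for xml_path in sorted(root.glob(pattern)):
        # xml_path already contains the project directory, so reading it
        # does not depend on the current working directory.
        key = xml_path.relative_to(root).as_posix()
        texts[key] = xml_path.read_text(encoding='utf-8')  # utf-8 assumed
    return texts


# Demo on a throwaway tree shaped like the zika10 sections.
with tempfile.TemporaryDirectory() as demo_project:
    root = Path(demo_project)
    for rel in ['PMC0000000/sections/abstract/elem_0.xml',
                'PMC0000000/sections/2_back/0_ack.xml']:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text('<x/>', encoding='utf-8')
    demo = read_xml_texts(demo_project)

for name in demo:
    print(name)
```

In the notebook this might be called as text_contents = list(read_xml_texts(project).values()). Using as_posix() for the keys keeps the printed names free of backslashes on Windows, which also avoids the confusing repr display.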