titipata / pubmed_parser

A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
http://titipata.github.io/pubmed_parser/
MIT License

Question about downloading PubMed OA figures #74

Closed: pidugusundeep closed this issue 4 years ago.

pidugusundeep commented 4 years ago

After running "Parse PubMed OA images and captions", I would like to know where I can download the actual figure files referenced by the fig_id or graphic_ref values in the nxml document.
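For context, here is where those two values come from in the NXML (JATS) markup. The sketch below is a minimal illustration using only the standard library, not pubmed_parser's own implementation. It assumes each figure is a `<fig id="...">` element containing a `<graphic xlink:href="...">` child and that the document has no default XML namespace. The function name `figure_graphic_refs` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Fully qualified name of the xlink:href attribute used on <graphic> elements.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def figure_graphic_refs(nxml_text):
    """Return (fig_id, graphic_ref) pairs for every <fig> in an NXML document.

    Assumption: graphic_ref is the xlink:href of the first <graphic> inside
    the <fig>, and it names the image file (without extension) in the
    article's OA package.
    """
    root = ET.fromstring(nxml_text)  # accepts str or bytes
    pairs = []
    for fig in root.iter("fig"):
        graphic = fig.find(".//graphic")
        href = graphic.get(XLINK_HREF) if graphic is not None else None
        pairs.append((fig.get("id"), href))
    return pairs
```

For a real file you could pass the raw bytes, e.g. `figure_graphic_refs(Path("article.nxml").read_bytes())`, where the file name is only a placeholder.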

titipata commented 4 years ago

@pidugusundeep I'm tagging @daniel-acuna here. Do you know where we can download the dataset?

daniel-acuna commented 4 years ago

Search for the PubMed Open Access FTP.

pidugusundeep commented 4 years ago

I was able to download all of the 'txt' files, but I need the source images for all of them. Could you please provide a sample? @daniel-acuna

daniel-acuna commented 4 years ago

For example, if you uncompress ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/00/00/PMC3363814.tar.gz, you will find the figures associated with the paper. The paths to all packages are listed in ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_file_list.csv
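A minimal sketch of that workflow in Python, under several assumptions: `oa_file_list.csv` has a header row with `File` and `Accession ID` columns, the `File` value is a path relative to `ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/` (as in the example above), and each image in a package is named after its graphic_ref plus an image extension. NCBI may have changed this layout since this thread, so check the current FTP documentation. The names `package_urls` and `extract_figures` are hypothetical and not part of pubmed_parser.

```python
import csv
import tarfile
from pathlib import Path, PurePosixPath

# Base URL taken from the links in this thread (assumed still valid).
PMC_FTP_BASE = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/"

# Image extensions expected in OA packages (an assumption, not exhaustive).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".gif", ".png", ".tif", ".tiff"}


def package_urls(file_list):
    """Map PMC accession IDs (e.g. 'PMC3363814') to package URLs.

    file_list is an open text stream over oa_file_list.csv, assumed to have
    'File' and 'Accession ID' columns.
    """
    urls = {}
    for row in csv.DictReader(file_list):
        urls[row["Accession ID"]] = PMC_FTP_BASE + row["File"]
    return urls


def extract_figures(tar_path, graphic_refs, dest):
    """Copy images whose file-name stem matches a graphic_ref into dest.

    Files are written flat into dest (the archive's folder structure is
    dropped), which also avoids writing outside dest. Returns the sorted
    list of extracted file names.
    """
    wanted = set(graphic_refs)
    extracted = []
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            name = PurePosixPath(member.name)
            if (member.isfile() and name.stem in wanted
                    and name.suffix.lower() in IMAGE_SUFFIXES):
                data = tar.extractfile(member).read()
                (Path(dest) / name.name).write_bytes(data)
                extracted.append(name.name)
    return sorted(extracted)
```

One way to fetch a package is `urllib.request.urlretrieve(url, "PMC3363814.tar.gz")`, which supports `ftp://` URLs. Note that a package may contain several versions of the same figure (for example a `.gif` and a `.jpg`), and the sketch extracts all of them.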

titipata commented 4 years ago

Nice, thank you so much, Daniel. I'll put the documentation in the repo before closing this issue.

titipata commented 4 years ago

I've put instructions on how to download figures here. For anyone who finds this issue, see the Wiki page for more details.