Closed 90000988 closed 3 months ago
Hello,
the URL is working, it just needs a PMCID, e.g.
https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?tool=roco-fetch&email=your.email@example.com&id=PMC4608653
It returns an XML response which includes the FTP link to the article, which we then use to download it.
If you are writing your own script to download articles from PMC, please make sure to use your own e-mail address in the link instead of mine.
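As a rough illustration of the lookup described above, here is a minimal Python sketch that builds the request URL and pulls the archive link out of the XML response. The `tool` value `my-fetch-script`, the function names, and `SAMPLE_RESPONSE` are made up for this example. The sample XML is hand-written on the assumption that the response contains `<link format="tgz" href="...">` elements, so check it against a real response before relying on it.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OA_ENDPOINT = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"


def build_oa_url(pmcid, email, tool="my-fetch-script"):
    """Build the OA web service URL for one PMCID (tool name is a placeholder)."""
    query = urllib.parse.urlencode({"tool": tool, "email": email, "id": pmcid})
    return f"{OA_ENDPOINT}?{query}"


def extract_package_link(xml_text):
    """Return the href of the first tgz link in the response, or None."""
    root = ET.fromstring(xml_text)
    for link in root.iter("link"):
        # Assumption: the article archive is advertised with format="tgz".
        if link.get("format") == "tgz":
            return link.get("href")
    return None


def fetch_package_link(pmcid, email):
    """Query the service over the network and return the archive link."""
    with urllib.request.urlopen(build_oa_url(pmcid, email), timeout=30) as resp:
        return extract_package_link(resp.read().decode("utf-8"))


# Hand-written sample (assumed structure), so the parser can be tried offline.
SAMPLE_RESPONSE = """<OA>
  <records returned-count="1" total-count="1">
    <record id="PMC1660580">
      <link format="tgz"
            href="ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/96/5a/PMC1660580.tar.gz"/>
    </record>
  </records>
</OA>"""

if __name__ == "__main__":
    print(extract_package_link(SAMPLE_RESPONSE))
```

Calling `fetch_package_link("PMC4608653", "your.email@example.com")` would perform the real request. If the response contains no tgz link, for example because the `id` parameter is empty, the parser returns `None`.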
For more information, please refer to the documentation:
- Download through AWS https://www.ncbi.nlm.nih.gov/pmc/tools/pmcaws/
- Download through FTP https://www.ncbi.nlm.nih.gov/pmc/tools/ftp/
- OA Web Service API https://www.ncbi.nlm.nih.gov/pmc/tools/oa-service/ to discover resources (e.g., new articles added since date X)
Thank you for your response. Bro, could you do me a favor, please? I am going to do a project related to the medical domain. If you don't mind, please guide me on how to download these medical images, semantic types, CUIs, etc. from PubMed. It is my first project and I didn't have any idea until you sent those URLs; I really appreciate it. Please, bro, help me download these data. Thank you in advance.
I'm not sure what exactly you are trying to do. If you just need the dataset, you can check out ROCOv2 https://zenodo.org/records/8333645.
Bro, I am going to do a project which is different from this. I need these data from PubMed; could you please guide me on how to search for and get this dataset from PubMed? 🙏
The only things you can get directly from PubMed are the articles, images, and image captions.
- first you download the archives from the FTP / AWS
- then you extract the images
- then you extract the captions for the images from the NXML file
- then you need to classify the images to keep only non-compound radiological images https://gitlab.com/saviola/rocov2-code/-/tree/main/roco-2018
- then you need to clean the captions
- then you need to extract CUIs from the captions using something like MedCAT https://github.com/CogStack/MedCAT
- then you need to filter the CUIs, otherwise you get lots of nonsense
Good luck.
Thanks for the tip!
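To make the caption step from the list above more concrete, here is a minimal Python sketch that maps each figure's graphic reference to its caption text. It assumes the NXML file follows the usual JATS layout (`<fig>` elements containing a `<caption>` and a `<graphic xlink:href="...">`) and has no default XML namespace. The function name and the sample snippet, including its caption text, are invented for illustration and are not taken from the ROCO code.

```python
import xml.etree.ElementTree as ET

# Attribute name of xlink:href in ElementTree's namespace notation.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def extract_figure_captions(nxml_text):
    """Return {graphic href: caption text} for every <fig> with both parts."""
    root = ET.fromstring(nxml_text)
    captions = {}
    for fig in root.iter("fig"):
        graphic = fig.find(".//graphic")
        caption = fig.find("caption")
        if graphic is None or caption is None:
            continue
        href = graphic.get(XLINK_HREF)
        # Flatten nested markup (italic, sup, ...) and collapse whitespace.
        text = " ".join("".join(caption.itertext()).split())
        if href:
            captions[href] = text
    return captions


# Invented sample in the assumed JATS layout; real files are much larger.
SAMPLE_NXML = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <fig id="F1">
      <label>Figure 1</label>
      <caption><p>Axial CT image of the <italic>mandible</italic>.</p></caption>
      <graphic xlink:href="1746-160X-2-40-1"/>
    </fig>
  </body>
</article>"""

if __name__ == "__main__":
    for href, text in extract_figure_captions(SAMPLE_NXML).items():
        print(href, "->", text)
```

The graphic reference usually lacks a file extension, so it has to be matched against the image files in the archive by base name. Real NXML files may also use entities or structures that this sketch does not handle.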
Bro, thanks for your reply. Could I ask one more thing? When I tried to download the images from PubMed, the articles were downloaded, but they don't contain any images. Could you guide me, please? Thanks in advance.
I guess you are downloading the wrong archives.
$ wget -r ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/96/5a/PMC1660580.tar.gz
[…]
2024-03-27 11:42:12 (1,20 MB/s) - ‘ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/96/5a/PMC1660580.tar.gz’ saved [701386]
$ tar xvf ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/96/5a/PMC1660580.tar.gz
PMC1660580/
PMC1660580/1746-160X-2-40-6.jpg
PMC1660580/1746-160X-2-40-2.gif
PMC1660580/1746-160X-2-40-3.jpg
PMC1660580/1746-160X-2-40-6.gif
PMC1660580/1746-160X-2-40-8.jpg
PMC1660580/1746-160X-2-40-7.jpg
PMC1660580/1746-160X-2-40-8.gif
PMC1660580/1746-160X-2-40-4.gif
PMC1660580/1746-160X-2-40-4.jpg
PMC1660580/1746-160X-2-40-1.jpg
PMC1660580/1746-160X-2-40-3.gif
PMC1660580/1746-160X-2-40-5.gif
PMC1660580/1746-160X-2-40-5.jpg
PMC1660580/1746-160X-2-40-2.jpg
PMC1660580/1746-160X-2-40.pdf
PMC1660580/1746-160X-2-40-7.gif
PMC1660580/1746-160X-2-40-1.gif
PMC1660580/1746-160X-2-40.nxml
All images and nxml are there.
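The same check can be scripted. Below is a minimal Python sketch that opens a downloaded package and reports which members are images and which are NXML files, without extracting anything. The function name and the set of image suffixes are assumptions for this example, and it only relies on the standard-library `tarfile` module.

```python
import tarfile
from pathlib import PurePosixPath

# Assumed set of image suffixes; extend it if the archives contain others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff"}


def summarize_archive(fileobj):
    """Return (image member names, nxml member names) for an open tar archive."""
    images, nxml_files = [], []
    # "r:*" lets tarfile detect the compression (gzip for .tar.gz) itself.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories such as "PMC1660580/"
            suffix = PurePosixPath(member.name).suffix.lower()
            if suffix in IMAGE_SUFFIXES:
                images.append(member.name)
            elif suffix == ".nxml":
                nxml_files.append(member.name)
    return sorted(images), sorted(nxml_files)


if __name__ == "__main__":
    # Path as produced by the wget call shown in this thread.
    path = "ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/96/5a/PMC1660580.tar.gz"
    with open(path, "rb") as fh:
        images, nxml_files = summarize_archive(fh)
    print(len(images), "images;", len(nxml_files), "nxml file(s)")
```

If the image list comes back empty, the downloaded file is probably not the `oa_package` archive for that article.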
Could you please let me know how you downloaded the data from PubMed? And why is this URL not working: https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?tool=roco-fetch&email=johannes.rueckert@fh-dortmund.de&id= ? Thank you.