Closed by EmanuelFaria 4 years ago
Update: It doesn't work on every article. Example: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5788217/table/Tab2/ doesn't work.
Good idea, but as you note, it's not universal. It depends on the publisher: some tables are embedded in the article, some are separate, and some are both.
Are you available in 1 hour, at 1615 UTC/GMT? I'd like to discuss extracting activity tables.
P.
On Wed, Dec 11, 2019 at 2:55 PM Emanuel Faria notifications@github.com wrote:
Update: It doesn't work on every article. Example: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5788217/table/Tab2/ doesn't work.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/petermr/CEVOpen/issues/68?email_source=notifications&email_token=AAFTCS3X62G43J5TQ2H37H3QYD5O5A5CNFSM4JZQH4B2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGTNO2I#issuecomment-564582249, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFTCSY4EHX7K26HMMPYZR3QYD5O5ANCNFSM4JZQH4BQ .
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
I'm available. Call when ready
For instance, the example I used above that DIDN'T follow the table-only URL structure I thought I'd discovered (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5788217/table/Tab2/) DOES work, in the following manner:
This article (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5788217/), like all the articles I have tested so far, has hyperlinks that I expected to be page anchors (i.e. "(Table 1)", "(Table 2)", etc.). It turns out that clicking one of those links opens a new window displaying just the table.
Antibacterial properties of ZEO and SNP The inhibitory effects of ZEO and SNP alone and in combination against Staph. aureus and Salm. Typhimurium were investigated using microtiter plate assay. For Staph. aureus, the MIC values of ZEO and SNP were 1250 and 25 μg/mL and for Salm. Typhimurium the values were 2500 and 25 μg/mL, respectively. In all cases, MBC values were similar to MICs. The ZEO was found to be more effective on gram-positive than gram-negative bacteria whereas SNP displayed similar antibacterial activity on both bacteria. The MICs for SNP - ZEO combination were 0.78 and 12.5 μg/ mL against Staph. aureus and Salm. Typhimurium, respectively. ZEO-SNP combination inhibited S. aureus and Salm. Typhimurium at 625 μg/ mL. Based on the FICI scale (Table 2), the combination displayed a synergistic action on Staph. aureus (FICI=0.81) and Salm. Typhimurium (FICI= 0.75).
On further inspection, it seems the URLs leading to these table-only pages are built from the original article URL followed by /table/table[table#]-[article doi]/, for example:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5871294/table/table2-2156587217717414/
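The pattern described above can be checked mechanically. This is a minimal sketch, assuming the table slug always has the `table<N>-<suffix>` form seen in the one example URL; the name `parse_table_url` and the regular expression are illustrative and not part of any PMC API.

```python
import re

# Pattern inferred from the single example URL above. Other publishers
# may use different slugs (e.g. "Tab2"), so this is an assumption.
TABLE_URL_RE = re.compile(
    r"^https://www\.ncbi\.nlm\.nih\.gov/pmc/articles/"
    r"(?P<pmcid>PMC\d+)/table/table(?P<number>\d+)-(?P<suffix>[^/]+)/$"
)

def parse_table_url(url):
    """Return (pmcid, table_number, suffix), or None if the URL does not match."""
    match = TABLE_URL_RE.match(url)
    if match is None:
        return None
    return match.group("pmcid"), int(match.group("number")), match.group("suffix")
```

A URL that uses the other slug style, such as `.../PMC5788217/table/Tab2/`, returns `None` here, which is one way to flag articles that need different handling.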
OK.... I found one more thing that gets us to an even cleaner table....
Here are the steps I followed to what seems to be the solution:
1. Clicking the text above Table 3 in this link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5871294/ got me to this simpler table page: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5871294/table/table3-2156587217717414/
2. When I noticed the "Open in a separate window" text at the bottom, I clicked it and ended up with nothing on the page but a nice clean table, here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5871294/table/table3-2156587217717414/?report=objectonly
So if I'm right, the URL to any cleanly extractable table should be:
As a test, I changed "table3" to "table4" in this URL and got exactly what I expected, even though there was no "Open in a separate window" text on the "(Table 4)" hyperlinked page in the article:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5871294/table/table3-2156587217717414/?report=objectonly
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5871294/table/table4-2156587217717414/?report=objectonly
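The table3-to-table4 substitution above could be automated along these lines. This is only a sketch: the helper names `object_only_url` and `candidate_table_urls` are hypothetical, the number of tables per article has to be supplied by the caller, and each generated URL would still need to be fetched to confirm that a table actually exists there.

```python
def object_only_url(article_url, table_slug):
    """Build the table-only URL with the ?report=objectonly query string."""
    base = article_url.rstrip("/")
    return f"{base}/table/{table_slug}/?report=objectonly"

def candidate_table_urls(article_url, suffix, max_tables):
    """Candidate URLs for tables 1..max_tables (unverified until fetched)."""
    return [
        object_only_url(article_url, f"table{n}-{suffix}")
        for n in range(1, max_tables + 1)
    ]

# Example using the article and suffix from the URLs above.
urls = candidate_table_urls(
    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5871294/",
    "2156587217717414",
    4,
)
```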
The tables were downloaded in XML but for some reason were not extracted. File this as an issue. We can only work with automatic downloads - not manual.
P.
On Fri, Dec 13, 2019 at 5:54 PM Emanuel Faria notifications@github.com wrote:
OK.... I found one more thing that gets us to an even cleaner table....
Here are the steps I followed to what seems to be the solution:
1. Clicking the text above Table 3 in this link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5871294/ got me to this simpler table page: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5871294/table/table3-2156587217717414/
2. When I noticed the "Open in a separate window" text at the bottom, I clicked it and ended up with nothing on the page but a nice clean table, here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5871294/table/table3-2156587217717414/?report=objectonly
So if I'm right, the URL to any cleanly extractable table should be:
I don't know if there are XML versions of the pages with just the tables. If so, we could scrape our regularly downloaded XMLs for those table links, convert them to new URLs, and then download and scrape the simplified pages ...
If you could post a few sample XML page URLs, I could poke around a bit with it. (Assuming the end result would somehow make things easier/cleaner to work with).
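One way the "scrape the XML for table links, then convert them to URLs" idea might look is sketched below. It rests on two unverified assumptions: that the downloaded XML is JATS-style, with each table held in a `table-wrap` element carrying an `id` attribute, and that this `id` is the same slug that appears in the table-only URL (which is consistent with the `Tab2` and `table3-2156587217717414` examples in this thread, but not confirmed). The function name `table_urls_from_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

PMC_ARTICLE_BASE = "https://www.ncbi.nlm.nih.gov/pmc/articles"

def table_urls_from_xml(xml_text, pmcid):
    """Collect table-wrap ids from the article XML and turn them into table-only URLs."""
    root = ET.fromstring(xml_text)
    urls = []
    for wrap in root.iter("table-wrap"):
        table_id = wrap.get("id")
        if table_id:  # skip table-wrap elements without an id
            urls.append(
                f"{PMC_ARTICLE_BASE}/{pmcid}/table/{table_id}/?report=objectonly"
            )
    return urls
```

Because the slugs are read from the XML rather than guessed, this would sidestep the publisher-specific naming differences, provided the id-to-URL assumption holds.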
I'm not certain, but it may be useful to extract tables from each article's unique table-only URL by substituting the article ID and table number into the NCBI URL format (as below).
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5497343/table/Tab3/
https://www.ncbi.nlm.nih.gov/pmc/articles/[ARTICLE_ID]/table/Tab[TABLE_X:X+1]/
What do you think, @petermr? Does this help in any way?
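The substitution proposed above can be written as a small helper. This is a sketch only: `tab_style_urls` is a hypothetical name, and, as noted elsewhere in this thread, the `Tab<N>` slug does not hold for every article, so the results are candidates to test rather than guaranteed links.

```python
def tab_style_urls(pmcid, first_table, last_table):
    """Build Tab<N>-style table URLs for tables first_table..last_table inclusive."""
    base = "https://www.ncbi.nlm.nih.gov/pmc/articles"
    return [
        f"{base}/{pmcid}/table/Tab{n}/"
        for n in range(first_table, last_table + 1)
    ]
```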