lagotto / alm-report

ALM Reports
http://almreports.plos.org/
MIT License

Faceted search #48

Closed by jure 9 years ago

jure commented 10 years ago

By switching to the CrossRef API, which is a bit more restrictive about which fields you can search through directly, we have a good opportunity to introduce faceted search.

These are the facets returned for a simple "biology" search (http://api.crossref.org/works?query=biology&facet=t), for example:

{
    "facets": {
        "license": {
            "value-count": 10,
            "values": {
                "http://www.elsevier.com/tdm/userlicense/1.0/": 16560,
                "http://creativecommons.org/licenses/by/3.0/": 999,
                "http://pubs.acs.org/page/policy/authorchoice_termsofuse.html": 148,
                "http://pubs.acs.org/userimages/ContentEditor/1388526979973/authorchoice_form.pdf": 29,
                "http://www.elsevier.com/open-access/userlicense/1.0/": 12,
                "http://pubs.acs.org/page/policy/authorchoice_ccby_termsofuse.html": 10,
                "http://creativecommons.org/licenses/by/4.0/": 6,
                "http://creativecommons.org/licenses/by-nc/3.0": 2,
                "http://www.acs.org/content/acs/en/copyright.html": 1,
                "http://pubs.acs.org/page/policy/authorchoice_ccbyncnd_termsofuse.html": 1
            }
        },
        "archive": {
            "value-count": 1,
            "values": {
                "DWT": 38
            }
        },
        "funder-doi": {
            "value-count": 20,
            "values": {
                "http://dx.doi.org/10.13039/100000002": 343,
                "http://dx.doi.org/10.13039/100000001": 230,
                "http://dx.doi.org/10.13039/501100001809": 115,
                "http://dx.doi.org/10.13039/501100001659": 58,
                "http://dx.doi.org/10.13039/100000054": 55,
                "http://dx.doi.org/10.13039/501100001711": 52,
                "http://dx.doi.org/10.13039/501100000038": 51,
                "http://dx.doi.org/10.13039/100000057": 50,
                "http://dx.doi.org/10.13039/501100000781": 41,
                "http://dx.doi.org/10.13039/100000015": 41,
                "http://dx.doi.org/10.13039/501100000923": 30,
                "http://dx.doi.org/10.13039/100004410": 29,
                "http://dx.doi.org/10.13039/501100000270": 28,
                "http://dx.doi.org/10.13039/100004440": 28,
                "http://dx.doi.org/10.13039/501100001665": 27,
                "http://dx.doi.org/10.13039/501100001691": 26,
                "http://dx.doi.org/10.13039/501100000024": 26,
                "http://dx.doi.org/10.13039/100000060": 26,
                "http://dx.doi.org/10.13039/501100000780": 25,
                "http://dx.doi.org/10.13039/100000011": 20
            }
        },
        "issn": {
            "value-count": 20,
            "values": {
                "http://id.crossref.org/issn/0360-3016": 50457,
                "http://id.crossref.org/issn/1535-3702": 45042,
                "http://id.crossref.org/issn/1535-3699": 45042,
                "http://id.crossref.org/issn/0022-2836": 32735,
                "http://id.crossref.org/issn/1539-7718": 32714,
                "http://id.crossref.org/issn/0033-5770": 32714,
                "http://id.crossref.org/issn/0065-2598": 29023,
                "http://id.crossref.org/issn/0021-9525": 24648,
                "http://id.crossref.org/issn/0007-4888": 23329,
                "http://id.crossref.org/issn/1540-8140": 21939,
                "http://id.crossref.org/issn/1573-8221": 21897,
                "http://id.crossref.org/issn/0012-1606": 20103,
                "http://id.crossref.org/issn/1064-3745": 19611,
                "http://id.crossref.org/issn/1940-6029": 19108,
                "http://id.crossref.org/issn/0002-7685": 16267,
                "http://id.crossref.org/issn/1938-4211": 16209,
                "http://id.crossref.org/issn/0960-9822": 14795,
                "http://id.crossref.org/issn/0891-5849": 14511,
                "http://id.crossref.org/issn/0022-5193": 14298,
                "http://id.crossref.org/issn/0006-3363": 13131
            }
        },
        "funder-name": {
            "value-count": 20,
            "values": {
                "National Institutes of Health": 599,
                "National Science Foundation": 316,
                "National Natural Science Foundation of China": 282,
                "NIH": 168,
                "Deutsche Forschungsgemeinschaft": 106,
                "National Cancer Institute": 73,
                "Swiss National Science Foundation": 71,
                "Natural Sciences and Engineering Research Council of Canada": 70,
                "National Institute of General Medical Sciences": 65,
                "Wellcome Trust": 58,
                "European Research Council": 58,
                "Canadian Institutes of Health Research": 48,
                "U.S. Department of Energy": 47,
                "American Heart Association": 45,
                "Australian Research Council": 44,
                "Japan Society for the Promotion of Science": 41,
                "Agence Nationale de la Recherche": 39,
                "Biotechnology and Biological Sciences Research Council": 36,
                "European Commission": 35,
                "Medical Research Council": 34
            }
        },
        "container-title": {
            "value-count": 20,
            "values": {
                "International Journal of Radiation Oncology*Biology*Physics": 50456,
                "Experimental Biology and Medicine": 45042,
                "Journal of Molecular Biology": 32735,
                "The Quarterly Review of Biology": 32714,
                "Advances in Experimental Medicine and Biology": 30735,
                "Q REV BIOL": 29207,
                "The Journal of Cell Biology": 24645,
                "Bulletin of Experimental Biology and Medicine": 23301,
                "Bull Exp Biol Med": 21896,
                "Developmental Biology": 20172,
                "Methods in Molecular Biology": 17948,
                "The American Biology Teacher": 16267,
                "Current Biology": 14795,
                "Free Radical Biology and Medicine": 14511,
                "Journal of Theoretical Biology": 14298,
                "Biology of Reproduction": 13131,
                "Journal of Fish Biology": 12855,
                "Physics in Medicine and Biology": 12008,
                "Phys. Med. Biol.": 12008,
                "Ultrasound in Medicine & Biology": 11619
            }
        },
        "published": {
            "value-count": 20,
            "values": {
                "1995": 20372,
                "1996": 20690,
                "1997": 23113,
                "1998": 23406,
                "1999": 24049,
                "2000": 25894,
                "2001": 28321,
                "2002": 31024,
                "2003": 29989,
                "2004": 35256,
                "2005": 36228,
                "2006": 40894,
                "2007": 41361,
                "2008": 41729,
                "2009": 47723,
                "2010": 47539,
                "2011": 49566,
                "2012": 49153,
                "2013": 52393,
                "2014": 40608
            }
        },
        "category-name": {
            "value-count": 20,
            "values": {
                "Molecular Biology": 175601,
                "Cell Biology": 173844,
                "Biochemistry, Genetics and Molecular Biology(all)": 119655,
                "Agricultural and Biological Sciences(all)": 108618,
                "Radiology Nuclear Medicine and imaging": 93977,
                "Ecology, Evolution, Behavior and Systematics": 93608,
                "Biochemistry": 78145,
                "Cancer Research": 64362,
                "Genetics": 60815,
                "Aquatic Science": 60426,
                "Oncology": 56598,
                "Radiation": 55175,
                "Radiological and Ultrasound Technology": 38642,
                "Plant Science": 37754,
                "Developmental Biology": 36953,
                "Medicine(all)": 34782,
                "Biophysics": 34523,
                "Ecology": 27747,
                "Physiology": 27668,
                "Agronomy and Crop Science": 27098
            }
        },
        "source": {
            "value-count": 1,
            "values": {
                "CrossRef": 1084353
            }
        },
        "orcid": {
            "value-count": 20,
            "values": {
                "http://orcid.org/0000-0003-4562-2738": 4,
                "http://orcid.org/0000-0003-4191-5306": 4,
                "http://orcid.org/0000-0003-0115-4725": 4,
                "http://orcid.org/0000-0003-2146-6726": 3,
                "http://orcid.org/0000-0003-1907-2744": 3,
                "http://orcid.org/0000-0003-0659-5183": 3,
                "http://orcid.org/0000-0003-0362-783X": 3,
                "http://orcid.org/0000-0002-9863-8461": 3,
                "http://orcid.org/0000-0002-9461-610X": 3,
                "http://orcid.org/0000-0002-8528-1627": 3,
                "http://orcid.org/0000-0002-7423-2934": 3,
                "http://orcid.org/0000-0002-6566-6239": 3,
                "http://orcid.org/0000-0002-1024-3220": 3,
                "http://orcid.org/0000-0001-8636-1533": 3,
                "http://orcid.org/0000-0001-8540-7907": 3,
                "http://orcid.org/0000-0001-7430-294X": 3,
                "http://orcid.org/0000-0001-7327-4481": 3,
                "http://orcid.org/0000-0001-6243-526X": 3,
                "http://orcid.org/0000-0001-5398-5569": 2,
                "http://orcid.org/0000-0001-5372-510X": 2
            }
        },
        "publisher-name": {
            "value-count": 20,
            "values": {
                "Elsevier BV": 320772,
                "Springer Science + Business Media": 187889,
                "Wiley-Blackwell": 88600,
                "SAGE Publications": 46001,
                "IEEE": 38280,
                "University of Chicago Press": 33340,
                "Nature Publishing Group": 27754,
                "Rockefeller University Press": 24679,
                "Informa UK Limited": 24332,
                "JSTOR": 19435,
                "Oxford University Press (OUP)": 17115,
                "Society for the Study of Reproduction": 13131,
                "IOP Publishing": 12772,
                "American Society for Microbiology": 11194,
                "Ovid Technologies (Wolters Kluwer Health)": 11069,
                "Elsevier": 10063,
                "American Society for Cell Biology (ASCB)": 8682,
                "Public Library of Science (PLoS)": 7876,
                "The Company of Biologists": 7857,
                "Informa Healthcare": 7845
            }
        },
        "type-name": {
            "value-count": 19,
            "values": {
                "Journal Article": 894981,
                "Chapter": 131397,
                "Conference Paper": 42264,
                "Entry": 6081,
                "Book": 4175,
                "Journal Issue": 3250,
                "Monograph": 632,
                "Other": 582,
                "Dataset": 414,
                "Report": 212,
                "Journal": 203,
                "Dissertation": 56,
                "Proceedings": 37,
                "Component": 32,
                "Book Series": 17,
                "Reference": 13,
                "Proceedings Series": 3,
                "Standard": 2,
                "Journal Volume": 2
            }
        }
    }
}
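To make the response above easier to compare facet by facet, here is a minimal Python sketch that ranks the values of one facet by count. It assumes the response has the shape shown above (a top-level "facets" object); the function names `facet_query_url` and `top_facet_values` are hypothetical helpers, not part of any CrossRef client, and the sample data is a trimmed excerpt of the "published" facet.

```python
from urllib.parse import urlencode

# Endpoint taken from the example query above.
CROSSREF_WORKS_URL = "http://api.crossref.org/works"


def facet_query_url(query, facet="t"):
    """Build a works query URL that also asks for facet counts."""
    return CROSSREF_WORKS_URL + "?" + urlencode({"query": query, "facet": facet})


def top_facet_values(response, facet_name, limit=5):
    """Return the `limit` largest (value, count) pairs for one facet.

    Assumes `response` is shaped like the JSON above:
    {"facets": {<facet name>: {"values": {<value>: <count>, ...}}}}
    """
    values = response["facets"][facet_name]["values"]
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Trimmed excerpt of the "published" facet from the response above.
sample = {
    "facets": {
        "published": {
            "values": {"2011": 49566, "2012": 49153, "2013": 52393, "2014": 40608}
        }
    }
}

if __name__ == "__main__":
    print(facet_query_url("biology"))
    print(top_facet_values(sample, "published", limit=2))
```

Sorting on the client side matters mostly for facets such as "published", which come back ordered by year rather than by count.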

To get this discussion started: which of these are the most important? @jenniferlin15 your feedback would be great here.

jenniferlin15 commented 9 years ago

The only ones from this list that would be useful for faceted search are:

Category Name might be useful, though since this is not a discovery tool, I don't think it's a relevant use case.

jenniferlin15 commented 9 years ago

I assume we can do faceted search with the ALM API. Is that correct, and which fields are available?

mfenner commented 9 years ago

We can't do faceted search with the ALM API. You normally need Solr (or Elasticsearch) for this feature.

jenniferlin15 commented 9 years ago

Apologies, I meant the PLOS Search API: all the article metadata used by the ALM Reports search functionality comes from the PLOS Search API (not the PLOS ALM API). All of that article metadata is indexed in our Solr instance.

Please confirm or deny that we can do faceted search for PLOS ALM Reports. I'm most interested in providing this feature for our internal users of ALM Reports, as it addresses a whole host of their needs.


jure commented 9 years ago

It is also possible to do faceting for PLOS results:

{
    "response": {
        "numFound": 225,
        "start": 0,
        "docs": []
    },
    "facet_counts": {
        "facet_queries": {},
        "facet_fields": {
            "author_display": [],
            "journal": [
                "plos one", 180,
                "plos pathogens", 15,
                "plos biology", 9,
                "plos neglected tropical diseases", 6,
                "plos computational biology", 4,
                "plos genetics", 1,
                "plos clinical trials", 0,
                "plos collections", 0,
                "plos medicin", 0,
                "plos medicine", 0
            ],
            "article_type": [
                "research article", 211,
                "correction", 10,
                "synopsis", 3,
                "pearls", 1,
                "best practice", 0,
                "book review", 0,
                "book review/science in the media", 0,
                "case report", 0,
                "collection review", 0
            ]
        }
    }
}
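Note that the facet fields in this response are flat lists that alternate between a value and its count, rather than objects as in the CrossRef response. A minimal Python sketch of turning such a list into pairs, and dropping the zero-count entries, might look like this. The helper name `pair_facet_counts` is hypothetical, and the sample list is the "journal" facet from the response above.

```python
def pair_facet_counts(flat, min_count=1):
    """Convert [name, count, name, count, ...] into (name, count) pairs.

    Entries whose count is below `min_count` are dropped, which removes
    the zero-count values seen in the response above.
    """
    pairs = zip(flat[0::2], flat[1::2])
    return [(name, count) for name, count in pairs if count >= min_count]


# The "journal" facet from the response above.
journal = [
    "plos one", 180,
    "plos pathogens", 15,
    "plos biology", 9,
    "plos neglected tropical diseases", 6,
    "plos computational biology", 4,
    "plos genetics", 1,
    "plos clinical trials", 0,
    "plos collections", 0,
    "plos medicin", 0,
    "plos medicine", 0,
]

if __name__ == "__main__":
    for name, count in pair_facet_counts(journal):
        print(f"{name}: {count}")
```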

Which fields would we like to facet on for PLOS?

mfenner commented 9 years ago

The PLOS Search web interface facets on journal, subject_area and article_type. We discussed before that subject_area might be too complex for this iteration. Faceting on publication year, if possible, would be helpful.
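As a rough sketch of what such a request could look like, the Python below builds the query parameters for field facets on journal and article_type plus an optional per-year range facet. It uses standard Solr faceting parameters (`facet.field`, `facet.mincount`, `facet.range`), but several things are assumptions rather than confirmed facts: that the PLOS Search API passes these parameters through to Solr, that the endpoint is `http://api.plos.org/search`, and that the date field is called `publication_date`. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; not confirmed anywhere in this thread.
PLOS_SEARCH_URL = "http://api.plos.org/search"


def facet_params(query, fields=("journal", "article_type"),
                 start_year=None, end_year=None):
    """Return Solr query parameters as a list of (key, value) tuples.

    A list is used instead of a dict because facet.field repeats.
    """
    params = [
        ("q", query),
        ("wt", "json"),
        ("rows", "0"),            # only the facet counts are needed
        ("facet", "true"),
        ("facet.mincount", "1"),  # skip values with zero hits
    ]
    params += [("facet.field", field) for field in fields]
    if start_year is not None and end_year is not None:
        # One bucket per calendar year; "publication_date" is an assumed field name.
        params += [
            ("facet.range", "publication_date"),
            ("facet.range.start", f"{start_year}-01-01T00:00:00Z"),
            ("facet.range.end", f"{end_year + 1}-01-01T00:00:00Z"),
            ("facet.range.gap", "+1YEAR"),
        ]
    return params


def search_url(query, **kwargs):
    """Build the full request URL for the assumed search endpoint."""
    return PLOS_SEARCH_URL + "?" + urlencode(facet_params(query, **kwargs))


if __name__ == "__main__":
    print(search_url("biology", start_year=2010, end_year=2014))
```

Whether the range facet is available depends on the Solr version behind the API, so the publication year part is the least certain piece of this sketch.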

jure commented 9 years ago

This is being addressed in #121, so I'll close this as a duplicate.