ropensci-org / ropensci_citations

rOpenSci Citation Data
MIT License

Dataset request :-) #2

maelle closed this issue 4 years ago

maelle commented 4 years ago

Thanks @sckott for https://raw.githubusercontent.com/ropenscilabs/ropensci_citations/master/citations_all_parts.json

I'd need a slightly different dataset to make things easier for the website.

Would it be possible for you to generate this dataset?

Thanks in advance! :pray:

maelle commented 4 years ago

It might even make sense for that JSON to only contain journal articles, or at least to exclude e.g. newspaper articles.

maelle commented 4 years ago

Further wish: in the JSON I'd use for the website, could you exclude citations of the gender package?
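The two filters requested so far (journal articles only, no citations of the gender package) could look roughly like this. It is a minimal Python sketch, not the script used to build the dataset; it assumes records shaped like the JSON examples in this thread, with the package in "name" and a CSL-style "type" under "parts". KEEP_TYPES, EXCLUDED_PACKAGES and keep_for_website are hypothetical names.

```python
import json

# Hypothetical allowlist of CSL "type" values treated as journal articles.
KEEP_TYPES = {"article-journal"}
# Packages whose citations should not appear on the website.
EXCLUDED_PACKAGES = {"gender"}


def keep_for_website(record):
    """Return True for journal articles that do not cite an excluded package."""
    name = record.get("name", [])
    # "name" may be a single string or a list of package names.
    names = {name} if isinstance(name, str) else set(name)
    if names & EXCLUDED_PACKAGES:
        return False
    return record.get("parts", {}).get("type") in KEEP_TYPES


records = [
    {"name": "rotl", "parts": {"type": "article-journal"}},
    {"name": "gender", "parts": {"type": "article-journal"}},
    {"name": ["plotly"], "parts": {"type": "paper-conference"}},
]
website = [r for r in records if keep_for_website(r)]
print(json.dumps(website))
```

Records with no "type" at all are dropped by the allowlist; whether that is wanted is a judgment call, and a blocklist of unwanted types would be the looser alternative.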

sckott commented 4 years ago

most likely can do all that, will let you know

sckott commented 4 years ago

@maelle can you take a look at https://github.com/ropenscilabs/ropensci_citations/blob/master/citations_all_parts_clean.json when you get a chance, and see if that suits your needs.

maelle commented 4 years ago

Thanks. The one below looks weird

[screenshot not preserved]

maelle commented 4 years ago

Also, could date be a single string, or could there be a field called year that's either a year or something like "in press"? See the example below, where there are two dates.


   "parts": {
      "author": [
        {
          "family": "Kang",
          "given": "W."
        },
        {
          "family": "Zhang",
          "given": "M."
        },
        {
          "family": "Wang",
          "given": "Q."
        },
        {
          "family": "Gu",
          "given": "D."
        },
        {
          "family": "Huang",
          "given": "Z."
        },
        {
          "family": "Wang",
          "given": "H."
        },
        {
          "family": "Jin",
          "given": "X."
        },
        {
          "others": true
        }
      ],
      "date": [
        "2020",
        "2020"
      ],
      "title": "The SLC Family Are Candidate Diagnostic and Prognostic Biomarkers in Clear Cell Renal Cell Carcinoma",
      "pages": "1–17",
      "url": "https://doi.org/10.1155/2020/1932948",
      "type": "article-journal",
      "container-title": "BioMed Research International",
      "doi": "10.1155/2020/1932948"
    },
    "url": "https://doi.org/10.1155/2020/1932948"
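A single "year" field could be derived from "date" whether it arrives as a string or as a list such as ["2020", "2020"] or ["2018", "2018-07"]. The Python sketch below is an assumption about how to do it, not the dataset's actual code: year_field is a hypothetical helper that takes the first four-digit year it finds, falls back to "in press", and otherwise returns None.

```python
import re


def year_field(date):
    """Collapse a "date" value (string, list of strings, or None) into one year string.

    Returns the first four-digit year found, "in press" if a value says so,
    or None when nothing usable is present.
    """
    values = [date] if isinstance(date, str) else list(date or [])
    for value in values:
        match = re.search(r"\b(\d{4})\b", value)
        if match:
            return match.group(1)
    if any("in press" in value.lower() for value in values):
        return "in press"
    return None


# Example: the conference paper with two dates in this thread.
record = {"parts": {"date": ["2018", "2018-07"]}}
record["year"] = year_field(record["parts"].get("date"))
print(record["year"])
```

Records where this returns None could then simply be left out of the website file.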
maelle commented 4 years ago

Actually for packages I'd like a vector, e.g. "name":["spocc", "taxize"], sorry for changing my mind.
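Making "name" always an array can be done as one normalization pass over the whole file. A minimal Python sketch, assuming the file is a JSON array of records; normalize_names is a hypothetical helper, and a bare string such as "rotl" becomes ["rotl"].

```python
import json


def normalize_names(records):
    """Make every record's "name" a list, e.g. "rotl" -> ["rotl"]."""
    for record in records:
        name = record.get("name", [])
        record["name"] = [name] if isinstance(name, str) else list(name)
    return records


raw = '[{"name": "rotl"}, {"name": ["spocc", "taxize"]}]'
records = normalize_names(json.loads(raw))
print(json.dumps(records))  # [{"name": ["rotl"]}, {"name": ["spocc", "taxize"]}]
```

Doing this once in the data means the website code never has to branch on whether "name" is a string or an array.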

maelle commented 4 years ago

So, to summarize the requests:

  {
    "name": "rotl",
    "doi": "10.1101/2020.01.14.905901",
    "citation": "Walczyńska, A., Gudowska, A., & Sobczyk, Ł. (2020). Should I shrink or should I flow? – body size adjustment to thermo-oxygenic niche. <https://doi.org/10.1101/2020.01.14.905901>",
    "parts": {
      "author": [
        {
          "family": "Walczyńska",
          "given": "A."
        },
        {
          "family": "Gudowska",
          "given": "A."
        },
        {
          "family": "Sobczyk",
          "given": "Ł."
        }
      ],
      "date": "2020",
      "title": "Should I shrink or should I flow? – body size adjustment to thermo-oxygenic niche",
      "url": "https://doi.org/10.1101/2020.01.14.905901",
      "doi": "10.1101/2020.01.14.905901"
    },
    "research_snippet": "body size adjustment in rotifers",
    "url": "https://doi.org/10.1101/2020.01.14.905901"
  },
  {
    "name": "plotly",
    "citation": "Glanz, H., & Pileggi, S. 2018. Improving statistical communication in statistical computing courses. In M. A. Sorto, A. White, & L. Guyot (Eds.), Looking back, looking forward. Proceedings of the Tenth International Conference on Teaching Statistics (ICOTS10, July, 2018), Kyoto, Japan. Voorburg, The Netherlands: International Statistical Institute. <https://iase-web.org/icots/10/proceedings/pdfs/ICOTS10_3F1.pdf>",
    "parts": {
      "author": [
        {
          "family": "Glanz",
          "given": "H."
        },
        {
          "family": "Pileggi",
          "given": "S."
        }
      ],
      "date": [
        "2018",
        "2018-07"
      ],
      "title": "Improving statistical communication in statistical computing courses",
      "url": "https://iase-web.org/icots/10/proceedings/pdfs/ICOTS10_3F1.pdf",
      "type": "paper-conference",
      "container-title": "Looking back, looking forward. Proceedings of the Tenth International Conference on Teaching Statistics (ICOTS10",
      "location": "Kyoto, Japan. Voorburg, The Netherlands",
      "publisher": "International Statistical Institute",
      "editor": [
        {
          "family": "Sorto",
          "given": "M.A."
        },
        {
          "family": "White",
          "given": "A."
        },
        {
          "family": "Guyot",
          "given": "L."
        }
      ]
    },
    "url": "https://iase-web.org/icots/10/proceedings/pdfs/ICOTS10_3F1.pdf"
  },
  {
    "name": "UCSCXenaTools",
    "doi": "10.1155/2020/1932948",
    "citation": "Kang, W., Zhang, M., Wang, Q., Gu, D., Huang, Z., Wang, H., … Jin, X. (2020). The SLC Family Are Candidate Diagnostic and Prognostic Biomarkers in Clear Cell Renal Cell Carcinoma. BioMed Research International, 2020, 1–17. <https://doi.org/10.1155/2020/1932948>",
    "parts": {
      "author": [
        {
          "family": "Kang",
          "given": "W."
        },
        {
          "family": "Zhang",
          "given": "M."
        },
        {
          "family": "Wang",
          "given": "Q."
        },
        {
          "family": "Gu",
          "given": "D."
        },
        {
          "family": "Huang",
          "given": "Z."
        },
        {
          "family": "Wang",
          "given": "H."
        },
        {
          "family": "Jin",
          "given": "X."
        },
        {
          "others": true
        }
      ],
sckott commented 4 years ago

When there are others, I'd use et al. in italics. When there are more than 2 authors at all, it could switch to et al.

maelle commented 4 years ago

Author names are shown in the random subset of cards and in the table: https://roweb3-hugo.netlify.app/citations/ Good point about et al.

sckott commented 4 years ago

@maelle made fixes

See the updated citations_all_parts_clean.json file.

maelle commented 4 years ago

Thank you!!

maelle commented 4 years ago

I don't see the year field in https://ropenscilabs.github.io/ropensci_citations/citations_all_parts.json actually?

maelle commented 4 years ago

because that is the wrong file, well done me...

maelle commented 4 years ago

How is the data parsed, btw? Something to document internally :wink:

So, in summary:

  {
    "name": "rotl",
    "doi": "10.1111/nph.16361",
    "citation": "Godfrey, J. M., Riggio, J., Orozco, J., Guzmán‐Delgado, P., Chin, A. R. O., & Zwieniecki, M. A. (2020). Ray fractions and carbohydrate dynamics of tree species along a 2750 m elevation gradient indicate climate response, not spatial storage limitation. New Phytologist, 225(6), 2314–2330. <https://doi.org/10.1111/nph.16361>",
    "parts": {
      "author": [
        {
          "literal": "Godfrey, J. M., Riggio, J., Orozco, J., Guzmán‐Delgado, P., Chin, A. R. O., & Zwieniecki, M. A."
        }
      ],
      "date": "2020",
      "title": "Ray fractions and carbohydrate dynamics of tree species along a 2750 m elevation gradient indicate climate response, not spatial storage limitation",
      "volume": "225",
      "pages": "2314–2330",
      "url": "https://doi.org/10.1111/nph.16361",
      "type": "article-journal",
      "container-title": "New Phytologist",
      "issue": "6",

Thanks again!

sckott commented 4 years ago

Re "How is data parsed btw?": what do you mean?

I'll filter out those with no dates.

For authors, do you want the string just as you put it above, as a hash inside an array, or instead as a plain string under the author key?

"author": "Godfrey, J. M., Riggio, J., Orozco, J., Guzmán‐Delgado, P., Chin, A. R. O., & Zwieniecki, M. A."

Okay, packages always as an array.

maelle commented 4 years ago

I mean technically: what do you use to parse the citations data? :+1:

I don't understand the questions regarding authors.

maelle commented 4 years ago

Shouldn't the citation below be excluded? (It's from an online newspaper, not a scientific journal.)


{
    "name": "ropenaq",
    "citation": "Munkhbat, Oyungerel. 2017. Putting a magnifying glass on air pollution. The UB Post. <http://theubpost.mn/2017/01/12/putting-a-magnifying-glass-on-air-pollution>",
    "parts": {
      "author": [
        {
          "family": "Munkhbat",
          "given": "Oyungerel"
        }
      ],
      "date": "2017",
      "title": "Putting a magnifying glass on air pollution",
      "url": "http://theubpost.mn/2017/01/12/putting-a-magnifying-glass-on-air-pollution",
      "type": "article-journal",
      "container-title": "The UB Post"
    },
    "url": "http://theubpost.mn/2017/01/12/putting-a-magnifying-glass-on-air-pollution",
    "year": "2017"
  }
maelle commented 4 years ago

:wave: @sckott

maelle commented 4 years ago

Note that the current page, https://roweb3-hugo.netlify.app/citations/, looks OK apart from the exceptions that further improvements to the dataset will fix. So I don't consider this a showstopper for the website launch.

sckott commented 4 years ago

Re the authors question: do you want this

"author": [
    {
        "literal": "Godfrey, J. M., Riggio, J., Orozco, J., Guzmán‐Delgado, P., Chin, A. R. O., & Zwieniecki, M. A."
    }
]

or this

"author": "Godfrey, J. M., Riggio, J., Orozco, J., Guzmán‐Delgado, P., Chin, A. R. O., & Zwieniecki, M. A."

sckott commented 4 years ago

that one citation is now excluded

filtered out citations where year=NA
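The two exclusions just described could be expressed as one filter. A Python sketch under stated assumptions: a missing year may show up as null or as the string "NA" (how R's NA gets serialized is an assumption here), and since the newspaper item above is typed "article-journal", a type check alone would not catch it, so EXCLUDED_CONTAINERS is a hypothetical blocklist of source titles.

```python
# Hypothetical blocklist of sources that are not scholarly journals
# (the online newspaper from the ropenaq example above).
EXCLUDED_CONTAINERS = {"The UB Post"}
# Values treated as "no year"; "NA" assumes R's NA was written out as a string.
MISSING_YEAR = {None, "NA", ""}


def drop_unwanted(records):
    """Keep records that have a usable year and are not from a blocklisted source."""
    return [
        r
        for r in records
        if r.get("year") not in MISSING_YEAR
        and r.get("parts", {}).get("container-title") not in EXCLUDED_CONTAINERS
    ]


records = [
    {"name": ["ropenaq"], "year": "2017", "parts": {"container-title": "The UB Post"}},
    {"name": ["rotl"], "year": None, "parts": {}},
    {"name": ["rotl"], "year": "2020", "parts": {"container-title": "New Phytologist"}},
]
print([r["name"] for r in drop_unwanted(records)])
```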

maelle commented 4 years ago

Regarding authors, the second format seems easier to deal with. Could we add a rule that, above ? authors, the remaining authors become "et al." so that it looks good on https://roweb3-hugo.netlify.app/citations/?

sckott commented 4 years ago

@maelle okay, updated: author is now a string, with et al. for more than 2 authors, using last names only.
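The display rule for the author string could look like the Python sketch below. It is an illustration, not the code behind the dataset: author_display is a hypothetical helper, joining exactly two last names with " & " is an assumption, and a {"literal": ...} entry is passed through unchanged because last names cannot be reliably pulled out of it.

```python
def author_display(authors):
    """Build one display string from CSL-style author entries.

    Last names only; more than two authors, or an {"others": true} marker,
    collapse to "First et al.". A literal author string is returned as-is.
    """
    if isinstance(authors, str):
        return authors
    literals = [a["literal"] for a in authors if "literal" in a]
    if literals:
        return literals[0]
    families = [a["family"] for a in authors if "family" in a]
    if not families:
        return ""
    if len(families) > 2 or any(a.get("others") for a in authors):
        return f"{families[0]} et al."
    return " & ".join(families)


print(author_display([{"family": "Glanz", "given": "H."}, {"family": "Pileggi", "given": "S."}]))  # Glanz & Pileggi
print(author_display([{"family": "Kang", "given": "W."}, {"family": "Zhang", "given": "M."}, {"others": True}]))  # Kang et al.
```

Rendering "et al." in italics, as suggested earlier in the thread, would then be left to the website templates.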

maelle commented 4 years ago

Awesome. One last thing: "name" is still not an array in all cases, see e.g. https://github.com/ropenscilabs/ropensci_citations/blob/b974a97aacf208879ccefd78e5f0d647b8209878/citations_all_parts_clean.json#L11944

I tried to make things work with name sometimes a string, sometimes an array, but it'll be easier if it's always an array, sorry.

sckott commented 4 years ago

ok

sckott commented 4 years ago

name always an array now

maelle commented 4 years ago

Thank you!! And thanks for your patience creating this dataset and dealing with my requests!!