ropensci / tokenizers

Fast, Consistent Tokenization of Natural Language Text
https://docs.ropensci.org/tokenizers

Submit paper to JOSS #39

Closed lmullen closed 6 years ago

lmullen commented 7 years ago

@Ironholds @dselivanov @jrnold I intend to submit this package as a paper to the Journal of Open Source Software. The package has already been peer reviewed by rOpenSci, so I anticipate that this will be a formality. But it will make our work more citable. When I've put together the submission materials I'll run them by you. But I want to get everything into shape for the v0.2 release.

lmullen commented 7 years ago

@jrnold @kbenoit Any suggestions for citations for this paper? I'm looking through the literature and have a few that I might add. But perhaps you know of a few things that we really ought to cite.

jrnold commented 7 years ago

What sort of citations are you looking for? The Denny & Spirling paper on the importance of preprocessing may be of use: http://www.nyu.edu/projects/spirling/documents/preprocessing.pdf

lmullen commented 7 years ago

@jrnold Thanks for the suggestion. That's the kind of thing I was thinking of. While none of what we are doing here is controversial, it would be good to have a few citations.

lmullen commented 6 years ago

@jrnold @kbenoit @dselivanov @Ironholds:

I intend to submit a paper for this package to the Journal of Open Source Software as soon as I release version 0.2 to CRAN. The work for that release is basically done, except making sure that the package works well with the text interchange format.

I'd like to invite all the contributors to the package to be co-authors. If you would like to be a co-author, please let me know and tell me how you want your name, affiliation, and (optionally) ORCID to appear. If I don't hear from you, I'll assume you don't want to be on the paper.

If you'd like to comment or revise the paper, there is a complete first draft of paper.md in the joss-paper branch.

kbenoit commented 6 years ago

Happy to be listed and honored to be included. Kenneth Benoit, London School of Economics and Political Science, ORCID ID is 0000-0002-0797-564X. I will look at the draft asap.

Ironholds commented 6 years ago

Absolutely! Os Keyes, Department of Human Centered Design & Engineering, University of Washington, 0000-0001-5196-609X. Happy to chip in to the paper just as soon as I finish an upcoming CSCW submission.

lmullen commented 6 years ago

@kbenoit @Ironholds Glad to have you on board. I've added you both to the paper on the joss-paper branch, and updated the same information in the DESCRIPTION file on master. Both commits are above.

dselivanov commented 6 years ago

@lmullen thanks for the effort. Please add me as well: Dmitry Selivanov, Open Data Science (http://ods.ai/)

lmullen commented 6 years ago

Great, @dselivanov. I've added you to the author list.

jrnold commented 6 years ago

It would be great to be added. Thanks! Jeffrey Arnold, Political Science, University of Washington. 0000-0001-9953-3904.

Do you need me to look at anything in particular for the paper?

lmullen commented 6 years ago

Version 0.2 is on CRAN now. Thanks to all of you for contributing to it. I will publicize it once the binaries have been built on CRAN.

I have revised the paper again. Thanks, @kbenoit, for going over it already. I don't think there is a lot more to be done for it, but I am open to any suggestions. In particular, if anyone has used or knows of places where this package has been used in citable research, I'd be glad to include a few more examples since JOSS likes that.

Could I ask for you all to contribute whatever you want by March 28? I will send the paper to JOSS on that date unless anyone requests more time.

The package has already been peer reviewed by rOpenSci, so I expect that other than a look over the paper we should get through JOSS reasonably quickly.

lmullen commented 6 years ago

@jrnold If you have a citation for the Penn Treebank tokenizer, that might be a good addition. The website for the original implementation is now offline. I couldn't find a paper to cite, but if you know of one we should add it.

jrnold commented 6 years ago

This seems to be a link for the original PTB tokenizer: ftp://ftp.cis.upenn.edu/pub/treebank/public_html/tokenization.html, and this looks to be the link to the original Robert McIntyre sed script: ftp://ftp.cis.upenn.edu/pub/treebank/public_html/tokenizer.sed

lmullen commented 6 years ago

The paper has been accepted by JOSS. Here's the DOI: https://doi.org/10.21105/joss.00655

I've submitted a few minor typographical changes to clean up the article, but other than that it's all done.