wiheto / readabilityinscience

MIT License
12 stars 7 forks source link

Make readabilityinscience an installable python module via requirements and setup.py #1

Closed russelljjarvis closed 3 years ago

russelljjarvis commented 3 years ago

Make readabilityinscience an installable python module via requirements and setup.py

wiheto commented 3 years ago

I understand why you made this, and I guess it is no harm adding this despite this repo being more of a package of R and python code for the analysis.

wiheto commented 3 years ago

My main concern though is if all the requirements are needed? It's been a while since I worked with this code. But I do not believe "cryptography" is every explicitly called. Perhaps review the requirements.txt additions, and make sure it is only the absolutely necessary packages?

russelljjarvis commented 3 years ago

Hi @wiheto. Sure I can make sure this a truly minimal requirements/install.

wiheto commented 3 years ago

So it looks like you are not submitting updates to the functions themselves but also adding these files again in an entire new folder (readability). And additional things like a circleci config file. Was this the intention?

russelljjarvis commented 3 years ago

Yes perhaps you are justly concerned about bloat, let me explain my reasoning.

A continuous integration badge like the one on line 10 of README.md (Green success badge/ci link) signals to other developers that the package can build in a machine indipent way (requirements.txt is sufficient to install, and build works it doesn't break). Without a CI badge people can't tell how much time they need to invest to make a package build-able. Thus the .circleci/config.yml file.

setup.py and requirements.txt files need to be nested one level outside the main module import directory in order to make your repository both pip installable and importable, when running from outside the code directory.

See for example: neuronunit/neuronunit. This is a very standard python repository pattern.

The inner repository neuronunit must have an __init__.py file that specifies which submodule files will be importable.

ATM I am looking for work and finishing several projects, so fixing this PR by making the repository a minimal installable work flow could take a month or so.

Perhaps you are wondering why only the readabilityFunctions are moved inside the readability sub directory? That is just because its a big job to check if everything works, and I only know how to use the readability functions from your code.

russelljjarvis commented 3 years ago

If you look at the README now, you can see that circle-ci passes building and can import the module based only on the requirements.txt provided. See the green status badge in the README: https://github.com/russelljjarvis/readabilityinscience

Also I replaced tree tagger, provided user install instructions, replaced old requirements for pandas with the latest version. It is a pretty big job, this is a only start, but that's all I have time for for now.

wiheto commented 3 years ago

Let us take a step back here. I only asked for you to remove some of the requirements that you had added in your original PR that are not used in the code. I did not see a problem adding your original PR as it might make some people's lives easier. But this is starting to become something quite different. And, I do not think should be added to this repo (sorry to say this).

My decision is based on what this repo is for. It is to help people replicate our analyses of our article. It contains all the code used, the data, the statistical analyses, and figure generation etc..

So this repo in particular is not intended to be a general repo for readbaility analyses. A general python package would not contain all the R code we have in this repo, for example. I am all for all these changes being part of another package that is just for the python readability functions, but I do not think it should be maintained here.

Thus when you say

Also I replaced tree tagger, provided user install instructions, replaced old requirements for pandas with the latest version. It is a pretty big job, this is a only start, but that's all I have time for for now.

That is a great move to do for making a general readability package, but it is not good to replicate our original results which used treetagger and additional variations may exist between our results in the paper and what this repo will provide because of differences in software. That is not something we should do in this repo.

What it looks like you want to do is create a general package for the python part of the code while also making improvements to the code. That is a bigger job, I agree with you. And I think doing that is a great idea. But I do not think it should be within this repo, as it exists to replicate our previous work.

I hope this is clear. I think the work you've started on here sounds great, but should be in a dedicated repo that aims to perform those types of analyses and this repo is solely for the purpose of replicating the elife paper.

And if you create such a package, we can gladly link to it in the README here saying that, for general functions, they should look there.

wiheto commented 3 years ago

So one of two decisions seems to be better (once I've read the other issues as well).

  1. Either a new repo is made that only contains the python parts of the code is used where we can add all your changes and maybe slowly improve the code (I'm unsure how much time I can dedicate to this). I do not mind creating this and then you submit these changes there.
  2. You use the code from the package like you have done already, and giving credit to the origin of the code is sufficient.

Both these help you with your purpose while maintaining this repo as a snapshot. I hope that sounds ok with you. Let me know which you would prefer.

russelljjarvis commented 3 years ago

Yes, totally fine not to merge in the PR. Your reasoning makes sense, I will reply in more detail later.

russelljjarvis commented 3 years ago

2. You use the code from the package like you have done already, and giving credit to the origin of the code is sufficient. Seems fine to me. I only started one 1. because I was not sure if you would also be fine with 2.

Also I am grateful for the opportunity to exchange ideas with you.

One thing that could better aid with the reproducible of original code is uploading a docker container of the original software environment. Its true that people will always be able to read the original code, but the window for being able to run the original code either will close soon, or is closing.