gotec / git2net

An Open Source Python package for the extraction of fine-grained and time-stamped co-editing networks from git repositories.
https://git2net.readthedocs.io
GNU Affero General Public License v3.0

Tutorial update #30

Closed: bkmgit closed this pull request 2 years ago.

bkmgit commented 2 years ago

Partially addresses https://github.com/gotec/git2net/issues/29, although some of the functionality has changed.

SebastianZug commented 2 years ago

Good work, but I received an

ModuleNotFoundError: No module named 'pygit2'

error when I started the tutorial using the Binder and Colab buttons. Can you please check the package management?

bkmgit commented 2 years ago

Thanks for trying it out. On Binder it seemed to work, since Binder uses the requirements.txt file to install Python dependencies; please check again. On Colab, you may need to add an initial cell with

!pip install pygit2 git2net

Please try it and let me know if the notebook should be updated accordingly.
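If the notebook should handle this itself, a first cell could install only what is missing. This is an illustrative sketch, not code from the tutorial: TUTORIAL_PACKAGES, missing_packages, and install_missing are hypothetical names, and the sketch assumes each pip distribution name matches its import name (which holds for pygit2 and git2net).

```python
import importlib.util
import subprocess
import sys

# Hypothetical list: the packages the first notebook cell should guarantee.
TUTORIAL_PACKAGES = ["pygit2", "git2net"]


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def install_missing(names, dry_run=False):
    """Install missing packages with the running interpreter's pip.

    Returns the pip command that was (or, with dry_run=True, would be) run,
    or None when everything is already importable.
    """
    missing = missing_packages(names)
    if not missing:
        return None
    # Using sys.executable targets the notebook kernel's own environment.
    cmd = [sys.executable, "-m", "pip", "install", *missing]
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd
```

Calling install_missing(TUTORIAL_PACKAGES) in the first cell installs through the kernel's interpreter; on a rerun it does nothing, because both modules are then importable.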

bkmgit commented 2 years ago

To test in Binder, you would need to launch it from https://github.com/bkmgit/git2net/ using the branch tutorial-update and choosing the file TUTORIAL.ipynb. The launch button uses a link that assumes the pull request has already been merged into the main repository.

gotec commented 2 years ago

Sorry for the late reply, I was on holiday the past week.

Thanks a lot for the nice additions to the tutorial. Also, your efforts to allow viewing the tutorial in Binder and Google Colab are really cool.

What I am struggling with are the changes to the requirements.txt file, for several reasons:

  1. A while back, I spent some effort making git2net compatible with the new packaging requirements for PyPI. As a result, all current requirements for git2net are stored in the setup.cfg file. I would prefer to maintain only one list of requirements for the tool rather than two, to avoid issues in the future.
  2. To maintain future compatibility, I am trying to make git2net depend on as few other packages as possible. Therefore, the requirements only contain the packages needed by the main git2net package. The tutorial uses several additional packages, e.g., for plotting (matplotlib and seaborn) and for interfacing with git (pygit2). As these are only required to run the tutorial and not to use the git2net package itself (i.e., any command called as git2net.<command>), I would like to keep them separate.
  3. The requirements currently provided in the pull request have some issues:
    1. pysqlite3 should be unnecessary, as it is a repackaging of the sqlite3 module contained in the Python standard library (https://github.com/coleifer/pysqlite3).
    2. pathpy should be pathpy2. While pathpy currently also works, it will soon be redirected to a newly developed version of pathpy that is incompatible with the current version of git2net (although we are working on it).
    3. While I understand why it could be required for Binder/Google Colab, I find it rather unintuitive to have git2net as a requirement, i.e., to make the package depend on itself.
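To illustrate points 1 and 3, here is a sketch that reads the core dependency list from the install_requires entry under [options] in setup.cfg (the setuptools declarative format) and checks a separate, tutorial-only list against the three issues above. The setup.cfg content is made up for the example and is not git2net's real file; core_requirements, requirement_name, and review_tutorial_extras are hypothetical helpers, and the requirement-name parsing is deliberately crude.

```python
import configparser
import re

# Hypothetical setup.cfg fragment for illustration only; the real git2net
# setup.cfg has its own dependency list.
EXAMPLE_SETUP_CFG = """
[metadata]
name = git2net

[options]
install_requires =
    pathpy2
    example-dependency>=1.0
"""

# Tutorial-only packages named in this discussion, kept out of setup.cfg.
TUTORIAL_EXTRAS = ["matplotlib", "seaborn", "pygit2"]


def core_requirements(cfg_text):
    """Return the install_requires entries from setup.cfg text, one per line."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback also covers a missing [options] section.
    raw = parser.get("options", "install_requires", fallback="")
    return [line.strip() for line in raw.splitlines() if line.strip()]


def requirement_name(req):
    """Crude project name: the text before any version, marker, or extra."""
    return re.split(r"[<>=!~;\[\s]", req.strip(), maxsplit=1)[0].lower()


def review_tutorial_extras(extras, package_name="git2net"):
    """Flag the three problems raised in the review; returns a list of messages."""
    problems = []
    for req in extras:
        name = requirement_name(req)
        if name == package_name:
            problems.append(f"{req}: the package should not depend on itself")
        elif name == "pysqlite3":
            problems.append(f"{req}: sqlite3 already ships with Python")
        elif name == "pathpy":
            problems.append(f"{req}: use pathpy2 instead")
    return problems
```

Keeping the tutorial list out of setup.cfg leaves a single source of truth for the package itself, while the review check stays independent of it.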

Is there any other way to make Binder or Google Colab install the packages for the tutorial? I have not looked into this more closely so far.

bkmgit commented 2 years ago

For Google Colab, one can add pip install commands to the notebook, or clone the repository directly and install it locally rather than using pip. These commands can be commented out, as you have done for another section of the notebook.
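The two Colab options can be sketched as the commands a first notebook cell would run. This is an illustration under assumptions: colab_install_commands is a hypothetical helper, REPO_URL is the upstream repository linked in this thread, and the source option still calls pip but points it at the local checkout instead of PyPI (assuming the checkout contains the project's build configuration).

```python
import sys

# Upstream repository from this thread; point it at a fork if needed.
REPO_URL = "https://github.com/gotec/git2net"


def colab_install_commands(from_source=False, repo_url=REPO_URL, target_dir="git2net"):
    """Return argv lists for one of the two Colab setup options.

    from_source=False: install the released packages from PyPI.
    from_source=True: shallow-clone the repository and install the checkout.
    """
    pip_install = [sys.executable, "-m", "pip", "install"]
    if not from_source:
        return [pip_install + ["pygit2", "git2net"]]
    return [
        ["git", "clone", "--depth", "1", repo_url, target_dir],
        pip_install + ["pygit2", "./" + target_dir],
    ]
```

In a notebook cell, each command can be run with subprocess.check_call(cmd), and the whole cell can stay commented out like the other optional section.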

Binder does not seem to support setup.cfg at the moment, as indicated in its documentation on configuration files. Another option might be to create a separate repository, e.g., git2netexamples.

bkmgit commented 2 years ago

Binder can also use a branch other than main for setup, so the requirements.txt file could be moved to another branch, for example one called binder.
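Launching from a branch only changes the ref in the Binder link. A sketch of building such a link, assuming mybinder.org's v2/gh/&lt;owner&gt;/&lt;repo&gt;/&lt;ref&gt; URL scheme and its labpath parameter for opening a file; binder_url is a hypothetical helper, and the branch name binder is just the example from this comment. The notebook itself would also need to be present on that branch.

```python
from urllib.parse import quote


def binder_url(owner, repo, ref, notebook=None):
    """Build a mybinder.org launch link for a GitHub repository at `ref`.

    Binder builds the environment from the files on `ref` (e.g. a
    requirements.txt kept only on a `binder` branch). The optional
    labpath parameter asks Binder to open `notebook` in JupyterLab.
    """
    # Percent-encode the ref so branch names containing "/" stay one path
    # segment (assumption: Binder accepts encoded refs).
    url = f"https://mybinder.org/v2/gh/{owner}/{repo}/{quote(ref, safe='')}"
    if notebook:
        url += "?labpath=" + quote(notebook)
    return url
```

For instance, binder_url("bkmgit", "git2net", "tutorial-update", "TUTORIAL.ipynb") yields a link to the branch and file mentioned earlier in this thread.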

gotec commented 2 years ago

Thanks for looking into this. I had also seen the page on Binder configuration files, but since it only lists setup.py (which, according to the Python packaging instructions, "should be used only as an escape hatch when absolutely necessary") and not the setup.cfg file, I thought there must be another way.

Using a different branch would work for now, but we would then need to keep that separate branch up to date as well. In that case, I would prefer moving the tutorial into its own repository.

As an alternative, we could also simply not offer Binder as a way to view the tutorial. Is there a good argument for supporting both Google Colab and Binder? Does Binder offer any additional functionality? So far, I view them as alternative ways to run the tutorial without having to download it.

bkmgit commented 2 years ago

Google Colab may not be available in every location, but we can start with it. If you expect more tutorials, then creating a GitHub organization with the library as one repository and a second repository for tutorials is worthwhile. If it will primarily be this notebook, then a separate branch for Binder is possible and not very high maintenance; I can do that in a separate pull request, since it will require some initial testing.

gotec commented 2 years ago

Great, then I suggest we start with Google Colab, and I will create an additional repository to which we move the tutorials in the coming weeks. This is also nice because I could add examples from some additional publications.

gotec commented 2 years ago

Would you mind modifying the pull request so we no longer need the requirements.txt file? Many thanks for that!

bkmgit commented 2 years ago

Will do.

bkmgit commented 2 years ago

@gotec Hope ok now.

gotec commented 2 years ago

Looking great! Thanks a lot for the excellent contribution!