tkchafin / autostreamtree

Stream trees for SNP datasets
GNU General Public License v3.0
1 stars 0 forks source link

[JOSS REVIEW] Suggestions for Documentation #5

Closed xin-huang closed 8 months ago

xin-huang commented 9 months ago

This is a part of the JOSS review (https://github.com/openjournals/joss-reviews/issues/6160)

  1. A statement of need

The opening section of the README seems inadequate for a comprehensive statement of need.

https://github.com/tkchafin/autostreamtree/blob/e86d8af440cf1c614b64f717cf9078d49f06fba5/README.md?plain=1#L3-L7

I recommend that the authors enhance it by incorporating the statement of need from their paper. An introduction section specifically tailored for the statement of need in the README would be beneficial.

  1. Installation instructions

There appears to be a redundancy in the installation instructions. The content found here:

https://github.com/tkchafin/autostreamtree/blob/e86d8af440cf1c614b64f717cf9078d49f06fba5/README.md?plain=1#L53-L60

is repeated later:

https://github.com/tkchafin/autostreamtree/blob/e86d8af440cf1c614b64f717cf9078d49f06fba5/README.md?plain=1#L78-L85

I suggest removing the first instance to avoid duplication.

Regarding ARM Mac installations, I am unable to test them personally. However, I noticed the commands for mamba installation:

https://github.com/tkchafin/autostreamtree/blob/e86d8af440cf1c614b64f717cf9078d49f06fba5/README.md?plain=1#L125-L128

According to the latest guidelines, installing mamba via conda is not recommended and should only be installed in the base environment.

Additionally, mamba now supports ARM64, so updating the installation instructions for mamba would be appropriate.

  1. Example usage

When attempting the example analysis with this command:

https://github.com/tkchafin/autostreamtree/blob/e86d8af440cf1c614b64f717cf9078d49f06fba5/README.md?plain=1#L380

I encountered the following error:

Traceback (most recent call last):
  File "/home/xinhuang/software/mambaforge/envs/autostreamtree-dev/bin/autostreamtree", line 8, in <module>
    sys.exit(main())
  File "/home/xinhuang/software/mambaforge/envs/autostreamtree-dev/lib/python3.10/site-packages/autostreamtree/cli.py", line 28, in main
    np.random.seed(clock_seed)
  File "numpy/random/mtrand.pyx", line 4806, in numpy.random.mtrand.seed
  File "numpy/random/mtrand.pyx", line 250, in numpy.random.mtrand.RandomState.seed
  File "_mt19937.pyx", line 168, in numpy.random._mt19937.MT19937._legacy_seeding
  File "_mt19937.pyx", line 182, in numpy.random._mt19937.MT19937._legacy_seeding
ValueError: Seed must be between 0 and 2**32 - 1

This issue may need to be addressed for successful example replication.

  1. Functionality documentation

I was unable to locate any API documentation. It's important for the authors to add this as it is a requirement of the journal.

  1. Automated tests

Running pytest resulted in several warnings

tests/test_cli.py::test_autostreamtree_outputs
tests/test_cli.py::test_autostreamtree_output_shp
tests/test_cli.py::test_autostreamtree_outputs_gendist
tests/test_cli.py::test_autostreamtree_outputs_sdist
tests/test_functions.py::test_read_network
  /home/xinhuang/software/mambaforge/envs/autostreamtree-dev/lib/python3.10/site-packages/momepy/utils.py:252: UserWarning: Geometry is in a geographic CRS. Results from 'length' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

    gdf_network[length] = gdf_network.geometry.length

tests/test_cli.py::test_autostreamtree_outputs
tests/test_cli.py::test_autostreamtree_outputs
tests/test_cli.py::test_autostreamtree_output_shp
tests/test_cli.py::test_autostreamtree_output_shp
  /home/xinhuang/software/mambaforge/envs/autostreamtree-dev/lib/python3.10/site-packages/mantel/_test.py:217: DeprecationWarning: `np.math` is a deprecated alias for the standard library `math` module (Deprecated Numpy 1.25). Replace usages of `np.math` with `math`
    n = np.math.factorial(m)  # number of possible matrix permutations

tests/test_cli.py::test_autostreamtree_output_shp
  /home/xinhuang/software/mambaforge/envs/autostreamtree-dev/lib/python3.10/site-packages/pyogrio/raw.py:530: RuntimeWarning: Creating a 256th field, but some DBF readers might only support 255 fields
    ogr_write(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

Addressing these warnings could enhance the software's reliability.

Additionally, I recommend integrating code coverage reporting with a service like codecov. This would provide a clearer understanding of the extent to which the unit tests cover the codebase.

xin-huang commented 9 months ago

Furthermore, considering that Apple Silicon (https://github.com/actions/runner-images/issues/8439) is available in GitHub Actions now, it would be beneficial for the authors to include builds for OSX arm64 in the ci.yml. This addition would aid in verifying the effectiveness of the installation instructions for OSX arm64.

tkchafin commented 9 months ago

Thanks @xin-huang for your input, I'm traveling for a conference this week but I'll get on with addressing all of these next week!

radeva commented 9 months ago

it would be beneficial for the authors to include builds for OSX arm64 in the ci.yml.

@tkchafin Feel free to use FlyCI's M1 and M2 runners. Our runners are on average 2x faster and 2x cheaper than GitHub's AND we have a free tier for OSS projects (see below).

Install Instructions

  1. Install FlyCI app and
  2. Use a FlyCI runner in your workflow:
jobs:
 ci:
-    runs-on: macos-latest
+    runs-on: flyci-macos-large-latest-m1
   steps:
   - name: 👀 Checkout repo
     uses: actions/checkout@v4

500 mins/month Free for Public Repos

Since your repo is public, FlyCI offers 500 mins/month of free M1 runner usage with the flyci-macos-large-latest-m1 runner for public projects.

Don't hеsitate to contact us in case the free tier doesn't suit your needs or you experience any issues with the runners. Our team is here to support you!

Best Regards, Veselina Radeva Product Manager at FlyCI

tkchafin commented 9 months ago

Thanks @xin-huang, I've fixed all of these now. I sent a pull request with a fix for the warning coming from pymantel which has now been merged, so that one will clear once the conda installation has been updated to the newest release. The other warnings can be ignored (in the context of the unit tests!) so I have suppressed them.

xin-huang commented 9 months ago

Hello @tkchafin

Thank you for the update.

With the current version in this repository, I can proceed with the example analysis. However, when I used the command listed in the README to install autostreamtree from the ecoevoinfo channel:

mamba install -c ecoevoinfo -c conda-forge -c bioconda autostreamtree

I still got the same error: ValueError: Seed must be between 0 and 2**32 - 1.

To resolve this, I had to explicitly specify the version with the following command:

mamba install -c ecoevoinfo -c conda-forge -c bioconda autostreamtree==v1.1.3

This suggests that the conda channel might need an update to automatically provide the latest version to users.

Besides, I observed that the version on the conda channel is v1.1.3, whereas the setup.py in this repository lists version 1.0.1.

https://github.com/tkchafin/autostreamtree/blob/3faf71c0340a8139cef79be4a2a6a7e1827bb074/setup.py#L5

It might be beneficial to synchronize the version in setup.py with the conda channel to avoid confusion.

tkchafin commented 8 months ago

Versioning issues should be fixed now

xin-huang commented 8 months ago

yes, it works now, thanks