openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
721 stars 38 forks source link

[REVIEW]: Biotrade: A Python package to access and analyse the international trade of bio-based products #5550

Closed editorialbot closed 1 year ago

editorialbot commented 1 year ago

Submitting author: !--author-handle-->@paulrougieux<!--end-author-handle-- (Paul Rougieux) Repository: https://gitlab.com/bioeconomy/forobs/biotrade/ Branch with paper.md (empty if default branch): Version: v0.3.0 Editor: !--editor-->@martinfleis<!--end-editor-- Reviewers: @FATelarico, @potterzot Archive: 10.5281/zenodo.8406069

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/f0f37ab918abe2cb766e94b42e520c67"><img src="https://joss.theoj.org/papers/f0f37ab918abe2cb766e94b42e520c67/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/f0f37ab918abe2cb766e94b42e520c67/status.svg)](https://joss.theoj.org/papers/f0f37ab918abe2cb766e94b42e520c67)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@FATelarico & @potterzot, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @martinfleis know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

πŸ“ Checklist for @FATelarico

πŸ“ Checklist for @potterzot

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 year ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.12 s (706.8 files/s, 108995.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          64           1357           3631           5554
Markdown                        15            414              0           1646
TeX                              1              9              0             79
YAML                             2              7              1             56
SQL                              1              3              3             39
-------------------------------------------------------------------------------
SUM:                            83           1790           3635           7374
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 1 year ago

Wordcount for paper.md is 2158

editorialbot commented 1 year ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.32614/RJ-2017-019 is OK

MISSING DOIs

- 10.1126/science.abm9267 may be a valid DOI for title: Disentangling the numbers behind agriculture-driven tropical deforestation

INVALID DOIs

- None
editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

martinfleis commented 1 year ago

πŸ‘‹πŸΌ @paulrougieux, @FATelarico, @potterzot this is the review thread for the paper. All of our communications will happen here from now on.

All reviewers should create checklists with the JOSS requirements using the command @editorialbot generate my checklist. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues (and small pull requests if needed) on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/5550 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks, feel free to start whenever it works for you. Since we're approaching summer period and one of you already pointed out that they may need more time, I will wait with reminders until 6 weeks from now, so no need to rush. Please let me know if any of you require significantly more time at any point. We can also use editorialbot to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@martinfleis) if you have any questions/concerns.

Thanks!

paulrougieux commented 1 year ago

Missing doi added in https://gitlab.com/bioeconomy/forobs/biotrade/-/commit/b7392da1cebdeafedfd3d716c1bd9b72b9aca4b5

FATelarico commented 1 year ago

Review checklist for @FATelarico

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

FATelarico commented 1 year ago

Authorship

Suggestion

I would have one question regarding authorship. Looking at the list of commits on GitLab, the third Author’s name appears only a handful of times (six, to be precise) in a limited timespan (Jan. 27 β€” July 29, 2022).

Surely, it is up to the authors themselves to define whether some edits are important enough to qualify someone as author. But perhaps it could be useful to highlight this imbalance between the listed individuals’ contribution.

FATelarico commented 1 year ago

Installation

Suggestion

It could be worth mentioning that buildingpsycopg2 the latter from source requires to edit the $PATH variable. This may be easy for most Unix users, but not necessarily so for everyone and for most Windows users.

pip install biotrade
Defaulting to user installation because normal site-packages is not writeable
Collecting biotrade
  Downloading biotrade-0.1.3-py3-none-any.whl (168 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 168.6/168.6 KB 1.1 MB/s eta 0:00:00
Requirement already satisfied: pandas in ./.local/lib/python3.10/site-packages (from biotrade) (2.0.1)
Collecting sqlalchemy
  Downloading SQLAlchemy-2.0.16-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.7/2.7 MB 3.0 MB/s eta 0:00:00
[setup.txt](https://github.com/openjournals/joss-reviews/files/11778683/setup.txt)

Collecting psycopg2
  Downloading psycopg2-2.9.6.tar.gz (383 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 384.0/384.0 KB 2.4 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  Γ— python setup.py egg_info did not run successfully.
  β”‚ exit code: 1
  ╰─> [23 lines of output]
      running egg_info
      creating /tmp/pip-pip-egg-info-_0o9nlur/psycopg2.egg-info
      writing /tmp/pip-pip-egg-info-_0o9nlur/psycopg2.egg-info/PKG-INFO
      writing dependency_links to /tmp/pip-pip-egg-info-_0o9nlur/psycopg2.egg-info/dependency_links.txt
      writing top-level names to /tmp/pip-pip-egg-info-_0o9nlur/psycopg2.egg-info/top_level.txt
      writing manifest file '/tmp/pip-pip-egg-info-_0o9nlur/psycopg2.egg-info/SOURCES.txt'
      
      Error: pg_config executable not found.
      
      pg_config is required to build psycopg2 from source.  Please add the directory
      containing pg_config to the $PATH or specify the full executable path with the
      option:
      
          python setup.py build_ext --pg-config /path/to/pg_config build ...
      
      or with the pg_config option in 'setup.cfg'.
      
      If you prefer to avoid building psycopg2 from source, please install the PyPI
      'psycopg2-binary' package instead.
      
      For further information please check the 'doc/src/install.rst' file (also at
      <https://www.psycopg.org/docs/install.html>).
      
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

Γ— Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

So, psycopg2-binary could be a better requirement than psycopg2 (I didn't open a pull request on GitLab because I'd prefer not to use a phone number to sign up, see attached setup.txt file). In this way:

pip install "biotrade.tar.xz"
Defaulting to user installation because normal site-packages is not writeable
Processing ./biotrade.tar.xz
  Preparing metadata (setup.py) ... done
Requirement already satisfied: pandas in ./.local/lib/python3.10/site-packages (from biotrade==0.1.3) (2.0.1)
Requirement already satisfied: psycopg2-binary in ./.local/lib/python3.10/site-packages (from biotrade==0.1.3) (2.9.6)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from biotrade==0.1.3) (2.28.2)
Collecting sqlalchemy
  Using cached SQLAlchemy-2.0.16-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.7 MB)
Collecting sqlalchemy_utils
  Downloading SQLAlchemy_Utils-0.41.1-py3-none-any.whl (92 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 92.6/92.6 KB 1.5 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.21.0 in /usr/lib/python3/dist-packages (from pandas->biotrade==0.1.3) (1.21.5)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas->biotrade==0.1.3) (2.8.2)
Requirement already satisfied: tzdata>=2022.1 in ./.local/lib/python3.10/site-packages (from pandas->biotrade==0.1.3) (2023.3)
Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas->biotrade==0.1.3) (2022.1)
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests->biotrade==0.1.3) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests->biotrade==0.1.3) (2020.6.20)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3/dist-packages (from requests->biotrade==0.1.3) (1.26.5)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->biotrade==0.1.3) (3.0.1)
Collecting greenlet!=0.4.17
  Downloading greenlet-2.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (613 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 613.7/613.7 KB 3.8 MB/s eta 0:00:00
Collecting typing-extensions>=4.2.0
  Downloading typing_extensions-4.6.3-py3-none-any.whl (31 kB)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas->biotrade==0.1.3) (1.16.0)
Building wheels for collected packages: biotrade
  Building wheel for biotrade (setup.py) ... done
  Created wheel for biotrade: filename=biotrade-0.1.3-py3-none-any.whl size=173260 sha256=64fd083e855722adda72f411c7d8a453948c332c9e263db35877fc260652b12d
  Stored in directory: /tmp/pip-ephem-wheel-cache-1kvnp1nb/wheels/58/a4/e2/bfeca1c223759f992ed5d0f59387e5970a10c9da94d69858b1
Successfully built biotrade
Installing collected packages: typing-extensions, greenlet, sqlalchemy, sqlalchemy_utils, biotrade
Successfully installed biotrade-0.1.3 greenlet-2.0.2 sqlalchemy-2.0.16 sqlalchemy_utils-0.41.1 typing-extensions-4.6.3
FATelarico commented 1 year ago

Functionality

Possible issue

The data pumps return the error AttributeError: 'Engine' object has no attribute 'exec_driver_sql' when run on a machine with SQLAlchemy>=2.0.

image

There were some non-backward compatible changes after v. 1.4, was the migration to 2.0 dealt with beforehand? Otherwise it may be a problem on my side, but I am asking to be sure.

FATelarico commented 1 year ago

Example usage

Suggestion

Real world examples are included and instructive. However, I wonder about the rationale for including an example based on R’s reticulate package. Essentially everything that runs under python can be executed through reticulate, so an entire example may be superfluous. Perhaps just mentioning it?

FATelarico commented 1 year ago

Community guidelines

Suggestions

An issue tracker is available on GitLab, but a template could be provided for additional user-friendliness

FATelarico commented 1 year ago

Summary

Suggestion

I would suggest spending a few words on the exact choices made in creating the mapping tables between different sources and how the difference in granularity between them was handled. Especially, it could be worth to discuss design choices that may be arguable/of which the user ought to be aware when using the package.

For instance, let’s say that the user converts FAOSTAT data in Comtrade format and then the reverse. The final FAOSTAT-format table would matching perfectly the starting one? If no, which categories cause the final differences? If yes, does it apply also to the inverse operation (Comtrade-FAOSTAT-Comtrade)?

FATelarico commented 1 year ago

References

Potential issue

The paper's proof does not contain a final bibliography. However, both in-text references and a bibliography file are present. Perhaps it’s a minor issue in the rendering/knitting of the markdown file.

paulrougieux commented 1 year ago

@FATelarico

But perhaps it could be useful to highlight this imbalance between the listed individuals’ contribution.

paulrougieux commented 1 year ago

The paper's proof does not contain a final bibliography. However, both in-text references and a bibliography file are present. Perhaps it’s a minor issue in the rendering/knitting of the markdown file.

The second link view article proof on github does have a bibliography at the end. I don't know how to fix this since the automated links generated by the editorial bot are inconsistent on this point.

paulrougieux commented 1 year ago

For instance, let’s say that the user converts FAOSTAT data in Comtrade format and then the reverse. The final FAOSTAT-format table would matching perfectly the starting one? If no, which categories cause the final differences? If yes, does it apply also to the inverse operation (Comtrade-FAOSTAT-Comtrade)?

The reverse would not be possible because there is a many to one relationship between the UN Comtrade product codes and the codes used in FAOSTAT, i.e. Comtrade quantities need to be aggregated before they are compared with FAOSTAT values. To perform the inverse operation you would need some kind of disaggregation index, in order to split the FAOSTAT quantities between the different Comtrade codes.

paulrougieux commented 1 year ago

Community guidelines

Suggestions

An issue tracker is available on GitLab, but a template could be provided for additional user-friendliness

Thanks. biotrade/-/commit/f32ebf0d adds 2 issue templates for bugs and features. They can be selected from the drop down menu "template" when creating a new issue.

paulrougieux commented 1 year ago

Real world examples are included and instructive. However, I wonder about the rationale for including an example based on R’s reticulate package. Essentially everything that runs under python can be executed through reticulate, so an entire example may be superfluous. Perhaps just mentioning it?

Thank you for your comment. I thought it would be nice to show that reticulate actually works in our practical case, but if you think It should be removed, I can think about moving it to a section of the documentation https://bioeconomy.gitlab.io/forobs/biotrade/biotrade.html .

paulrougieux commented 1 year ago

@FATelarico

"I didn't open a pull request on GitLab because I'd prefer not to use a phone number to sign up"

Sorry about that. I didn't find how to change gitlab's security settings for pull requests. Please tell me if you know a way to enable users with lower security levels to submit pull requests.

"psycopg2-binary could be a better requirement than psycopg2"

Warning. The psycopg2 wheel package comes packaged, among the others, with its own libssl binary. This may create conflicts with other extension modules binding with libssl as well, for instance with the Python ssl module: in some cases, under concurrency, the interaction between the two libraries may result in a segfault. In case of doubts you are advised to use a package built from source.

Testing pip install biotrade-0.2.2-py3-none-any.whl in a virtual environment on my Debian laptop shows this output:

"Successfully installed biotrade-0.2.2 certifi-2023.5.7 charset-normalizer-3.1.0 comtradeapicall-1.0.20 contourpy-1.1.0 cycler-0.11.0 fonttools-4.40.0 greenlet-2.0.2 idna-3.4 importlib-resources-5.12.0 kiwisolver-1.4.4 matplotlib-3.7.1 numpy-1.25.0 packaging-23.1 pandas-2.0.2 pillow-9.5.0 psycopg2-binary-2.9.6 pymannkendall-1.4.3 pyparsing-3.1.0 python-dateutil-2.8.2 pytz-2023.3 requests-2.31.0 scipy-1.11.0 six-1.16.0 sqlalchemy-2.0.17 sqlalchemy-utils-0.41.1 typing-extensions-4.6.3 tzdata-2023.3 urllib3-2.0.3 zipp-3.15.0"

I have uploaded biotrade 0.2.2 to pypi.

paulrougieux commented 1 year ago

The data pumps return the error AttributeError: 'Engine' object has no attribute 'exec_driver_sql' when run on a machine with SQLAlchemy>=2.0.

Sorry this is going to be a bit longer to fix. I have noted this in an issue: https://gitlab.com/bioeconomy/forobs/biotrade/-/issues/133 . In the mean time, can you please test with SQLAlchemy 1.4?

paulrougieux commented 1 year ago

@FATelarico Update to SQL Alchemy version 2 was fixed for faostat.db and comtrade.db see issue https://gitlab.com/bioeconomy/forobs/biotrade/-/issues/133 fixes make the database connections explicit inside a context manager, SQL ALchemy version 2 doesn allow to pass the engine directly.

Thank you for reporting. Can you please test if the package works for you now?

paulrougieux commented 1 year ago

Hi sorry the package didn't work properly when some of the reviewers tested it above. We are 5 users who all have been working with the biotrade package for over a year inside our group (on laptop or cluster), but it wasn't tested with a fresh install. I have now fixed issues related to table creation and table drop requiring to bind the engine. Latest version on pypi https://pypi.org/project/biotrade/ should now be compatible with SQL ALchemy version 2. Please test and report if your still experience some issues while loading FAOSTAT data.

martinfleis commented 1 year ago

it wasn't tested with a fresh install

it may be useful ensuring that your CI is properly set up to catch issues like this one.

paulrougieux commented 1 year ago

Thank you for the suggestion, for the moment the CI jobs only tests python functions with pytest in biotrade/tests.

I noted your suggestion in biotrade issue 137. I hope such database tests are not a blocker for the review of the JOSS paper.

martinfleis commented 1 year ago

This is what our criteria say:

Authors are strongly encouraged to include an automated test suite covering the core functionality of their software.

Good: An automated test suite hooked up to continuous integration (GitHub Actions, Circle CI, or similar) OK: Documented manual steps that can be followed to objectively check the expected functionality of the software (e.g., a sample input file to assert behavior) Bad (not acceptable): No way for you, the reviewer, to objectively assess whether the software works

I'll let the reviewers to assess the state of the package.

FATelarico commented 1 year ago

@FATelarico

But perhaps it could be useful to highlight this imbalance between the listed individuals’ contribution.

  • Is there a policy on how to describe authors contributions?
  • Should we describe which roles the author had? The "Contributor Roles Taxonomy" describes for example roles such as Writing – original draft, Writing – review & editing, Supervision, Visualization, Conceptualization, Methodology.

There isn't a strict policy. It's mostly up to the people involved in making the software to assess what contribution suffices to be considered a co-author. However, you should also mind the difference between co-authors of the paper (whose roles you mentioned) and co-authors of the software. Not everyone involved in the writing the code must necessarily partake into writing the paper.

FATelarico commented 1 year ago

The paper's proof does not contain a final bibliography. However, both in-text references and a bibliography file are present. Perhaps it’s a minor issue in the rendering/knitting of the markdown file.

The second link view article proof on github does have a bibliography at the end. I don't know how to fix this since the automated links generated by the editorial bot are inconsistent on this point.

I saw it just now. It's fine. Just a technical issue I guess! :ok_hand:

FATelarico commented 1 year ago

@FATelarico Update to SQL Alchemy version 2 was fixed for faostat.db and comtrade.db see issue https://gitlab.com/bioeconomy/forobs/biotrade/-/issues/133 fixes make the database connections explicit inside a context manager, SQL ALchemy version 2 doesn allow to pass the engine directly.

Thank you for reporting. Can you please test if the package works for you now?

I apologise, but downgrading to v 1.4 was not an option as I have other packages that depend on newer versions. However, I tested it now and everything seems fine to me.

FATelarico commented 1 year ago

@FATelarico

"I didn't open a pull request on GitLab because I'd prefer not to use a phone number to sign up"

Sorry about that. I didn't find how to change gitlab's security settings for pull requests. Please tell me if you know a way to enable users with lower security levels to submit pull requests.

"psycopg2-binary could be a better requirement than psycopg2"

Warning. The psycopg2 wheel package comes packaged, among the others, with its own libssl binary. This may create conflicts with other extension modules binding with libssl as well, for instance with the Python ssl module: in some cases, under concurrency, the interaction between the two libraries may result in a segfault. In case of doubts you are advised to use a package built from source.

  • We also use the requests package which might depend on libssl.

Testing pip install biotrade-0.2.2-py3-none-any.whl in a virtual environment on my Debian laptop shows this output:

"Successfully installed biotrade-0.2.2 certifi-2023.5.7 charset-normalizer-3.1.0 comtradeapicall-1.0.20 contourpy-1.1.0 cycler-0.11.0 fonttools-4.40.0 greenlet-2.0.2 idna-3.4 importlib-resources-5.12.0 kiwisolver-1.4.4 matplotlib-3.7.1 numpy-1.25.0 packaging-23.1 pandas-2.0.2 pillow-9.5.0 psycopg2-binary-2.9.6 pymannkendall-1.4.3 pyparsing-3.1.0 python-dateutil-2.8.2 pytz-2023.3 requests-2.31.0 scipy-1.11.0 six-1.16.0 sqlalchemy-2.0.17 sqlalchemy-utils-0.41.1 typing-extensions-4.6.3 tzdata-2023.3 urllib3-2.0.3 zipp-3.15.0"

I have uploaded biotrade 0.2.2 to pypi.

I did not experience any issue with libssl while installing and re-testing. I guess there are cases in which this may happen. But your package seems not to be one of them!

Arguably, you should test future releases of psycopg2 to check whether the original issue stop appearing when building the package from source. In the meantime I think it's good enough as it is.

FATelarico commented 1 year ago

Hi sorry the package didn't work properly when some of the reviewers tested it above. We are 5 users who all have been working with the biotrade package for over a year inside our group (on laptop or cluster), but it wasn't tested with a fresh install. I have now fixed issues related to table creation and table drop requiring to bind the engine. Latest version on pypi https://pypi.org/project/biotrade/ should now be compatible with SQL ALchemy version 2. Please test and report if your still experience some issues while loading FAOSTAT data.

Works as a charm, thank you for looking into this!

FATelarico commented 1 year ago

Review checklist for @FATelarico v. 2.1

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Side note

I apologise if it took me so long to answer, but I am not getting notifications about this issue. I had to check on it manually. I should have fixed that now

paulrougieux commented 1 year ago

Thank you @FATelarico. In the comment above, you checked all review boxes except "Example usage".

What are your suggestions on how to improve them?

FATelarico commented 1 year ago

Thank you @FATelarico. In the comment above, you checked all review boxes except "Example usage".

Oh, sorry. I missed on that. I saw that you answered regarding my previous concerns. I just checked that box, too!

martinfleis commented 1 year ago

Thank you @FATelarico and @paulrougieux for a fruitful exchange!

@potterzot do you have any guess when you'll be able to do your review? Thanks!

potterzot commented 1 year ago

Review checklist for @potterzot

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

potterzot commented 1 year ago

Interesting package that clearly took a lot of work. I apologize for my delay in the review process. I hope the below comments are helpful! With the exception of the functionality comment they are all minor.

Specific comments

Functionality

Documentation

Paper

potterzot commented 1 year ago

Separate comment for test results. My test using pytest currently fails on four occassions. Some of these look like dependency installation problems. If the tests need additional packages it would be helpful to add those instructions in the documentation section on testing.

Here is the output of my run of pytest:

============================= test session starts =============================
platform win32 -- Python 3.11.2, pytest-7.4.0, pluggy-1.3.0
rootdir: C:\Users\potte\projects\oneoffs\fy2023-review-biotrade-joss\biotrade\biotrade-main
collected 0 items / 4 errors

=================================== ERRORS ====================================
______________ ERROR collecting biotrade/tests/test_aggregate.py ______________
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:341: in from_call
    result: Optional[TResult] = func()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:372: in <lambda>
    call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:531: in collect
    self._inject_setup_module_fixture()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:545: in _inject_setup_module_fixture
    self.obj, ("setUpModule", "setup_module")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:310: in obj
    self._obj = obj = self._getobj()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:528: in _getobj
    return self._importtestmodule()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:617: in _importtestmodule
    mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\pathlib.py:565: in import_path
    importlib.import_module(module_name)
C:\Program Files (x86)\Python\Python311\Lib\importlib\__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
    ???
<frozen importlib._bootstrap>:1178: in _find_and_load
    ???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\assertion\rewrite.py:178: in exec_module
    exec(co, module.__dict__)
biotrade\tests\test_aggregate.py:18: in <module>
    from biotrade.faostat.aggregate import agg_trade_eu_row, agg_by_country_groups
biotrade\__init__.py:97: in <module>
    if input(msg + "Please confirm [y/n]:") == "y":
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\capture.py:202: in read
    raise OSError(
E   OSError: pytest: reading from stdin while output is captured!  Consider using `-s`.
------------------------------- Captured stdout -------------------------------
Create C:\Users\potte\projects\oneoffs\fy2023-review-biotrade-joss\data?Please confirm [y/n]:
______________ ERROR collecting biotrade/tests/test_front_end.py ______________
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:341: in from_call
    result: Optional[TResult] = func()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:372: in <lambda>
    call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:531: in collect
    self._inject_setup_module_fixture()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:545: in _inject_setup_module_fixture
    self.obj, ("setUpModule", "setup_module")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:310: in obj
    self._obj = obj = self._getobj()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:528: in _getobj
    return self._importtestmodule()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:617: in _importtestmodule
    mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\pathlib.py:565: in import_path
    importlib.import_module(module_name)
C:\Program Files (x86)\Python\Python311\Lib\importlib\__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
    ???
<frozen importlib._bootstrap>:1178: in _find_and_load
    ???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\assertion\rewrite.py:178: in exec_module
    exec(co, module.__dict__)
biotrade\tests\test_front_end.py:17: in <module>
    from scripts.front_end.functions import aggregated_data, reporter_iso_codes
scripts\front_end\functions.py:15: in <module>
    from biotrade.faostat import faostat
biotrade\__init__.py:97: in <module>
    if input(msg + "Please confirm [y/n]:") == "y":
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\capture.py:202: in read
    raise OSError(
E   OSError: pytest: reading from stdin while output is captured!  Consider using `-s`.
------------------------------- Captured stdout -------------------------------
Create C:\Users\potte\projects\oneoffs\fy2023-review-biotrade-joss\data?Please confirm [y/n]:
_______________ ERROR collecting biotrade/tests/test_mirror.py ________________
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:341: in from_call
    result: Optional[TResult] = func()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:372: in <lambda>
    call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:531: in collect
    self._inject_setup_module_fixture()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:545: in _inject_setup_module_fixture
    self.obj, ("setUpModule", "setup_module")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:310: in obj
    self._obj = obj = self._getobj()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:528: in _getobj
    return self._importtestmodule()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:617: in _importtestmodule
    mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\pathlib.py:565: in import_path
    importlib.import_module(module_name)
C:\Program Files (x86)\Python\Python311\Lib\importlib\__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
    ???
<frozen importlib._bootstrap>:1178: in _find_and_load
    ???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\assertion\rewrite.py:178: in exec_module
    exec(co, module.__dict__)
biotrade\tests\test_mirror.py:13: in <module>
    from biotrade.faostat.mirror import put_mirror_beside
biotrade\__init__.py:97: in <module>
    if input(msg + "Please confirm [y/n]:") == "y":
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\capture.py:202: in read
    raise OSError(
E   OSError: pytest: reading from stdin while output is captured!  Consider using `-s`.
------------------------------- Captured stdout -------------------------------
Create C:\Users\potte\projects\oneoffs\fy2023-review-biotrade-joss\data?Please confirm [y/n]:
_____________ ERROR collecting biotrade/tests/test_reallocate.py ______________
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:341: in from_call
    result: Optional[TResult] = func()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:372: in <lambda>
    call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:531: in collect
    self._inject_setup_module_fixture()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:545: in _inject_setup_module_fixture
    self.obj, ("setUpModule", "setup_module")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:310: in obj
    self._obj = obj = self._getobj()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:528: in _getobj
    return self._importtestmodule()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:617: in _importtestmodule
    mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\pathlib.py:565: in import_path
    importlib.import_module(module_name)
C:\Program Files (x86)\Python\Python311\Lib\importlib\__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
    ???
<frozen importlib._bootstrap>:1178: in _find_and_load
    ???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\assertion\rewrite.py:178: in exec_module
    exec(co, module.__dict__)
biotrade\tests\test_reallocate.py:12: in <module>
    from biotrade.common.reallocate import allocate_by_partners
biotrade\__init__.py:97: in <module>
    if input(msg + "Please confirm [y/n]:") == "y":
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\capture.py:202: in read
    raise OSError(
E   OSError: pytest: reading from stdin while output is captured!  Consider using `-s`.
------------------------------- Captured stdout -------------------------------
Create C:\Users\potte\projects\oneoffs\fy2023-review-biotrade-joss\data?Please confirm [y/n]:
=========================== short test summary info ===========================
ERROR biotrade/tests/test_aggregate.py - OSError: pytest: reading from stdin ...
ERROR biotrade/tests/test_front_end.py - OSError: pytest: reading from stdin ...
ERROR biotrade/tests/test_mirror.py - OSError: pytest: reading from stdin whi...
ERROR biotrade/tests/test_reallocate.py - OSError: pytest: reading from stdin...
!!!!!!!!!!!!!!!!!!! Interrupted: 4 errors during collection !!!!!!!!!!!!!!!!!!!
============================== 4 errors in 1.43s ==============================
paulrougieux commented 1 year ago

In a comment below I reference the long download times. I don't know if this is possible, but it would be really nice to be able to subset data downloads by country and/or year.

I is not possible for bulk download of FAOSTAT data. These datasets are small anyway, below a few megabytes in general, rarely a few hundred megabytes. Nothing compared to any video data.

Am I correct in thinking that at the moment a new download for a new year of data would redownload all previous years as well?

Yes you understand it right. Unfortunately this is a necessary feature for international trade data because data gets updated retroactively many years backwards. As you can see for example on the Comtrade APi update https://comtrade.un.org/data/da part of the 2017 trade flows were updated in 2017 and in all subsequent years until 2023.

This is tough both on users who want to just get the most recent data and on API providers who shoulder the burden of multiple repeated downloads of the same data.

There is no way around this. Unless it would be possible to know in advance for which country the data was updated. We could have complex interactions with the API, but with a few hundred MB or a few GB it's just easier to download the whole yearly dataset twice a year.

paulrougieux commented 1 year ago

As an unwilling Windows user, I would appreciate documentation that specifies how to set parameters from windows. I tried adding the BIOTRADE_DATA environmental variable to my system environmental variables manually and exporting it via my git bash terminal before then running python, but in both cases biotrade still wants to create the data directory in $HOME/repos/forobs/biotrade_data. Some explicit windows instruction and testing that these settings work in Windows would be helpful.

Thanks for reporting, to check if an environment variable was set from python, you can call

import os
os.environ["BIOTRADE_DATA"]

I added this explanation in https://gitlab.com/bioeconomy/forobs/biotrade/-/commit/c8c062fa6bfbd475445e2a4f1c47f85ce9fa179b Explaining how to setup environment variables in Windows is beyond the scope of this documentation.

potterzot commented 1 year ago

@paulrougieux Thanks for these responses, we do something similar with agricultural time series data where some previous years may be updated. It's unfortunate that the data cannot be subset by country but as you say, it's not a huge amount of data.

If the data calls cannot be subset, I would recommend adding a note to the documentation before the examples to give the user an expectation of download time. For me with a 400mb/s download speed, that was 2+ hours. Something like "Note: data update can take on the order of a few hours depending on your connection speed." Of course for regular users this is not important, but for new users, it helps to know how long it may take to run the data update examples.

Explaining how to setup environment variables in Windows is beyond the scope of this documentation.

Thank you for adding that documentation. It's true that environmental vars on windows is a pain, and I agree this is out of scope for biotrade. I was suggesting something more like adding "For Windows users, we recommend installing git bash or another *nix-based terminal, which will allow the below examples to run. Alternatively you may directly set the environmental variables, which we do not cover here."

And I think there is still a mismatch between the documentation and the default BIOTRADE_DATA location. The documentation states that data will be stored in $HOME if BIOTRADE_DATA is not set, when actually it is stored in $HOME/repos/forobs/biotrade_data.

paulrougieux commented 1 year ago

the README suggests running the code below, the first data download of which is still running on my system after two hours. I suggest editing the examples to fetch much smaller blocks of data if possible.

In this commit, I edited the README to explain that downloading trade data can take a long time on slow connections https://gitlab.com/bioeconomy/forobs/biotrade/-/commit/15776ef60849823fcf56aae6ca60cc17bd0b7783 I also put an example to load only production data, which should be much faster. Production has N contries per year and product, while bilateral trade as NxN per year and product (more data).

paulrougieux commented 1 year ago

. The default BIOTRADE_DATA location on my system was $HOME/repos/forobs/biotrade_data, not $HOME as suggested by this line in the documentation: "By default it is located in the user's home directory, but this can be changed with the environment variable BIOTRADE_DATA."

Commit https://gitlab.com/bioeconomy/forobs/biotrade/-/commit/69ea97d6f7a6d0842ad3f37299bdf2bfc0e3f4e5 updates the documentation to explain this. The sub-directory is an artefact of how we structure our system, repositories go in "repos", then there is a project called "forobs" which contains the "biotrade_data" directory. It could be simplified, but this path makes sense for us at this point and users can change it with an environment variable. Users can also create a symbolic link if they prefer to access biotrade_data at another location.

paulrougieux commented 1 year ago

From the message, it looks like it wrote 100,000 rows to the crop_trade database. Is it writing 100,000 for each country? It's not clear what that message pertains to.

There is a default chunk size of 100 000 rows, because we had some issues loading data on low memory laptops. This is probably far too low and the default could be changed to a higher value.

paulrougieux commented 1 year ago

Line 63: I would first define HS before referencing it.

Done in https://gitlab.com/bioeconomy/forobs/biotrade/-/commit/020e1ced7c110ed5babdd408514e7e9a131909ca

paulrougieux commented 1 year ago

I agree with @FATelarico that perhaps the reference to reticulate is not necessary.

We do have R users who use this python package for data analysis. There are a lot of R users in the community who might not have tried reticulate yet. This example is meant to encourage the use of this python package from R. If you don't mind, I would like to keep the example there.

paulrougieux commented 1 year ago

I was suggesting something more like adding "For Windows users, we recommend installing git bash or another *nix-based terminal, which will allow the below examples to run. Alternatively you may directly set the environmental variables, which we do not cover here."

A bash command is not necessary to add environment variables on windows. They can be added in Control Panel Β» System Β» Advanced Β» Environment Variables. That's how I setup the machines of my colleagues, this code also runs on a Linux cluster where we set the environment variables in a bash profile. In commit https://gitlab.com/bioeconomy/forobs/biotrade/-/commit/e7d0b6b2a19b09019cd2ca0272f8f6be9b5ac190, I added a link for further reference on how to set environment variables on any OS https://superuser.com/questions/284342/what-are-path-and-other-environment-variables-and-how-can-i-set-or-use-them

paulrougieux commented 1 year ago

My test using pytest currently fails on four occassions.

@potterzot can you please provide more details on your platform. Selene told me that the tests pass on her windows laptop, she uses Anaconda to install all python modules.

The issue seems to be related to this "pytest: reading from stdin while output is captured! Consider using -s.". Can you please try running the tests with

pytest -s
SPatani91 commented 1 year ago

Hello @potterzot, the issue your are facing with pytest is probably due to biotrade package that doesn't find the location of the stored data and it requires the user input, to specify where to create the folder. Once you have set up the environment variable BIOTRADE_DATA with the path of downloaded Faostat values, the tests should work. Please let us know if you manage to run them, thanks.