Closed editorialbot closed 1 year ago
Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
```
github.com/AlDanial/cloc v 1.88  T=0.12 s (706.8 files/s, 108995.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          64           1357           3631           5554
Markdown                        15            414              0           1646
TeX                              1              9              0             79
YAML                            2              7              1             56
SQL                             1              3              3             39
-------------------------------------------------------------------------------
SUM:                            83           1790           3635           7374
-------------------------------------------------------------------------------
```
gitinspector failed to run statistical information for the repository
Wordcount for paper.md is 2158
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.32614/RJ-2017-019 is OK
MISSING DOIs
- 10.1126/science.abm9267 may be a valid DOI for title: Disentangling the numbers behind agriculture-driven tropical deforestation
INVALID DOIs
- None
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
👋🏼 @paulrougieux, @FATelarico, @potterzot this is the review thread for the paper. All of our communications will happen here from now on.
All reviewers should create checklists with the JOSS requirements using the command @editorialbot generate my checklist. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.
The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues (and small pull requests if needed) on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/5550 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.
We aim for reviews to be completed within about 2-4 weeks; feel free to start whenever it works for you. Since we're approaching the summer period and one of you already pointed out that they may need more time, I will hold off on reminders until 6 weeks from now, so there is no need to rush. Please let me know if any of you require significantly more time at any point. We can also use editorialbot to set automatic reminders if you know you'll be away for a known period of time.
Please feel free to ping me (@martinfleis) if you have any questions/concerns.
Thanks!
I have one question regarding authorship. Looking at the list of commits on GitLab, the third author's name appears only a handful of times (six, to be precise) in a limited timespan (Jan. 27 – July 29, 2022).
Surely, it is up to the authors themselves to decide whether some edits are important enough to qualify someone as an author. But perhaps it could be useful to highlight this imbalance between the listed individuals' contributions.
It could be worth mentioning that building `psycopg2` from source requires editing the `$PATH` variable. This may be easy for most Unix users, but not necessarily so for everyone, and especially not for most Windows users.
```
pip install biotrade
Defaulting to user installation because normal site-packages is not writeable
Collecting biotrade
  Downloading biotrade-0.1.3-py3-none-any.whl (168 kB) 168.6/168.6 KB 1.1 MB/s eta 0:00:00
Requirement already satisfied: pandas in ./.local/lib/python3.10/site-packages (from biotrade) (2.0.1)
Collecting sqlalchemy
  Downloading SQLAlchemy-2.0.16-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.7 MB) 2.7/2.7 MB 3.0 MB/s eta 0:00:00
Collecting psycopg2
  Downloading psycopg2-2.9.6.tar.gz (383 kB) 384.0/384.0 KB 2.4 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [23 lines of output]
      running egg_info
      creating /tmp/pip-pip-egg-info-_0o9nlur/psycopg2.egg-info
      writing /tmp/pip-pip-egg-info-_0o9nlur/psycopg2.egg-info/PKG-INFO
      writing dependency_links to /tmp/pip-pip-egg-info-_0o9nlur/psycopg2.egg-info/dependency_links.txt
      writing top-level names to /tmp/pip-pip-egg-info-_0o9nlur/psycopg2.egg-info/top_level.txt
      writing manifest file '/tmp/pip-pip-egg-info-_0o9nlur/psycopg2.egg-info/SOURCES.txt'

      Error: pg_config executable not found.

      pg_config is required to build psycopg2 from source.  Please add the directory
      containing pg_config to the $PATH or specify the full executable path with the option:

          python setup.py build_ext --pg-config /path/to/pg_config build ...

      or with the pg_config option in 'setup.cfg'.

      If you prefer to avoid building psycopg2 from source, please install the PyPI
      'psycopg2-binary' package instead.

      For further information please check the 'doc/src/install.rst' file (also at
      <https://www.psycopg.org/docs/install.html>).
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
```

(Full log attached as [setup.txt](https://github.com/openjournals/joss-reviews/files/11778683/setup.txt).)
So, `psycopg2-binary` could be a better requirement than `psycopg2` (I didn't open a pull request on GitLab because I'd prefer not to use a phone number to sign up; see the attached setup.txt file). In this way:
```
pip install "biotrade.tar.xz"
Defaulting to user installation because normal site-packages is not writeable
Processing ./biotrade.tar.xz
  Preparing metadata (setup.py) ... done
Requirement already satisfied: pandas in ./.local/lib/python3.10/site-packages (from biotrade==0.1.3) (2.0.1)
Requirement already satisfied: psycopg2-binary in ./.local/lib/python3.10/site-packages (from biotrade==0.1.3) (2.9.6)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from biotrade==0.1.3) (2.28.2)
Collecting sqlalchemy
  Using cached SQLAlchemy-2.0.16-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.7 MB)
Collecting sqlalchemy_utils
  Downloading SQLAlchemy_Utils-0.41.1-py3-none-any.whl (92 kB) 92.6/92.6 KB 1.5 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.21.0 in /usr/lib/python3/dist-packages (from pandas->biotrade==0.1.3) (1.21.5)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas->biotrade==0.1.3) (2.8.2)
Requirement already satisfied: tzdata>=2022.1 in ./.local/lib/python3.10/site-packages (from pandas->biotrade==0.1.3) (2023.3)
Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas->biotrade==0.1.3) (2022.1)
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests->biotrade==0.1.3) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests->biotrade==0.1.3) (2020.6.20)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3/dist-packages (from requests->biotrade==0.1.3) (1.26.5)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->biotrade==0.1.3) (3.0.1)
Collecting greenlet!=0.4.17
  Downloading greenlet-2.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (613 kB) 613.7/613.7 KB 3.8 MB/s eta 0:00:00
Collecting typing-extensions>=4.2.0
  Downloading typing_extensions-4.6.3-py3-none-any.whl (31 kB)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas->biotrade==0.1.3) (1.16.0)
Building wheels for collected packages: biotrade
  Building wheel for biotrade (setup.py) ... done
  Created wheel for biotrade: filename=biotrade-0.1.3-py3-none-any.whl size=173260 sha256=64fd083e855722adda72f411c7d8a453948c332c9e263db35877fc260652b12d
  Stored in directory: /tmp/pip-ephem-wheel-cache-1kvnp1nb/wheels/58/a4/e2/bfeca1c223759f992ed5d0f59387e5970a10c9da94d69858b1
Successfully built biotrade
Installing collected packages: typing-extensions, greenlet, sqlalchemy, sqlalchemy_utils, biotrade
Successfully installed biotrade-0.1.3 greenlet-2.0.2 sqlalchemy-2.0.16 sqlalchemy_utils-0.41.1 typing-extensions-4.6.3
```
The data pumps return the error `AttributeError: 'Engine' object has no attribute 'exec_driver_sql'` when run on a machine with SQLAlchemy>=2.0.
There were some non-backward-compatible changes after v. 1.4; was the migration to 2.0 dealt with beforehand? Otherwise it may be a problem on my side, but I am asking to be sure.
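For context, the difference can be sketched as follows (an illustrative example against an in-memory SQLite database, not biotrade's actual code): in SQLAlchemy 2.0, execution methods live on an explicitly acquired `Connection`, typically inside a context manager, rather than on the `Engine` itself.

```python
from sqlalchemy import create_engine

# In-memory SQLite engine, purely for illustration.
engine = create_engine("sqlite://")

# SQLAlchemy 1.x tolerated "engine-level" execution; in 2.0 those Engine
# methods are gone, which is what produces the AttributeError above.
# The 2.0-compatible pattern acquires a Connection in a context manager:
with engine.connect() as conn:
    result = conn.exec_driver_sql("SELECT 1 + 1")
    value = result.scalar()

print(value)  # 2
```

The context manager also guarantees the connection is released, which matters for long-running data pumps.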
Real-world examples are included and instructive. However, I wonder about the rationale for including an example based on R's `reticulate` package. Essentially everything that runs under Python can be executed through `reticulate`, so an entire example may be superfluous. Perhaps just mention it?
An issue tracker is available on GitLab, but a template could be provided for additional user-friendliness.
I would suggest spending a few words on the exact choices made in creating the mapping tables between different sources and how the difference in granularity between them was handled. In particular, it could be worth discussing design choices that may be arguable or of which the user ought to be aware when using the package.
For instance, let's say that the user converts FAOSTAT data to Comtrade format and then the reverse. Would the final FAOSTAT-format table match the starting one perfectly? If not, which categories cause the final differences? If so, does this also apply to the inverse operation (Comtrade-FAOSTAT-Comtrade)?
The paper's proof does not contain a final bibliography. However, both in-text references and a bibliography file are present. Perhaps it's a minor issue in the rendering/knitting of the markdown file.
@FATelarico
But perhaps it could be useful to highlight this imbalance between the listed individuals' contributions.
The paper's proof does not contain a final bibliography. However, both in-text references and a bibliography file are present. Perhaps it's a minor issue in the rendering/knitting of the markdown file.
The second link, "View article proof on GitHub", does have a bibliography at the end. I don't know how to fix this, since the automated links generated by the editorial bot are inconsistent on this point.
For instance, let's say that the user converts FAOSTAT data to Comtrade format and then the reverse. Would the final FAOSTAT-format table match the starting one perfectly? If not, which categories cause the final differences? If so, does this also apply to the inverse operation (Comtrade-FAOSTAT-Comtrade)?
The reverse would not be possible because there is a many-to-one relationship between the UN Comtrade product codes and the codes used in FAOSTAT, i.e. Comtrade quantities need to be aggregated before they are compared with FAOSTAT values. To perform the inverse operation you would need some kind of disaggregation index in order to split the FAOSTAT quantities between the different Comtrade codes.
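As a toy illustration of that many-to-one relationship (the codes, mapping table, and quantities below are hypothetical, not the actual biotrade mapping), the Comtrade-to-FAOSTAT direction is an aggregation, which is why it cannot be inverted without extra information:

```python
import pandas as pd

# Hypothetical mapping: three Comtrade codes roll up to one FAOSTAT code.
mapping = pd.DataFrame({
    "comtrade_code": ["440711", "440712", "440719"],
    "faostat_code": ["sawnwood", "sawnwood", "sawnwood"],
})
comtrade = pd.DataFrame({
    "comtrade_code": ["440711", "440712", "440719"],
    "quantity": [10.0, 5.0, 2.0],
})

# The conversion sums the Comtrade quantities per FAOSTAT code.
faostat = (
    comtrade.merge(mapping, on="comtrade_code")
    .groupby("faostat_code")["quantity"]
    .sum()
)
print(faostat["sawnwood"])  # 17.0

# Splitting 17.0 back into 10/5/2 is unrecoverable without a
# disaggregation index over the Comtrade codes.
```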
Community guidelines
Suggestions
An issue tracker is available on GitLab, but a template could be provided for additional user-friendliness.
Thanks. biotrade/-/commit/f32ebf0d adds 2 issue templates for bugs and features. They can be selected from the drop-down menu "template" when creating a new issue.
Real-world examples are included and instructive. However, I wonder about the rationale for including an example based on R's `reticulate` package. Essentially everything that runs under Python can be executed through `reticulate`, so an entire example may be superfluous. Perhaps just mention it?
Thank you for your comment. I thought it would be nice to show that reticulate actually works in our practical case, but if you think it should be removed, I can think about moving it to a section of the documentation https://bioeconomy.gitlab.io/forobs/biotrade/biotrade.html .
@FATelarico
"I didn't open a pull request on GitLab because I'd prefer not to use a phone number to sign up"
Sorry about that. I didn't find how to change gitlab's security settings for pull requests. Please tell me if you know a way to enable users with lower security levels to submit pull requests.
"psycopg2-binary could be a better requirement than psycopg2"
Warning. The psycopg2 wheel package comes packaged, among the others, with its own libssl binary. This may create conflicts with other extension modules binding with libssl as well, for instance with the Python ssl module: in some cases, under concurrency, the interaction between the two libraries may result in a segfault. In case of doubts you are advised to use a package built from source.
Testing `pip install biotrade-0.2.2-py3-none-any.whl` in a virtual environment on my Debian laptop shows this output:
"Successfully installed biotrade-0.2.2 certifi-2023.5.7 charset-normalizer-3.1.0 comtradeapicall-1.0.20 contourpy-1.1.0 cycler-0.11.0 fonttools-4.40.0 greenlet-2.0.2 idna-3.4 importlib-resources-5.12.0 kiwisolver-1.4.4 matplotlib-3.7.1 numpy-1.25.0 packaging-23.1 pandas-2.0.2 pillow-9.5.0 psycopg2-binary-2.9.6 pymannkendall-1.4.3 pyparsing-3.1.0 python-dateutil-2.8.2 pytz-2023.3 requests-2.31.0 scipy-1.11.0 six-1.16.0 sqlalchemy-2.0.17 sqlalchemy-utils-0.41.1 typing-extensions-4.6.3 tzdata-2023.3 urllib3-2.0.3 zipp-3.15.0"
I have uploaded biotrade 0.2.2 to pypi.
The data pumps return the error `AttributeError: 'Engine' object has no attribute 'exec_driver_sql'` when run on a machine with SQLAlchemy>=2.0.
Sorry, this is going to take a bit longer to fix. I have noted it in an issue: https://gitlab.com/bioeconomy/forobs/biotrade/-/issues/133 . In the meantime, can you please test with SQLAlchemy 1.4?
@FATelarico The update to SQLAlchemy version 2 was fixed for `faostat.db` and `comtrade.db`; see issue https://gitlab.com/bioeconomy/forobs/biotrade/-/issues/133 . The fixes make the database connections explicit inside a context manager, since SQLAlchemy version 2 does not allow passing the engine directly.
Thank you for reporting. Can you please test whether the package works for you now?
Hi, sorry the package didn't work properly when some of the reviewers tested it above. We are five users who have all been working with the `biotrade` package for over a year inside our group (on laptops or a cluster), but it wasn't tested with a fresh install. I have now fixed issues related to table creation and table drop requiring the engine to be bound. The latest version on pypi https://pypi.org/project/biotrade/ should now be compatible with SQLAlchemy version 2. Please test and report if you still experience issues while loading FAOSTAT data.
it wasn't tested with a fresh install
It may be useful to ensure that your CI is properly set up to catch issues like this one.
Thank you for the suggestion. For the moment, the CI jobs only test Python functions with pytest in biotrade/tests.
I noted your suggestion in biotrade issue 137. I hope such database tests are not a blocker for the review of the JOSS paper.
This is what our criteria say:
Authors are strongly encouraged to include an automated test suite covering the core functionality of their software.
- Good: An automated test suite hooked up to continuous integration (GitHub Actions, Circle CI, or similar)
- OK: Documented manual steps that can be followed to objectively check the expected functionality of the software (e.g., a sample input file to assert behavior)
- Bad (not acceptable): No way for you, the reviewer, to objectively assess whether the software works
I'll let the reviewers assess the state of the package.
@FATelarico
But perhaps it could be useful to highlight this imbalance between the listed individuals' contributions.
- Is there a policy on how to describe author contributions?
- Should we describe which roles each author had? The "Contributor Roles Taxonomy" describes, for example, roles such as Writing – original draft, Writing – review & editing, Supervision, Visualization, Conceptualization, and Methodology.
There isn't a strict policy. It's mostly up to the people involved in making the software to assess what contribution suffices to be considered a co-author. However, you should also mind the difference between co-authors of the paper (whose roles you mentioned) and co-authors of the software. Not everyone involved in writing the code must necessarily partake in writing the paper.
The paper's proof does not contain a final bibliography. However, both in-text references and a bibliography file are present. Perhaps it's a minor issue in the rendering/knitting of the markdown file.
The second link view article proof on github does have a bibliography at the end. I don't know how to fix this since the automated links generated by the editorial bot are inconsistent on this point.
I saw it just now. It's fine. Just a technical issue I guess! :ok_hand:
@FATelarico The update to SQLAlchemy version 2 was fixed for `faostat.db` and `comtrade.db`; see issue https://gitlab.com/bioeconomy/forobs/biotrade/-/issues/133 . The fixes make the database connections explicit inside a context manager, since SQLAlchemy version 2 does not allow passing the engine directly. Thank you for reporting. Can you please test whether the package works for you now?
I apologise, but downgrading to v 1.4 was not an option as I have other packages that depend on newer versions. However, I tested it now and everything seems fine to me.
@FATelarico
"I didn't open a pull request on GitLab because I'd prefer not to use a phone number to sign up"
Sorry about that. I didn't find how to change gitlab's security settings for pull requests. Please tell me if you know a way to enable users with lower security levels to submit pull requests.
"psycopg2-binary could be a better requirement than psycopg2"
- Thanks for the suggestion, I implemented it in biotrade/commit/48df9ad. I still have a doubt about a warning on the psycopg2 install page:
Warning. The psycopg2 wheel package comes packaged, among the others, with its own libssl binary. This may create conflicts with other extension modules binding with libssl as well, for instance with the Python ssl module: in some cases, under concurrency, the interaction between the two libraries may result in a segfault. In case of doubts you are advised to use a package built from source.
- We also use the requests package which might depend on libssl.
Testing `pip install biotrade-0.2.2-py3-none-any.whl` in a virtual environment on my Debian laptop shows this output: "Successfully installed biotrade-0.2.2 certifi-2023.5.7 charset-normalizer-3.1.0 comtradeapicall-1.0.20 contourpy-1.1.0 cycler-0.11.0 fonttools-4.40.0 greenlet-2.0.2 idna-3.4 importlib-resources-5.12.0 kiwisolver-1.4.4 matplotlib-3.7.1 numpy-1.25.0 packaging-23.1 pandas-2.0.2 pillow-9.5.0 psycopg2-binary-2.9.6 pymannkendall-1.4.3 pyparsing-3.1.0 python-dateutil-2.8.2 pytz-2023.3 requests-2.31.0 scipy-1.11.0 six-1.16.0 sqlalchemy-2.0.17 sqlalchemy-utils-0.41.1 typing-extensions-4.6.3 tzdata-2023.3 urllib3-2.0.3 zipp-3.15.0"
I have uploaded biotrade 0.2.2 to pypi.
I did not experience any issue with libssl while installing and re-testing. I guess there are cases in which this may happen, but your package seems not to be one of them!
Arguably, you should test future releases of `psycopg2` to check whether the original issue stops appearing when building the package from source. In the meantime, I think it's good enough as it is.
Hi, sorry the package didn't work properly when some of the reviewers tested it above. We are five users who have all been working with the `biotrade` package for over a year inside our group (on laptops or a cluster), but it wasn't tested with a fresh install. I have now fixed issues related to table creation and table drop requiring the engine to be bound. The latest version on pypi https://pypi.org/project/biotrade/ should now be compatible with SQLAlchemy version 2. Please test and report if you still experience issues while loading FAOSTAT data.
Works like a charm, thank you for looking into this!
I apologise that it took me so long to answer, but I am not getting notifications about this issue; I had to check on it manually. I should have fixed that now.
Thank you @FATelarico. In the comment above, you checked all review boxes except "Example usage".
What are your suggestions on how to improve them?
Thank you @FATelarico. In the comment above, you checked all review boxes except "Example usage".
Oh, sorry, I missed that. I saw that you answered regarding my previous concerns. I just checked that box, too!
Thank you @FATelarico and @paulrougieux for a fruitful exchange!
@potterzot do you have any guess when you'll be able to do your review? Thanks!
Interesting package that clearly took a lot of work. I apologize for my delay in the review process. I hope the comments below are helpful! With the exception of the functionality comment, they are all minor.
- I tried adding the `BIOTRADE_DATA` environmental variable to my system environmental variables manually, and also exporting it via my git bash terminal before then running Python, but in both cases `biotrade` still wants to create the data directory in `$HOME/repos/forobs/biotrade_data`. Some explicit Windows instructions, and testing that these settings work on Windows, would be helpful.
- The `BIOTRADE_DATA` location on my system was `$HOME/repos/forobs/biotrade_data`, not `$HOME` as suggested by this line in the documentation: "By default it is located in the user's home directory, but this can be changed with the environment variable BIOTRADE_DATA."
- It is not clear what the row-count messages pertain to when writing the `crop_trade` database. Is it writing 100,000 for each country? This is especially confusing given that there are also messages for `[1 rows x 13 columns]`.
```python
>>> from biotrade.faostat import faostat
>>> faostat.pump.update(["crop_production", "crop_trade"])
>>> faostat.pump.update(["forestry_production", "forestry_trade", "forest_land"])
>>> faostat.pump.update(["food_balance"])
>>> faostat.pump.update(["land_use", "land_cover"])
```
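On the environment variable point, the documented behaviour can be sketched like this (`data_dir` is a hypothetical helper written for illustration, not biotrade's actual code): the data directory should come from `BIOTRADE_DATA` when set, and only fall back to a default under the home directory otherwise.

```python
import os
from pathlib import Path

def data_dir() -> Path:
    """Resolve the data directory from BIOTRADE_DATA, falling back to a
    folder under the user's home directory (hypothetical default name)."""
    env = os.environ.get("BIOTRADE_DATA")
    if env:
        return Path(env).expanduser()
    return Path.home() / "biotrade_data"

# With the variable set, the configured location should win:
os.environ["BIOTRADE_DATA"] = "/tmp/biotrade_data"
print(data_dir())
```

If the package resolved the path this way on all platforms, the Windows behaviour reported above (the variable being ignored) would be easy to cover with a unit test.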
- The `df` parameter in several submodules (e.g. faostat) shows `NaN` values throughout. Is this correct?
- Line 63: I would first define HS before referencing it. It may be that anyone reading this article will already know what HS stands for (I assume it's a commodity code from the context, but it would be good to know the name).
- The example using `reticulate` is not necessary. From what I read in the paper, it does not look like the `biotrade` package requires any special instruction to use with `reticulate`, so the example is not really relevant to the `biotrade` package itself. But feel free to ignore this if you disagree.

Separate comment for test results: my test run using `pytest` currently fails on four occasions. Some of these look like dependency installation problems. If the tests need additional packages, it would be helpful to add those instructions in the documentation section on testing.
Here is the output of my run of `pytest`:
============================= test session starts =============================
platform win32 -- Python 3.11.2, pytest-7.4.0, pluggy-1.3.0
rootdir: C:\Users\potte\projects\oneoffs\fy2023-review-biotrade-joss\biotrade\biotrade-main
collected 0 items / 4 errors
=================================== ERRORS ====================================
______________ ERROR collecting biotrade/tests/test_aggregate.py ______________
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:341: in from_call
result: Optional[TResult] = func()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:372: in <lambda>
call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:531: in collect
self._inject_setup_module_fixture()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:545: in _inject_setup_module_fixture
self.obj, ("setUpModule", "setup_module")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:310: in obj
self._obj = obj = self._getobj()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:528: in _getobj
return self._importtestmodule()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:617: in _importtestmodule
mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\pathlib.py:565: in import_path
importlib.import_module(module_name)
C:\Program Files (x86)\Python\Python311\Lib\importlib\__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
???
<frozen importlib._bootstrap>:1178: in _find_and_load
???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
???
<frozen importlib._bootstrap>:690: in _load_unlocked
???
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\assertion\rewrite.py:178: in exec_module
exec(co, module.__dict__)
biotrade\tests\test_aggregate.py:18: in <module>
from biotrade.faostat.aggregate import agg_trade_eu_row, agg_by_country_groups
biotrade\__init__.py:97: in <module>
if input(msg + "Please confirm [y/n]:") == "y":
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\capture.py:202: in read
raise OSError(
E OSError: pytest: reading from stdin while output is captured! Consider using `-s`.
------------------------------- Captured stdout -------------------------------
Create C:\Users\potte\projects\oneoffs\fy2023-review-biotrade-joss\data?Please confirm [y/n]:
______________ ERROR collecting biotrade/tests/test_front_end.py ______________
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:341: in from_call
result: Optional[TResult] = func()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:372: in <lambda>
call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:531: in collect
self._inject_setup_module_fixture()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:545: in _inject_setup_module_fixture
self.obj, ("setUpModule", "setup_module")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:310: in obj
self._obj = obj = self._getobj()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:528: in _getobj
return self._importtestmodule()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:617: in _importtestmodule
mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\pathlib.py:565: in import_path
importlib.import_module(module_name)
C:\Program Files (x86)\Python\Python311\Lib\importlib\__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
???
<frozen importlib._bootstrap>:1178: in _find_and_load
???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
???
<frozen importlib._bootstrap>:690: in _load_unlocked
???
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\assertion\rewrite.py:178: in exec_module
exec(co, module.__dict__)
biotrade\tests\test_front_end.py:17: in <module>
from scripts.front_end.functions import aggregated_data, reporter_iso_codes
scripts\front_end\functions.py:15: in <module>
from biotrade.faostat import faostat
biotrade\__init__.py:97: in <module>
if input(msg + "Please confirm [y/n]:") == "y":
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\capture.py:202: in read
raise OSError(
E OSError: pytest: reading from stdin while output is captured! Consider using `-s`.
------------------------------- Captured stdout -------------------------------
Create C:\Users\potte\projects\oneoffs\fy2023-review-biotrade-joss\data?Please confirm [y/n]:
_______________ ERROR collecting biotrade/tests/test_mirror.py ________________
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:341: in from_call
result: Optional[TResult] = func()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:372: in <lambda>
call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:531: in collect
self._inject_setup_module_fixture()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:545: in _inject_setup_module_fixture
self.obj, ("setUpModule", "setup_module")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:310: in obj
self._obj = obj = self._getobj()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:528: in _getobj
return self._importtestmodule()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:617: in _importtestmodule
mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\pathlib.py:565: in import_path
importlib.import_module(module_name)
C:\Program Files (x86)\Python\Python311\Lib\importlib\__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
???
<frozen importlib._bootstrap>:1178: in _find_and_load
???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
???
<frozen importlib._bootstrap>:690: in _load_unlocked
???
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\assertion\rewrite.py:178: in exec_module
exec(co, module.__dict__)
biotrade\tests\test_mirror.py:13: in <module>
from biotrade.faostat.mirror import put_mirror_beside
biotrade\__init__.py:97: in <module>
if input(msg + "Please confirm [y/n]:") == "y":
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\capture.py:202: in read
raise OSError(
E OSError: pytest: reading from stdin while output is captured! Consider using `-s`.
------------------------------- Captured stdout -------------------------------
Create C:\Users\potte\projects\oneoffs\fy2023-review-biotrade-joss\data?Please confirm [y/n]:
_____________ ERROR collecting biotrade/tests/test_reallocate.py ______________
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:341: in from_call
result: Optional[TResult] = func()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\runner.py:372: in <lambda>
call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:531: in collect
self._inject_setup_module_fixture()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:545: in _inject_setup_module_fixture
self.obj, ("setUpModule", "setup_module")
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:310: in obj
self._obj = obj = self._getobj()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:528: in _getobj
return self._importtestmodule()
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\python.py:617: in _importtestmodule
mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\pathlib.py:565: in import_path
importlib.import_module(module_name)
C:\Program Files (x86)\Python\Python311\Lib\importlib\__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
???
<frozen importlib._bootstrap>:1178: in _find_and_load
???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
???
<frozen importlib._bootstrap>:690: in _load_unlocked
???
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\assertion\rewrite.py:178: in exec_module
exec(co, module.__dict__)
biotrade\tests\test_reallocate.py:12: in <module>
from biotrade.common.reallocate import allocate_by_partners
biotrade\__init__.py:97: in <module>
if input(msg + "Please confirm [y/n]:") == "y":
C:\Users\potte\AppData\Roaming\Python\Python311\site-packages\_pytest\capture.py:202: in read
raise OSError(
E OSError: pytest: reading from stdin while output is captured! Consider using `-s`.
------------------------------- Captured stdout -------------------------------
Create C:\Users\potte\projects\oneoffs\fy2023-review-biotrade-joss\data?Please confirm [y/n]:
=========================== short test summary info ===========================
ERROR biotrade/tests/test_aggregate.py - OSError: pytest: reading from stdin ...
ERROR biotrade/tests/test_front_end.py - OSError: pytest: reading from stdin ...
ERROR biotrade/tests/test_mirror.py - OSError: pytest: reading from stdin whi...
ERROR biotrade/tests/test_reallocate.py - OSError: pytest: reading from stdin...
!!!!!!!!!!!!!!!!!!! Interrupted: 4 errors during collection !!!!!!!!!!!!!!!!!!!
============================== 4 errors in 1.43s ==============================
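The root cause visible in the tracebacks above is an `input()` call executed at import time in `biotrade/__init__.py` (line 97), which pytest's output capture turns into an `OSError`. As an illustration only (the function and environment-variable names here are hypothetical, not the actual biotrade code), a more test-friendly pattern skips the prompt in non-interactive sessions:

```python
import os

def confirm_create(path: str, interactive: bool) -> bool:
    """Ask before creating a data directory; in non-interactive
    sessions (e.g. under pytest capture) fall back to a hypothetical
    opt-in environment variable instead of blocking on input()."""
    if not interactive:
        # BIOTRADE_AUTO_CREATE is an illustrative name, not a real
        # biotrade setting.
        return os.environ.get("BIOTRADE_AUTO_CREATE", "n") == "y"
    return input(f"Create {path}? Please confirm [y/n]:") == "y"

# Demonstrate only the non-interactive branch, so nothing blocks.
os.environ["BIOTRADE_AUTO_CREATE"] = "y"
print(confirm_create("/tmp/biotrade_data", interactive=False))
```

In a real test run, `interactive` could be derived from `sys.stdin.isatty()`, which is false under pytest's default capture.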
In a comment below I reference the long download times. I don't know if this is possible, but it would be really nice to be able to subset data downloads by country and/or year.
It is not possible with the bulk download of FAOSTAT data. These datasets are small anyway: generally below a few megabytes, rarely more than a few hundred megabytes. Nothing compared to any video data.
Am I correct in thinking that at the moment a new download for a new year of data would redownload all previous years as well?
Yes, you understand it correctly. Unfortunately this is a necessary feature for international trade data, because data gets updated retroactively many years back. As you can see for example on the Comtrade API update https://comtrade.un.org/data/da part of the 2017 trade flows were updated in 2017 and in all subsequent years until 2023.
This is tough both on users who want to just get the most recent data and on API providers who shoulder the burden of multiple repeated downloads of the same data.
There is no way around this, unless it were possible to know in advance for which countries the data was updated. We could implement more complex interactions with the API, but with a few hundred MB or a few GB it is simply easier to download the whole yearly dataset twice a year.
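To see why retroactive revisions force a full re-download, one could in principle compare two snapshots of the same table and flag the rows that changed; a minimal sketch with pandas (the column names are illustrative, not the biotrade schema):

```python
import pandas as pd

# Two hypothetical snapshots of the same bilateral trade table,
# downloaded a year apart. Columns are illustrative only.
old = pd.DataFrame({
    "reporter": ["FR", "FR", "DE"],
    "partner": ["DE", "IT", "IT"],
    "year": [2017, 2017, 2017],
    "value": [100.0, 50.0, 80.0],
})
new = pd.DataFrame({
    "reporter": ["FR", "FR", "DE"],
    "partner": ["DE", "IT", "IT"],
    "year": [2017, 2017, 2017],
    "value": [100.0, 55.0, 80.0],  # FR-IT flow revised retroactively
})

# Merging on the key columns flags rows whose values changed
# between the two snapshots.
keys = ["reporter", "partner", "year"]
merged = old.merge(new, on=keys, suffixes=("_old", "_new"))
revised = merged[merged["value_old"] != merged["value_new"]]
print(revised[keys + ["value_old", "value_new"]])
```

The catch, as noted above, is that this comparison only works after both snapshots have been downloaded in full, which is exactly the cost being discussed.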
As an unwilling Windows user, I would appreciate documentation that specifies how to set parameters from Windows. I tried adding the BIOTRADE_DATA environment variable to my system environment variables manually, and also exporting it via my Git Bash terminal before running Python, but in both cases biotrade still wants to create the data directory in $HOME/repos/forobs/biotrade_data. Some explicit Windows instructions, and testing that these settings work on Windows, would be helpful.
Thanks for reporting. To check whether an environment variable is set from Python, you can call:

```python
import os
os.environ["BIOTRADE_DATA"]
```
I added this explanation in https://gitlab.com/bioeconomy/forobs/biotrade/-/commit/c8c062fa6bfbd475445e2a4f1c47f85ce9fa179b Explaining how to set up environment variables in Windows is beyond the scope of this documentation.
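For completeness, a hedged sketch of setting the variable from Python itself before any biotrade import; this only affects the current process, and the target directory below is a placeholder:

```python
import os
from pathlib import Path

# Hypothetical target directory; adjust to taste.
data_dir = Path.home() / "biotrade_data"
os.environ["BIOTRADE_DATA"] = str(data_dir)

# Any subsequent import that reads the variable will now see it.
print(os.environ["BIOTRADE_DATA"])
```

This sidesteps OS-specific environment-variable dialogs entirely, at the cost of having to set it in every script.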
@paulrougieux Thanks for these responses. We do something similar with agricultural time series data, where some previous years may be updated. It's unfortunate that the data cannot be subset by country, but as you say, it's not a huge amount of data.
If the data calls cannot be subset, I would recommend adding a note to the documentation, before the examples, to give the user an expectation of download time. For me, with a 400 Mb/s download speed, that was 2+ hours. Something like: "Note: a data update can take on the order of a few hours depending on your connection speed." Of course for regular users this is not important, but for new users it helps to know how long the data update examples may take to run.
Explaining how to set up environment variables in Windows is beyond the scope of this documentation.
Thank you for adding that documentation. It's true that environment variables on Windows are a pain, and I agree this is out of scope for biotrade. I was suggesting something more like adding: "For Windows users, we recommend installing git bash or another *nix-based terminal, which will allow the below examples to run. Alternatively you may directly set the environmental variables, which we do not cover here."
And I think there is still a mismatch between the documentation and the default BIOTRADE_DATA location. The documentation states that data will be stored in $HOME if BIOTRADE_DATA is not set, when actually it is stored in $HOME/repos/forobs/biotrade_data.
The README suggests running the code below; the first data download is still running on my system after two hours. I suggest editing the examples to fetch much smaller blocks of data if possible.
In this commit, I edited the README to explain that downloading trade data can take a long time on slow connections: https://gitlab.com/bioeconomy/forobs/biotrade/-/commit/15776ef60849823fcf56aae6ca60cc17bd0b7783 I also added an example that loads only production data, which should be much faster. Production has N countries per year and product, while bilateral trade has N×N per year and product (more data).
The default BIOTRADE_DATA location on my system was $HOME/repos/forobs/biotrade_data, not $HOME as suggested by this line in the documentation: "By default it is located in the user's home directory, but this can be changed with the environment variable BIOTRADE_DATA."
Commit https://gitlab.com/bioeconomy/forobs/biotrade/-/commit/69ea97d6f7a6d0842ad3f37299bdf2bfc0e3f4e5 updates the documentation to explain this. The sub-directory is an artefact of how we structure our system: repositories go in "repos", then there is a project called "forobs" which contains the "biotrade_data" directory. It could be simplified, but this path makes sense for us at this point, and users can change it with an environment variable. Users can also create a symbolic link if they prefer to access biotrade_data at another location.
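The documented fallback behaviour can be sketched as follows; this is an illustration of the behaviour described in the thread, not the actual biotrade source:

```python
import os
from pathlib import Path

def resolve_data_dir() -> Path:
    """Return the data directory: the BIOTRADE_DATA environment
    variable if set, otherwise the documented default under $HOME."""
    env = os.environ.get("BIOTRADE_DATA")
    if env:
        return Path(env)
    # Default mirrors the documented location under the home directory.
    return Path.home() / "repos" / "forobs" / "biotrade_data"

print(resolve_data_dir())
```

Making this fallback path explicit in the documentation would resolve the $HOME mismatch the reviewer describes.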
From the message, it looks like it wrote 100,000 rows to the crop_trade database. Is it writing 100,000 for each country? It's not clear what that message pertains to.
There is a default chunk size of 100,000 rows, because we had some issues loading data on low-memory laptops. This is probably far too low, and the default could be changed to a higher value.
Line 63: I would first define HS before referencing it.
Done in https://gitlab.com/bioeconomy/forobs/biotrade/-/commit/020e1ced7c110ed5babdd408514e7e9a131909ca
I agree with @FATelarico that perhaps the reference to reticulate is not necessary.
We do have R users who use this python package for data analysis. There are a lot of R users in the community who might not have tried reticulate yet. This example is meant to encourage the use of this python package from R. If you don't mind, I would like to keep the example there.
I was suggesting something more like adding "For Windows users, we recommend installing git bash or another *nix-based terminal, which will allow the below examples to run. Alternatively you may directly set the environmental variables, which we do not cover here."
A bash command is not necessary to add environment variables on Windows. They can be added in Control Panel » System » Advanced » Environment Variables.
That's how I set up the machines of my colleagues; this code also runs on a Linux cluster where we set the environment variables in a bash profile. In commit https://gitlab.com/bioeconomy/forobs/biotrade/-/commit/e7d0b6b2a19b09019cd2ca0272f8f6be9b5ac190, I added a link for further reference on how to set environment variables on any OS: https://superuser.com/questions/284342/what-are-path-and-other-environment-variables-and-how-can-i-set-or-use-them
My test using pytest currently fails on four occasions.
@potterzot can you please provide more details on your platform? Selene told me that the tests pass on her Windows laptop; she uses Anaconda to install all Python modules.
The issue seems to be related to this: "pytest: reading from stdin while output is captured! Consider using -s." Can you please try running the tests with:

```shell
pytest -s
```
Hello @potterzot, the issue you are facing with pytest is probably because the biotrade package cannot find the location of the stored data and asks for user input to specify where to create the folder. Once you have set the environment variable BIOTRADE_DATA to the path of the downloaded FAOSTAT data, the tests should work. Please let us know if you manage to run them, thanks.
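Combining the two suggestions above, the variable can be exported in the shell before running the test suite (the path below is a placeholder for wherever the data actually lives):

```shell
# Placeholder path: point biotrade at an existing data directory
# before running the tests, so the import-time prompt never fires.
export BIOTRADE_DATA="$HOME/biotrade_data"
echo "BIOTRADE_DATA=$BIOTRADE_DATA"
```

With the variable exported, `pytest -s` can then be run in the same shell; `-s` additionally disables output capture so any remaining prompt can reach stdin.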
Submitting author: @paulrougieux (Paul Rougieux) Repository: https://gitlab.com/bioeconomy/forobs/biotrade/ Branch with paper.md (empty if default branch): Version: v0.3.0 Editor: @martinfleis Reviewers: @FATelarico, @potterzot Archive: 10.5281/zenodo.8406069
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@FATelarico & @potterzot, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @martinfleis know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
Checklist for @FATelarico
Checklist for @potterzot