transientskp / tkp

A transients-discovery pipeline for astronomical image-based surveys
http://docs.transientskp.org/
BSD 2-Clause "Simplified" License
19 stars 14 forks source link

TraP converted from Python 2 to Python 3. #596

Closed HannoSpreeuw closed 1 year ago

HannoSpreeuw commented 1 year ago

First, the converted_to_python3 branch was created by branching off master. Next, the two stages from futurize were applied. The converted_to_python3_with_sourcefinder_from_pyse_repo was created by branching off and copying in the sourcefinder from the pyse repo - master branch - which had already been converted to Python 3 by @gijzelaerr and parallellized and speeded up by @HannoSpreeuw, replacing all contents of the sourcefinder directory. Finally all broken unit tests were fixed.

All 393 unit tests pass (SKIP =22), using Python 3.6.

Fixes #371

jdswinbank commented 1 year ago

Hi @HannoSpreeuw — great that you have taken this on; really happy to see TraP make the jump to Python 3.

Unfortunately, there's no way that I can do a proper review of >1000 lines of code changes across 110 files. That's a multi-day job, and, sadly, I'm now employed to work on other things!

For changes of this type — which are in part mechanical, and in a codebase which is at least reasonably well covered by unit tests but which doesn't have a substantial active developer team — I think it is probably the best you can do to ensure all the tests pass on a clean install, and to have somebody take a quick pass through to make sure there's nothing obviously weird going on.

So on that note, I took a quick look, and I think the main thing that jumps out at me is that this PR doesn't (I think) do what it says it does. :-)

Specifically: this is billed as a conversion from Python 2 to Python 3. However, much of the code seems to be devoted to maintaining compatibility with both Python 2 and Python 3. I'm not sure if that's intentional — maybe you or others in the TKP team regard Python 2 compatibility as important. However, if it's possible to avoid this and simply go all-in on Python 3, I strongly encourage you to do that: it will make the code much more readable and maintainable for the future.

For example, take the big list of imports at the top of test_associations.py. In pure Python 3 code, you don't need the from __future__ or builtins or anything like that — they are only there to make the code work in both Py2 and Py3. But they are complicated, make things harder to read (who wants to read old_div rather than //?), and probably (marginally) slower too.

I'll also note that this change introduces a dependency on Dask, but that isn't listed in setup.py, so this won't work if installed in a clean environment (with a tip o' the hat to @dentalfloss1 at https://github.com/transientskp/pyse/issues/36). I'm guessing (but didn't actually look) that as per https://github.com/transientskp/pyse/issues/35 it also won't run on Python 3.10 or later.

HannoSpreeuw commented 1 year ago

Thanks for these comments, John, they're useful!

And yes, I did not except a long review, but including you here as a reviewer was mainly meant to inform you. Nevertheless, your remark on the word conversion which is indeed not a conversion, but keeps Python 2 compatibility, is spot on. It seemed to me the safest option, the best of both worlds and I did not think of any drawbacks when I started using futurize. Never thought it would affect maintainability, readability and performance.

@AntoniaR would you like to keep Python 2 compatibility? If not I could investigate how code accommodating for Python 2 compatibility could be removed. Not sure if this can be done in an automated way. I would not like to repeat weeks of work. Any suggestions are welcome if we go for the Python 3 only option.

The Dask dependency should be relatively easy to fix. What would the new problem be from Python 3.10 compared to Python 3.6?

jdswinbank commented 1 year ago

What would the new problem be from Python 3.10 compared to Python 3.6?

My previous comment had a typo — sorry for the confusion. I meant to link this to https://github.com/transientskp/pyse/issues/35 (rather than 36, which is what I hit by mistake).

To be explicit: as of Python 3.10, it is no longer possible to import MutableMapping from collections; it has moved to collections.abc. See https://docs.python.org/3.9/library/collections.html for details.

HannoSpreeuw commented 1 year ago

Thanks, John.

Yes, I will tackle that, but not sure if we should treat the pyse and tkp repo's in the same way, i.e. both as applications, both as libraries or one as a library and the other as an application.

I am biased towards using Python virtual environments for applications because they are so clean, but perhaps moving away from a setup.py to a Pipfile will cause issues that I do not foresee.

So two separate discussions in my opinion, here and here.

AntoniaR commented 1 year ago

As I mentioned in another issue, tkp is used as both a library and an application so creating a pipfile is not appropriate.

I would also like to keep the library functionality for PySE so that we can still experiment with various parts - e.g. substituting in a different rms map but using the PySE tools to measure the flux.

HannoSpreeuw commented 1 year ago

Wrt your first comment about the "pollution" we see e.g. in test_associations.py:

from __future__ import print_function
from __future__ import division
from builtins import zip
from builtins import str
from builtins import range
from past.utils import old_div

and how to get rid of it.....any suggestions welcome!

Currently, I see only one solution and that is to redo the whole Python 2 to 3 conversion using a different tool, e.g. regular 2to3 instead of futurize. But that would imply another couple of weeks work!

HannoSpreeuw commented 1 year ago

Redoing would mean that I would have to withdraw this pull request.

jdswinbank commented 1 year ago

Hi @HannoSpreeuw — I agree that it would not be a smart use of time to redo all the work you have done here!

If (as I strongly hope) @AntoniaR and her team are happy with dropping Python 2 compatibility, I would suggest that you can still merge this PR effectively as-is (after having resolved any outstanding questions and issues from Antonia), but that you explicitly document that Python 2 is not supported. Folks working on the code in the future can then gradually remove the “pollution” as they get around to it without worries about breaking Python 2.

It might be worth also running it through a profiler just to check that it isn't causing any major slowdowns. I'd be surprised if there are any, but I expect that function calls (old_div etc) are a bit slower than operators (/ etc) and if any of them are happening in a tight loop that could be a problem.

HannoSpreeuw commented 1 year ago

Thanks John. At the same time, I am not satisfied with this work since I think your comments on the "pollution" are fair.

I was thinking.....it is possible to revert the first 2 of these 45 commits, i.e. 9272e60d094d26609feabd1445033634f88adafd and f7c7e6c0a80c73ee832e11ea75f34e21c3462eb9, which perform the two stages from futurize i.e. python futurize --stage1 -w ... and futurize --stage2 -w ... respectively and replace them with the changes from 2to3 -w .....?

Or would that become very messy?

Btw, sorry for taking you all this time.

AntoniaR commented 1 year ago

I agree with @jdswinbank - drop python 2 compatibility, but merge as it is and slowly remove the "pollution" code at later times.

@HannoSpreeuw sounds good to me, but I'm far from an expert in git!

jdswinbank commented 1 year ago

Hey Hanno — I think rebasing from futurize to 2to3 would get pretty messy to be honest. However, the magic of git is that it's easy to try it and then to backtrack if it doesn't work I think you should be able to start from the main branch, run 2to3, then use git rebase to try replaying all your other commits on top and see if it works.

I'd suggest spending 30 minutes on that, and if it looks like you are making progress then great, otherwise I wouldn't worry about it.

HannoSpreeuw commented 1 year ago

Thanks John, got some help from a colleague with advanced git expertise. I am going to give it a try.

HannoSpreeuw commented 1 year ago

I got to Successfully rebased and updated refs/heads/converted_to_python3_with_sourcefinder_from_pyse_repo.

with little effort. I will have time to rerun the unit tests the day after tomorrow.

HannoSpreeuw commented 1 year ago

Is #294 also covered by this PR?

AntoniaR commented 1 year ago

Yes, I believe so! So when this gets merged, we can also close #294

AntoniaR commented 1 year ago

I got to Successfully rebased and updated refs/heads/converted_to_python3_with_sourcefinder_from_pyse_repo.

with little effort. I will have time to rerun the unit tests the day after tomorrow.

I think I now have everything set up for TraP in python 3.9 and tried running trap-manage.py Now I just seem to have compatibility issues between 3.6 and 3.9, so I shall wait for @HannoSpreeuw's new rebased/updated version.

HannoSpreeuw commented 1 year ago

What would the new problem be from Python 3.10 compared to Python 3.6?

My previous comment had a typo — sorry for the confusion. I meant to link this to transientskp/pyse#35 (rather than 36, which is what I hit by mistake).

To be explicit: as of Python 3.10, it is no longer possible to import MutableMapping from collections; it has moved to collections.abc. See https://docs.python.org/3.9/library/collections.html for details.

Yes, the reason for me choosing Python 3.6 three months ago was an error coming from pip install -e ".[pixelstore]" when using Python 3.10:

          from .header import Header
        File "/tmp/pip-install-rnbs5omr/astropy_0870389f05884d8180c141a05f2c46ce/astropy/io/fits/header.py", line 1940, in <module>
          collections.MutableSequence.register(Header)
      AttributeError: module 'collections' has no attribute 'MutableSequence'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I.e. combining Python 3.10 with astropy did not work, ironically for the same reason you mentioned wrt https://github.com/transientskp/pyse/issues/35. That was three months ago, so it may work now.

Therefore I tried Python 3.9. That gave a different problem.

conda install -c conda-forge python-casacore
conda install scipy astropy python-dateutil psycopg2
pip install -e ".[pixelstore]"

yielded

ERROR: Failed cleaning build dir for scipy

That is why I reverted to Python 3.6 which allowed me to run pip install -e ".[pixelstore]" without errors.

HannoSpreeuw commented 1 year ago

I tried running the unit tests python runtests.py -v from a Python 3.10 environment after the rebasing operation, but that gave a nose error:

  File "/../../anaconda3/envs/py310/lib/python3.10/site-packages/nose/suite.py", line 106, in _set_tests
    if isinstance(tests, collections.Callable) and not is_suite:
AttributeError: module 'collections' has no attribute 'Callable'

pip install nose works, but a web search showed that nose seems to be incompatible with Python 3.10.

jdswinbank commented 1 year ago

Ugh, that's nasty.

The de facto standard for “advanced” testing in Python these days is Pytest. The good news is that Pytest has support for running Nose tests, although some adaptation may be required.

(To come over all philosophical for a minute — I use “scare quotes” around “advanced” above, because I'm fundamentally pretty sceptical of all these clever testing frameworks; their advantages over the standard library unittest always feel pretty marginal, and you run the risk of them being abandoned, like Nose has been. That said, many smarter people than me really like Pytest, and it's a sufficiently big and well-maintained project that I think the risks in adopting it are small.)

HannoSpreeuw commented 1 year ago

Thanks for that suggestion.

I can second the ease of use of pytest, replacing nose for Python 3.10.

The only thing needed was to rename a few files, i.e. the filenames test_reject.py and test_fits.py occurred twice in the test suite and pytest ran into import file mismatch.

HannoSpreeuw commented 1 year ago

@jdswinbank Perhaps you have a nice suggestion on how to translate test_management.TestManagement.test_parse_no_arguments from Python 2 to Python 3.

Running the Python 2 code (current master branch) with Python 3 gives

  File "../lib/python3.10/argparse.py", line 2507, in _check_value
    raise ArgumentError(action, msg % args)
argparse.ArgumentError: argument {initproject,initjob,run,initdb,deldataset}: invalid choice: 'test_management.TestManagement.test_parse_no_arguments' (choose from 'initproject', 'initjob', 'run', 'initdb', 'deldataset')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../tkp/tests/test_management.py", line 109, in test_parse_no_arguments
    parser.parse_args())
  File "../anaconda3/envs/py310/lib/python3.10/argparse.py", line 1825, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "../envs/py310/lib/python3.10/argparse.py", line 1861, in parse_known_args
    self.error(str(err))
  File "../envs/py310/lib/python3.10/argparse.py", line 2582, in error
    self.exit(2, _('%(prog)s: error: %(message)s\n') % args)
  File "../envs/py310/lib/python3.10/argparse.py", line 2569, in exit
    _sys.exit(status)
SystemExit: 2

I thought I had fixed it. It passes python -m unittest test_management.TestManagement.test_parse_no_arguments and also python runtests.py -v, but not python runtests.py nor pytest.

jdswinbank commented 1 year ago

@jdswinbank Perhaps you have a nice suggestion on how to translate test_management.TestManagement.test_parse_no_arguments from Python 2 to Python 3.

I don't have things running locally, so I've not been able to try or reproduce this in person. Further, I think we're missing the first part of the stack trace so it's hard to see exactly where things are coming from.

However, at a guess: parser.parse_args() takes a default value from sys.argv. That test seems to assume that sys.argv is always None. In practice, I guess it can vary with the way the test is being run, so this is a bad assumption to make. I would try rewriting the test as something like:

    def test_parse_no_arguments(self):
        # should raise error if no arguments
        parser = tkp.management.get_parser()
        with nostderr():  # don't clutter test results
            with self.assertRaises(SystemExit):
                parser.parse_args([])

Does that help?

HannoSpreeuw commented 1 year ago

Thanks. That is an extra [] wrt the current code in this converted_to_python3_with_sourcefinder_from_pyse_repo branch but it did not do the trick.

It does not pass python -m unittest test_management.TestManagement.test_parse_no_arguments nor pytest while without [] it did pass the former, though not the latter.

HannoSpreeuw commented 1 year ago

Btw, I have pushed the changes from the rebasing, you should see substantially less imports now.

I also took care of some deprecation warnings from pytest and pushed the changes.

I am still facing two SAWarnings:

../tkp/tkp/db/database.py:199: SAWarning: transaction already deassociated from connection
    self.transaction.rollback()

tests/test_steps/test_varmetric.py::TestApi::test_execute_store_varmetric
tests/test_steps/test_varmetric.py::TestApi::test_execute_store_varmetric_twice
 ../tkp/tkp/db/alchemy/varmetric.py:199: SAWarning: Coercing Subquery object into a select() for use in IN(); please pass a select() construct explicitly
    return delete(Varmetric).where(Varmetric.id.in_(del_varmetrics))

Also, the new postgresql 14 causes an error described here but there is no particular reason to make a fix part of this PR.

HannoSpreeuw commented 1 year ago

Further, I think we're missing the first part of the stack trace so it's hard to see exactly where things are coming from.

Sorry, here is the full stack trace:

======================================================================
ERROR: test_parse_no_arguments (test_management.TestManagement)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/../envs/py310/lib/python3.10/argparse.py", line 1858, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "/../envs/py310/lib/python3.10/argparse.py", line 2070, in _parse_known_args
    stop_index = consume_positionals(start_index)
  File "/../envs/py310/lib/python3.10/argparse.py", line 2026, in consume_positionals
    take_action(action, args)
  File "/../envs/py310/lib/python3.10/argparse.py", line 1919, in take_action
    argument_values = self._get_values(action, argument_strings)
  File "/../envs/py310/lib/python3.10/argparse.py", line 2460, in _get_values
    self._check_value(action, value[0])
  File "/../envs/py310/lib/python3.10/argparse.py", line 2507, in _check_value
    raise ArgumentError(action, msg % args)
argparse.ArgumentError: argument {initproject,initjob,run,initdb,deldataset}: invalid choice: 'test_management.TestManagement.test_parse_no_arguments' (choose from 'initproject', 'initjob', 'run', 'initdb', 'deldataset')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/../tkp/tests/test_management.py", line 109, in test_parse_no_arguments
    parser.parse_args())
  File "/../envs/py310/lib/python3.10/argparse.py", line 1825, in parse_args
    args, argv = self.parse_known_args(args, namespace) 
  File "/../envs/py310/lib/python3.10/argparse.py", line 1861, in parse_known_args
    self.error(str(err))
  File "/../envs/py310/lib/python3.10/argparse.py", line 2582, in error
    self.exit(2, _('%(prog)s: error: %(message)s\n') % args)
  File "/../envs/py310/lib/python3.10/argparse.py", line 2569, in exit
    _sys.exit(status)
SystemExit: 2    
HannoSpreeuw commented 1 year ago

Still struggling to understand what is causing this.

python -m unittest test_management.TestManagement.test_parse_no_arguments passes, as well as pytest test_management.py::TestManagement::test_parse_no_arguments as well as pytest . but not pytest

jdswinbank commented 1 year ago

Hi Hanno — When I have time I'll set up TraP on my system and try this out. Sadly, that's unlikely to be today...

jdswinbank commented 1 year ago

Hey @HannoSpreeuw — I think the commit I've just added will fix the problem you've been having in test_parse_no_arguments. Could you give it a try and let me know, please?

HannoSpreeuw commented 1 year ago

Excellent fix, pytest passes without errors:

360 passed, 22 skipped, 57 warnings

skipped comes mostly from DB management tests disabled, TKP_TESTDBMANAGEMENT not set. and from e.g. ../tests/data/SWIFT_554620-130504.fits) not available.

Wrt to the latter, FITS files like SWIFT_554620-130504.fits are stored in tests/data, but at a deeper level, in tests/data/sourcefinder or in tests/data/sourcefinder/GRB130828A I will try to fix that.

I will try to find a way to display all warnings, the warnings summary mentions three warnings of this sort:

tkp/tkp/db/quality.py:10: SAWarning: relationship 'Runningcatalog.extractedsources' will copy column runningcatalog.id to column assocxtrsource.runcat, which conflicts with relationship(s): 'Assocxtrsource.runcat' (copies runningcatalog.id to assocxtrsource.runcat). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards.   To silence this warning, add the parameter 'overlaps="runcat"' to the 'Runningcatalog.extractedsources' relationship. (Background on this error at: https://sqlalche.me/e/14/qzyx)
    'rms': Rejectreason(id=0, description='RMS invalid'),

We encountered this before, but it would be nice to have this fixed.

HannoSpreeuw commented 1 year ago

After the commit above, we are left with only 8 skipped unit tests.

AntoniaR commented 1 year ago

The tests passed and I have also tried out a couple of datasets. This version appears to be working well. I will now merge this pull request.