lofar-astron / RMextract

extract TEC, vTEC, Earthmagnetic field and Rotation Measures from GPS and WMM data for radio interferometry observations
GNU General Public License v3.0
31 stars 22 forks source link

Add local file option and formatter #48

Closed AlecThomson closed 4 months ago

AlecThomson commented 5 months ago

Allows local file support (i.e. if you wget a bunch of IONEX data for offline use).

Adds a formatter for the ever-changing IONEX formats.

Also some tweaks to typing for dev use

Closes #46

AlecThomson commented 4 months ago

@maaijke - I've reworked a few things, which should address your comments. I've also added some more standard use of pathlib and urllib. Although, I haven't been able to do this throughout the entire codebase. The former makes life much easier for dealing with paths (over using strings and os.path).

I've added a new formatters submodule, which should lay things out more explicitly for a user. I've also dropped the redundant yy from the format.

On the change of filename extension, I think it should be easy enough to do in the new submodule. For example (that I'm making up), if the extensions change after 2025 we could add something like:

def ftp_aiub_unibe_ch(server: str, year: int, dayofyear: int, prefix: str) -> str:
    """
    Format the URL for the ftp.aiub.unibe.ch server
    """
    ext = ".Z" if year <= 2024 else ".gz"
    return f"{server}/CODE/{year:4d}/{prefix.upper()}{dayofyear:03d}0.{year%100:02d}I{ext}"
AlecThomson commented 4 months ago

Oh, I also did a quick bit of autoformatting to make my IDE happy. Sorry if this touched a few files. This mostly fixed up the order of imports (compliant with isort), so shouldn't change any actual code behaviour.

gmloose commented 4 months ago

One thing that bothers me a bit, but is unrelated to your changes is the dependency on PySocks. That package seems to be unmaintained. The latest release was made in 2019, and I read somewhere that people already run into issues with Python 3.10. I think it should be replaced with something more modern (don't know yet which package that would be), or we should fall back to the socket package from the Python standard library.

AlecThomson commented 4 months ago

You made a lot of good improvements. This is very much appreciated, because the code needed some extra love and care. I made a few minor remarks.

I saw in the email conversation you also suggested to use a linting tool like ruff, and to use more rigorous code formatting (I would suggest to use black for that). I can only support that. Also, I think the package desperately needs tests. I consider that even more important than proper formatting.

Please feel free to become more closely involved in improving RMextract.

Thanks @gmloose! Happy to help out where I can!

A few top-level thoughts and some additional context. My changes here started from using RMextract in a) a highly parallel manner, and b) an environment where FTP was blocked on compute nodes. This is what lead me to adding the formatting option (allowing for easier offline file access) and catching some race conditions. I'll admit that perhaps this PR has grown a little larger than that narrower set of objectives. For the most part, I tried not to touch any of the actual logic in the existing code.

@maaijke mentioned that some more major reworks are potentially in order as well, which added some more motivation for me not to change much of the logical code - rather I tried to make the existing logic clearer for any devs / contributors (starting with myself!). I'm happy to provide some suggestions on how a rework might look - but I can't commit the time to the actual implementation. Plus, I think there'd need to be clear agreement from the core maintainers about how to go about that.

I'm happy to start adding tests for the functions I've 'touched' along the way. I've found this to be the easiest way to get unit tests in for a large project that has grown without them. Some less unitary end-to-end tests are useful as well. Perhaps after this PR merge we can start looking at creating those frameworks.

On formatters, I'm also a fan of Black. However, I've recently made the move to Ruff as it seems to be where the wider Python community is moving - and the completely replicates the feature set with greater speed.

One thing that bothers me a bit, but is unrelated to your changes is the dependency on PySocks. That package seems to be unmaintained. The latest release was made in 2019, and I read somewhere that people already run into issues with Python 3.10. I think it should be replaced with something more modern (don't know yet which package that would be), or we should fall back to the socket package from the Python standard library.

I added socks to the setup.py as it was already a silent dependency in some of the functions. However, I think it went unnoticed since it was buried in proxy server logic.

gmloose commented 4 months ago

One thing that bothers me a bit, but is unrelated to your changes is the dependency on PySocks. That package seems to be unmaintained. The latest release was made in 2019, and I read somewhere that people already run into issues with Python 3.10. I think it should be replaced with something more modern (don't know yet which package that would be), or we should fall back to the socket package from the Python standard library.

Funny. The well-known and often-used requests package added support for socks in version 2.10. And guess what..., for this it depends on the third-party package PySocks. 🤔

gmloose commented 4 months ago

@maaijke, do you want to press the Merge button? Please do not forgot to select "Squash and merge".

I think I need one more change to make sure the Linc interface script is still working. Let me add that and then we can go for it. Can we add Alec as a contributor to the main page?