lemma-osu / gee-knn-python


Create client-side ordinations to avoid server-memory issues #4

Closed grovduck closed 11 months ago

grovduck commented 11 months ago

This PR addresses #3 by leveraging sknnr to provide estimators for this package. As noted in #3, running the estimators client-side will "interrupt" the GEE server-side flow and thus run training only once for all targets. It also removes the duplication of implementing these ordination estimators in more than one repository.

This is still very much a work in progress and I will be addressing the following issues:

As this PR turned out to be very ambitious, there are a few tasks that I decided not to tackle with this PR and turn into separate issues:

grovduck commented 11 months ago

@aazuspan, a stupid question. I thought using:

from __future__ import annotations

would allow the use of list and dict (lowercase) as type hints (i.e., from PEP 585). I'm not sure I remember how you got this to work in sknnr, but my tests are failing here and I think this is the reason. I'm using this with pydantic.BaseModel derived classes like this:

from __future__ import annotations
from typing import Any
from pydantic import BaseModel

class Geometry(BaseModel):
    """Client-side proxy for ee.Geometry object"""

    type: str
    coordinates: list[Any]

Any help would be very much appreciated!

aazuspan commented 11 months ago

Interesting! If I'm remembering right, the __future__ import works by effectively stringifying the type annotations to avoid the interpreter trying to evaluate incompatible ones. Looking at the failed test, I think pydantic is forcing those to be evaluated anyways, which is breaking the workaround.
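A minimal sketch of the behaviour described above, using a plain class instead of pydantic.BaseModel so that it runs with the standard library alone. The annotations are written as explicit strings, which is roughly what the `__future__` import does automatically. `resolve_hints` is a hypothetical helper, and `typing.get_type_hints` only stands in for whatever evaluation pydantic performs internally (an assumption, not a description of pydantic's code).

```python
from typing import Any, get_type_hints


class Geometry:
    """Plain stand-in for the pydantic model (no validation here)."""

    # Written as strings by hand; the __future__ import stores
    # annotations in this form without evaluating them.
    type: "str"
    coordinates: "list[Any]"


# Nothing has been evaluated yet, so this works on any Python version.
stored = Geometry.__annotations__
print(stored)


def resolve_hints(cls):
    """Force evaluation of the stored strings (hypothetical helper)."""
    try:
        return get_type_hints(cls, globalns=globals())
    except TypeError as exc:
        # On Python 3.8, evaluating "list[Any]" fails because the
        # builtin list is not subscriptable there.
        return exc


resolved = resolve_hints(Geometry)
print(resolved)
```

On Python 3.9 and later the second print shows real types; on 3.8 it shows the TypeError, which appears to be the kind of failure the tests ran into once the annotations were evaluated.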

A little short on details, but pydantic/pydantic#2112 seems to be describing the same issue. On the opposite end of the spectrum, pydantic/pydantic#2678 has a lot of detail, although I haven't read deeply enough into that discussion to figure out what the conclusion was or whether it's relevant to backwards compatibility...

grovduck commented 11 months ago

@aazuspan, thanks for your help finding those issues. I'm not sure I understand much better than I did before, other than that it seems I'll have trouble using list and dict in pydantic classes while still getting 3.8 to pass. If I go back to using typing.List and typing.Dict, I get all kinds of complaints from the flake8-future-annotations check. Do you have any opinion on any of the following:

  1. dropping support for 3.8
  2. using dataclasses instead of pydantic BaseModel
  3. disabling the "FA" check
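For option 2, a rough sketch of what the Geometry proxy might look like as a dataclass. Dataclasses store annotations without evaluating them, so a stringified `list[Any]` should not trip up Python 3.8, but they also perform no validation, which is the trade-off against pydantic. The `default_factory` default and the sample coordinates are illustrative assumptions, not part of the original model.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Geometry:
    """Client-side proxy for ee.Geometry object (dataclass variant)."""

    # String annotations mimic what `from __future__ import annotations`
    # would store; the dataclass machinery never evaluates them.
    type: "str"
    coordinates: "list[Any]" = field(default_factory=list)


# Illustrative values only; no type checking happens on construction.
point = Geometry(type="Point", coordinates=[-123.3, 44.6])
print(asdict(point))
```

Because nothing checks the field types at runtime, any validation that the pydantic version provides would have to be written by hand in this variant.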

Typing is great to have, but kind of a pain as well!

aazuspan commented 11 months ago

I'd be tempted to drop 3.8... I know we considered that with sknnr and held off, but now we're 6 months past NumPy's drop date, and it would sure simplify some things. With that said, option 3 would also be a quick fix, so I wouldn't hesitate to do that if you have reservations about limiting Python version support.

> Typing is great to have, but kind of a pain as well!

💯

grovduck commented 11 months ago

> I'd be tempted to drop 3.8... I know we considered that with sknnr and held off, but now we're 6 months past NumPy's drop date, and it would sure simplify some things

Great, thanks for your input. I've dropped 3.8 (and added 3.12) in this commit.

grovduck commented 11 months ago

@aazuspan, if you have the time (and can stomach it), I'd love for you to take a quick look at the changes here. The general thrust of this PR is to bring the sknnr transformers[^1] into this package to do client-side fitting (training). I didn't at all hew to the best practice of nice, small, atomic commits, so it's a bit to wade through, but the general pattern closely follows what you've done with sknnr.

There are a few issues I know of at this point:

I know it's not in your nature, but please be brutally honest if you see some funky stuff and additional opportunities for refactoring. The good thing is that we're currently passing tests!

[^1]: We only need the transformers, as GEE's ee.Classifier.minimumDistance is responsible for neighbor finding. The transformers are used to set the ordination space.
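To illustrate the division of labour in the footnote, here is a toy sketch in pure Python. It is not the package's or sknnr's implementation: a simple standardization stands in for the ordination, and `fit_standardizer`, `transform`, and `nearest` are hypothetical names. The point is only that the transform is fitted once on the client, and neighbor finding (done server-side by ee.Classifier.minimumDistance in the real workflow) then happens in the transformed space.

```python
import math
from statistics import mean, pstdev


def fit_standardizer(rows):
    """Client-side 'training': per-column mean and standard deviation."""
    columns = list(zip(*rows))
    return {
        "mean": [mean(col) for col in columns],
        "std": [pstdev(col) for col in columns],
    }


def transform(row, params):
    """Project one row into the fitted (standardized) space."""
    return [(x - m) / s for x, m, s in zip(row, params["mean"], params["std"])]


def nearest(target, references, params=None):
    """Index of the closest reference row; stand-in for the server-side step."""
    if params is not None:
        target = transform(target, params)
        references = [transform(ref, params) for ref in references]
    distances = [math.dist(target, ref) for ref in references]
    return min(range(len(references)), key=distances.__getitem__)


# Made-up reference plots with two covariates on very different scales.
references = [[0.0, 100.0], [1.0, 200.0], [2.0, 300.0]]
target = [1.9, 110.0]

params = fit_standardizer(references)  # fitted once, reusable for every target
print(nearest(target, references))          # raw covariate space
print(nearest(target, references, params))  # fitted space
```

With these made-up numbers the nearest reference differs between the raw and the fitted space, which is why the choice of transformer matters even though it does no neighbor finding itself.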

aazuspan commented 11 months ago

Absolutely, I'm excited to take a look and get a better idea of how this works! I tried running tests and it looks like I can access the table assets, but not the images. Do you mind sharing those?

FAILED tests/test_model_spatial.py::test_image_match[5-raw] - ee.ee_exception.EEException: Image.load: Image asset 'users/gregorma/gee-knn/test-check/raw_neighbors_600' not found (does not exist or caller does not have access).
FAILED tests/test_model_spatial.py::test_image_match[5-euc] - ee.ee_exception.EEException: Image.load: Image asset 'users/gregorma/gee-knn/test-check/euc_neighbors_600' not found (does not exist or caller does not have access).
FAILED tests/test_model_spatial.py::test_image_match[5-mah] - ee.ee_exception.EEException: Image.load: Image asset 'users/gregorma/gee-knn/test-check/mah_neighbors_600' not found (does not exist or caller does not have access).
FAILED tests/test_model_spatial.py::test_image_match[5-msn] - ee.ee_exception.EEException: Image.load: Image asset 'users/gregorma/gee-knn/test-check/msn_neighbors_600' not found (does not exist or caller does not have access).
FAILED tests/test_model_spatial.py::test_image_match[5-gnn] - ee.ee_exception.EEException: Image.load: Image asset 'users/gregorma/gee-knn/test-check/gnn_neighbors_600' not found (does not exist or caller does not have access).

grovduck commented 11 months ago

> Do you mind sharing those?

Sorry about that. Should be shared with your gmail account now. Let me know if you have further issues.

aazuspan commented 11 months ago

All tests passing now, thanks!

grovduck commented 11 months ago

> but I made a first pass at least

@aazuspan, thanks for such a thorough review. I'll be picking through it over the next couple of days (then back to sknnr!)

aazuspan commented 11 months ago

> Hopefully you can see my responses to your comments.

I do see a couple, but I got an email notification about a bunch of comments that aren't showing up for me, e.g. your answer about retile or the int IDs... Not quite sure what's going on there.

grovduck commented 11 months ago

> I do see a couple, but I got an email notification about a bunch of comments that aren't showing up for me, e.g. your answer about retile or the int IDs... Not quite sure what's going on there.

I really messed everything up. But I think here is what happened (chronologically):

  1. I started a review right after I initiated the PR but didn't submit it.
  2. You provided your review.
  3. I started responding to your comments, but they were in the context of my ongoing review.
  4. I finally realized that I hadn't yet submitted my initial review.
  5. All my responses to you got duplicated in my review, where they appeared without context (e.g., without your original comment).
  6. I deleted all my duplicated comments, making sure to keep the copies that were direct replies to your comments.
  7. Upon refresh, all my responses disappeared.

So, I'll try to go back and put in what I had written (I don't think deleted comments can be retrieved), but it's probably best to use the emails that were sent as the definitive source. I think the crux is that one shouldn't respond to comments while one's own review is still pending. Sorry about this!

aazuspan commented 11 months ago

Oh no, that's a hassle to have to re-enter all your responses. I think I have the text of all your responses in the notification emails, I just don't necessarily know what they were responses to (although I can probably guess in most cases). Do you have those, or would it be helpful if I forwarded the email?

GitHub is a pretty smooth experience overall, but figuring out where/if review comments are going to show up seems like it's always a little bit of a gamble, so you're not alone!

grovduck commented 11 months ago

> I think I have the text of all your responses in the notification emails, I just don't necessarily know what they were responses to (although I can probably guess in most cases). Do you have those, or would it be helpful if I forwarded the email?

If you have those and it would be easy enough to send, that would be great. I don't get a copy of changes I make.

grovduck commented 11 months ago

> Choosing names remains the hardest part of programming! I'm not sure that staying consistent with sknnr is necessarily the perfect choice, but it does seem like a defensible one at least, and would make it easy for someone familiar with one package to adapt to the other (although if you ever tried using them in a single script there might be some namespace headaches...). I think that's where I would lean by default, just because I don't have any better ideas.

I know it's not what you suggested, but I'm considering the following name changes:

I'm thinking of this from the following perspective:

Thoughts?

aazuspan commented 11 months ago

> The term isn't completely accurate, but GEE calls their estimators classifiers even if run in regression mode.

Good point!

I think your logic makes sense, and I'm happy to go with the Classifier naming scheme. As you suggested, that should be a more familiar interface for users with some Earth Engine ML experience, who are probably the most likely to find and use this.

grovduck commented 11 months ago

@aazuspan, you've been incredibly gracious with your time on reviewing this PR. Thank you so much. I'm running out of steam a bit and will migrate these issues you've identified below into separate issues.

There is still a bit of work to do, but I'd like to merge in this PR now, mostly because I need to test with the larger SERVIR workflow this coming week. I've learned that this was much too ambitious of a PR on its own!

aazuspan commented 11 months ago

Sounds great @grovduck! Thanks for helping me get up to speed on this. I'm excited to start exploring it with some real data!

grovduck commented 11 months ago

> Sounds great @grovduck! Thanks for helping me get up to speed on this. I'm excited to start exploring it with some real data!

@aazuspan, even though there are still many things to do on this, I wouldn't have gotten nearly as far without your helpful reviews and fixes. I appreciate it!