spaceml-org / rs_tools

https://spaceml-org.github.io/rs_tools/
Apache License 2.0

Initial setup experience #36

Open nkasmanoff opened 8 months ago

nkasmanoff commented 8 months ago

Hey, just wanted to share my experience downloading and using this repo.

Feel free to take this with a grain of salt if some of these things are already slated to change, but this is what I went through installing the repo and then trying to get some data.

Initial Setup

Cloned https://github.com/spaceml-org/rs_tools.git

Chose to use the development environment, just in case

Minor nit: add conda activate rs_tools just before poetry install. This wasn’t a big deal, but it may confuse a user who is not familiar with Anaconda.

When running the poetry install, received this warning:

Warning: The file chosen for install of jupyter-client 8.5.0 (jupyter_client-8.5.0-py3-none-any.whl) is yanked. Reason for being yanked: Bug in kernel env update

Don’t think this was an issue, but wanted to share in case you haven’t seen it.

From here I was a bit unsure what to do next. I know we discussed the download scripts, so I started there.

Going in alphabetical order, I gave the GOES downloader a try.

GOES Download

From the root dir, I tried python scripts/pipeline/goes/download_goes.py (https://github.com/spaceml-org/rs_tools/blob/main/scripts/pipeline/goes/download_goes.py). Despite using poetry, and then trying pip, there were a few packages I needed to manually install.

Once those were in I tried the script again, and got an issue about DownloadParameters being undefined (https://github.com/spaceml-org/rs_tools/blob/main/scripts/pipeline/goes/download_goes.py#L96)

To fix this I tried copying DownloadParameters directly from download_modis.py into download_goes.py (https://github.com/spaceml-org/rs_tools/blob/main/scripts/pipeline/modis/download_modis.py#L25), but then hit another issue: region was not specified in download (https://github.com/spaceml-org/rs_tools/blob/main/scripts/pipeline/goes/download_goes.py#L82-L99)

I took the default region, added it to the model args, and it looks like after all this, it worked and goes16 data saved!
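To illustrate the kind of fix described above, here is a minimal sketch of a parameter container that carries a default region. Everything in it is an assumption for illustration: the field names, the example dates, and the (lon_min, lat_min, lon_max, lat_max) layout of the region are hypothetical and may not match the real DownloadParameters in download_modis.py.

```python
from dataclasses import dataclass

# Hypothetical default region: (lon_min, lat_min, lon_max, lat_max), whole globe.
DEFAULT_REGION = (-180.0, -90.0, 180.0, 90.0)


@dataclass
class DownloadParameters:
    """Illustrative stand-in; field names are assumptions, not the repo's."""

    start_date: str = "2020-10-01"  # example value only
    end_date: str = "2020-10-02"    # example value only
    save_dir: str = "./data"
    region: tuple = DEFAULT_REGION

    def __post_init__(self):
        # Fail early on an inverted bounding box instead of deep inside a download.
        lon_min, lat_min, lon_max, lat_max = self.region
        if not (lon_min < lon_max and lat_min < lat_max):
            raise ValueError(f"Invalid region bounds: {self.region}")


params = DownloadParameters()
print(params.region)
```

With a default on the field, a call site that never mentions region would still get a usable value, which is roughly what adding the default region to the model args achieved here.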

While it started saving, which didn’t take that long, I tried to get a better understanding of the function, and the parameters.

It looks like the params I imported weren’t used anyway, and all the downloading was done through GOES16Download, which makes sense. https://github.com/spaceml-org/rs_tools/blob/main/scripts/pipeline/goes/download_goes.py#L105

GOES Preprocess

Tried python scripts/pipeline/goes/preprocess_goes.py and hit a similar issue with imports.

Note that these import errors came at different times: even after I installed dask, the script ran for about a minute before the netcdf4 error appeared.
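One way to surface all missing packages at once, rather than a minute into a run, would be a small preflight check at the top of the script. The sketch below uses only the standard library; the module list is an assumption based on the two errors mentioned here (dask, and netCDF4 as the import name for netcdf4), and the helper names are made up for illustration.

```python
import importlib.util

# Import names the GOES scripts appeared to need here; extend as required.
REQUIRED_MODULES = ["dask", "netCDF4"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require(names):
    """Raise a single ImportError listing every missing module."""
    missing = missing_modules(names)
    if missing:
        raise ImportError(f"Missing packages: {', '.join(missing)}")
```

Calling require(REQUIRED_MODULES) before any heavy work starts would report every gap in one message.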

The .nc files saved, at about 1.5 GB each. Their naming convention appears to be based on the date, but I am not really sure, since the start and end dates are October first and October second. (I assume the date format is YYYY-MM-DD?)

Now all 4 .nc files for GOES are saved at the root directory. Ideally they would be saved in some folder, ready for me to analyze and process.
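As a stopgap until the script takes an output directory, the files can be gathered after the fact. This is a rough sketch, not something from the repo: collect_nc_files is a made-up helper, and the data/goes target used in the usage note is just an example path.

```python
from pathlib import Path
import shutil


def collect_nc_files(source_dir, target_dir):
    """Move every .nc file directly under source_dir into target_dir."""
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    # glob("*.nc") is non-recursive, so files already in subfolders are left alone.
    for nc_file in sorted(source.glob("*.nc")):
        destination = target / nc_file.name
        shutil.move(str(nc_file), str(destination))
        moved.append(destination)
    return moved
```

Running collect_nc_files(".", "data/goes") from the repo root would move the saved .nc files out of the top level and return their new paths.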

I’m not familiar with these file types, so I don’t know what I should do with them next, but you can ignore this comment if you think other users would.

MODIS Download

python scripts/pipeline/modis/download_modis.py

This runs, but no data is downloaded. It looks like I need to log in to earthaccess, but I didn’t know that beforehand.

I’m not sure if I have an account; is there a way to check this beforehand? One suggestion is to put the credentials in the .env and note somewhere in the README that this is needed.
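A sketch of what such an up-front check could look like, so the script stops with one clear message instead of repeating the login error for every date. It assumes the credentials are exposed as the EARTHDATA_USERNAME and EARTHDATA_PASSWORD environment variables (the names earthaccess reads with strategy="environment") and that something has already loaded the .env into the environment. The helper names are made up, and this is not how download_modis.py currently works.

```python
import os

# Variable names read by earthaccess.login(strategy="environment").
REQUIRED_VARS = ("EARTHDATA_USERNAME", "EARTHDATA_PASSWORD")


def missing_earthdata_vars(env=None):
    """Return the credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def ensure_login(env=None):
    """Check credentials first, then log in to Earthdata before any download."""
    missing = missing_earthdata_vars(env)
    if missing:
        raise RuntimeError(f"Set {', '.join(missing)} (e.g. in .env) before downloading MODIS data")
    import earthaccess  # imported lazily so the check above works without it

    auth = earthaccess.login(strategy="environment")
    if not auth.authenticated:
        raise RuntimeError("Earthdata login failed; check the credentials")
    return auth
```

Calling ensure_login() once before the download loop would make a missing account or missing credentials obvious right away.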

2024-03-24 09:27:30.596 | INFO     | __main__:download:112 - Initializing MODIS parameters...
2024-03-24 09:27:30.596 | INFO     | __main__:download:121 - Downloading MODIS...
Downloading Terra - Date: 2020-10-01 00:00:00:   0%|                                                                                                                                        | 0/5 [00:00<?, ?it/s]Granules found: 4
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
Downloading Terra Cloud Mask - Date: 2020-10-01 00:00:00:   0%|                                                                                                                             | 0/5 [00:00<?, ?it/s]Granules found: 4
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
Downloading Terra - Date: 2020-10-02 00:00:00:  20%|█████████████████████████▌                                                                                                      | 1/5 [00:01<00:07,  1.89s/it]Granules found: 7
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
Downloading Terra Cloud Mask - Date: 2020-10-02 00:00:00:  20%|███████████████████████▍                                                                                             | 1/5 [00:02<00:07,  1.89s/it]Granules found: 7
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
Downloading Terra - Date: 2020-10-03 00:00:00:  40%|███████████████████████████████████████████████████▏                                                                            | 2/5 [00:03<00:05,  1.99s/it]Granules found: 5
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
Downloading Terra Cloud Mask - Date: 2020-10-03 00:00:00:  40%|██████████████████████████████████████████████▊                                                                      | 2/5 [00:05<00:05,  1.99s/it]Granules found: 5
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
Downloading Terra - Date: 2020-10-04 00:00:00:  60%|████████████████████████████████████████████████████████████████████████████▊                                                   | 3/5 [00:06<00:04,  2.08s/it]Granules found: 7
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
Downloading Terra Cloud Mask - Date: 2020-10-04 00:00:00:  60%|██████████████████████████████████████████████████████████████████████▏                                              | 3/5 [00:07<00:04,  2.08s/it]Granules found: 7
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
Downloading Terra - Date: 2020-10-05 00:00:00:  80%|██████████████████████████████████████████████████████████████████████████████████████████████████████▍                         | 4/5 [00:08<00:02,  2.10s/it]Granules found: 5
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
Downloading Terra Cloud Mask - Date: 2020-10-05 00:00:00:  80%|█████████████████████████████████████████████████████████████████████████████████████████████▌                       | 4/5 [00:10<00:02,  2.10s/it]Granules found: 0
Downloading Terra Cloud Mask - Date: 2020-10-05 00:00:00: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:10<00:00,  2.20s/it]

From here I chose to stop going for now, just to make sure I wasn't already veering too far off the rails. Should I be using the notebooks as well?

Thanks again :-)