nasaharvest / presto

Lightweight, Pre-trained Transformers for Remote Sensing Timeseries
https://arxiv.org/abs/2304.14065
MIT License

Issue with Files Needed for CropHarvestMultiClassValidation Class #38

Open mahrokh3409 opened 4 months ago

mahrokh3409 commented 4 months ago

Hi @gabrieltseng

I am currently working on implementing the CropHarvestMultiClassValidation class within presto/eval/cropharvest_eval.py. To facilitate this, I require access to the data accessible via the download_cropharvest_data() function.

However, I am encountering difficulties accessing the "features/dynamic_world_arrays" and "test_dynamic_world_features" files necessary for this task. Could you please provide me with direct links or alternative methods to download these folders?

Your assistance in resolving this access issue would be greatly appreciated.

Kind Regards, Mahrokh

mahrokh3409 commented 4 months ago

Hi @gabrieltseng , @kvantricht , @rubencart , and @sabman — pinging again on the access issue described above, in case anyone can point me to direct links or an alternative way to download the "features/dynamic_world_arrays" and "test_dynamic_world_features" folders.

Kind Regards, Mahrokh

gabrieltseng commented 4 months ago

Hi @mahrokh3409 ,

The dynamic world data needs to be re-exported from Google Earth Engine. This can be done by calling the export_dynamic_world function in the CropHarvestEval task.

You then need to transform the tif files you receive from EarthEngine into npy arrays - this can be achieved via the dynamic_world_tifs_to_npy function in the CropHarvestEval task.

For the test data, you will need to use the create_dynamic_world_test_h5_instances function.

So the flow is:

  1. Export tifs from EarthEngine
  2. Download them from your Google Cloud bucket
  3. Transform them into npy and h5 files
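The flow above can be sketched as follows. This is only a sketch: the function names come from this thread, but their exact signatures and arguments in presto may differ, hence the elided `...`:

```
from presto.eval.cropharvest_eval import CropHarvestEval

# 1. Export tifs from Earth Engine into your own Google Cloud bucket
CropHarvestEval.export_dynamic_world(test=False)

# 2. Download the exported tifs from your bucket (e.g. with gsutil), then:

# 3a. Transform the tifs into npy arrays
CropHarvestEval.dynamic_world_tifs_to_npy(...)

# 3b. Build the h5 test instances
CropHarvestEval.create_dynamic_world_test_h5_instances(...)
```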

I hope this helps!

mahrokh3409 commented 4 months ago

Hi @gabrieltseng,

Thanks for your response. My main issue is with the download function and its access to the bucket on Google Cloud: it fails with a permission error.

```python
def download_cropharvest_data(root_name: str = ""):
    root = Path(root_name) if root_name != "" else cropharvest_data_dir()
    if not root.exists():
        root.mkdir()
    CropHarvest(root, download=True)
    for gcloud_path in ["features/dynamic_world_arrays", "test_dynamic_world_features"]:
        if not (root / gcloud_path).exists():
            blob = (
                storage.Client().bucket(TAR_BUCKET).blob(f"eval/cropharvest/{gcloud_path}.tar.gz")
            )
            blob.download_to_filename(root / f"{gcloud_path}.tar.gz")
            extract_archive(root / f"{gcloud_path}.tar.gz", remove_tar=True)
```
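For reference, the unpacking half of this helper can be reproduced with the standard library alone. The `extract_archive` below is a minimal stand-in of my own, not presto's actual implementation:

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path: Path, remove_tar: bool = True) -> None:
    """Minimal stand-in (assumption) for presto's extract_archive helper:
    unpack a .tar.gz next to itself and optionally delete the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=archive_path.parent)
    if remove_tar:
        archive_path.unlink()
```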

The export_dynamic_world function also calls the above function to download files. The CropHarvest files download successfully and I have access to the "features" and "test_features" data. However, the second part of the code (the loop over the Dynamic World gcloud paths) fails with the error below:

```
Forbidden: 403 GET https://storage.googleapis.com/download/storage/v1/b/lem-assets2/o/eval%2Fcropharvest%2Ffeatures%2Fdynamic_world_arrays.tar.gz?alt=media:
mahrokh3409@gmail.com does not have storage.objects.get access to the Google Cloud Storage object.
Permission 'storage.objects.get' denied on resource (or it may not exist).:
('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
```

Can you please let me know how I can access those files? Is there any other way of accessing those files?

I appreciate your help.

Kind Regards, Mah

mahrokh3409 commented 4 months ago

(screenshot attached)

gabrieltseng commented 4 months ago

Hi @mahrokh3409 , this is expected. Did you go through the steps to export the data from Earth Engine into a Google Cloud project (as described above)? If not, you will not have any data to download.

Google Cloud Bucket names are globally unique, so you will need to change the bucket / folder names being exported to.
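For example (a hypothetical edit: `TAR_BUCKET` is the constant used in `download_cropharvest_data` above, but the exact constants in those files are best checked in the repo):

```python
# Hypothetical: point the code at a bucket you own instead of the maintainers'.
TAR_BUCKET = "my-presto-data-bucket"  # GCS bucket names are globally unique
```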

These are defined in the following places:

mahrokh3409 commented 4 months ago

Dear @gabrieltseng

Thank you so much for your response and detailed guidance. I will follow the instructions and let you know if I run into any problems.

Kind Regards, Mahrokh

mahrokh3409 commented 4 months ago

Hi @gabrieltseng

I tried the following steps. First, I created three buckets (screenshot attached)

and I updated the related files with bucket names:

• Presto/presto/dataops/pipelines/ee_pipeline.py

(screenshot attached)

• Presto/presto/dataops/dataset.py

(screenshot attached)

Then I called export_dynamic_world via the code below:

```python
import ee

ee.Authenticate()
ee.Initialize()
CropHarvestEval.export_dynamic_world(test=False)
```

Below is a screenshot of the run: (screenshot attached)

There was no error; however, as you can see, the dynamic world folder is empty: (screenshot attached)
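One possibility worth checking (my assumption, not something confirmed in this thread): Earth Engine exports run as asynchronous tasks, so the bucket can look empty while tasks are still queued, or stay empty if they failed. The task states can be inspected with something like this sketch, which requires an authenticated earthengine-api session:

```python
import ee

ee.Initialize()  # assumes you have already authenticated
for task in ee.batch.Task.list():
    status = task.status()
    print(status.get("description"), status.get("state"))
```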

I am really stuck at this stage and not sure what steps I should take to obtain those files. I appreciate your help.

Kind Regards, Mahrokh