sentinel-hub / field-delineation

Field delineation with Sentinel-2 data from Sentinel-Hub and a ResUnet-a architecture.
MIT License

Injected filesystem into tf_data_utils to remove tight coupling on local filesystem #2

Closed Samshal closed 3 years ago

Samshal commented 3 years ago

This allows loading datasets from an S3 bucket or anywhere else training data may reside.
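As a rough illustration of the idea (not the code in this PR), the sketch below shows what injecting a filesystem into a loader can look like. The names `FileSystem`, `LocalFileSystem`, `InMemoryFileSystem`, `open_binary`, and `read_sample` are hypothetical and are not taken from `tf_data_utils`. The in-memory class only stands in for a remote store such as an S3 bucket.

```python
import io
import os
from typing import BinaryIO, Dict, Protocol


class FileSystem(Protocol):
    """Hypothetical minimal interface the loader depends on."""

    def open_binary(self, path: str) -> BinaryIO:
        ...


class LocalFileSystem:
    """Reads files below a root directory on the local disk."""

    def __init__(self, root: str) -> None:
        self.root = root

    def open_binary(self, path: str) -> BinaryIO:
        return open(os.path.join(self.root, path), "rb")


class InMemoryFileSystem:
    """Stand-in for a remote store; keeps file contents in a dict."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files

    def open_binary(self, path: str) -> BinaryIO:
        return io.BytesIO(self.files[path])


def read_sample(filesystem: FileSystem, path: str) -> bytes:
    """Read one sample through whichever filesystem was injected.

    The loader never touches the local disk directly, so the caller
    decides where the data lives.
    """
    with filesystem.open_binary(path) as handle:
        return handle.read()
```

In a real pipeline the open handle would probably be passed to `numpy.load`, which accepts file-like objects, rather than read as raw bytes. That step is left out here to keep the sketch dependency-free.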

batic commented 3 years ago

Hi @Samshal

Thank you for your contribution! Could you update the Conda environment file as well?

Best, Matej

veseln commented 3 years ago

Thank you for the contribution!

Just a note: I think the changes make sense, but be aware that loading remote data through the filesystem might make training slower. We used locally stored npz files because loading the dataset from remote storage proved to be a bottleneck during training. This probably depends on the setup used, though, so the changes still make sense.

Samshal commented 3 years ago

Thank you @batic and @veseln.

I understand that it is better to train with data available on the local filesystem. However, I noticed little difference in latency when using SageMaker with S3. When there is a very large quantity of data, it is much better to load it directly from the buckets, which allows selecting portions of the data at random. This was my use case.
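A minimal sketch of the random-selection use case described above, assuming the object keys in the bucket have already been listed by some other means. The function name `sample_keys` and its parameters are hypothetical. Only the selected keys would then need to be fetched.

```python
import random
from typing import List, Sequence


def sample_keys(keys: Sequence[str], fraction: float, seed: int = 0) -> List[str]:
    """Pick a reproducible random subset of dataset keys.

    Sorting first makes the result independent of listing order,
    and the seeded generator makes it repeatable across runs.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(keys)
    if not ordered:
        return []
    # Always return at least one key for a non-empty listing.
    count = max(1, round(len(ordered) * fraction))
    return random.Random(seed).sample(ordered, count)
```

Fixing the seed keeps the chosen subset stable between training runs, which makes results easier to compare.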

I'm not sure I understand how to update the conda environment because I did not introduce new packages, could you please explain what I need to do? @batic.

batic commented 3 years ago

> I'm not sure I understand how to update the conda environment because I did not introduce new packages, could you please explain what I need to do? @batic.

Never mind, I forgot we're using it as well.