olliestephenson / dpm-rnn-public

Damage mapping with deep learning on satellite data
MIT License

How do I use my own data for training? #1

Closed · yukun80 closed this issue 4 months ago

yukun80 commented 2 years ago

Hello, I have recently been reading your article and downloaded your code for testing. I would like to ask whether it is possible to train on time series data in .tif format (Sentinel-1 data), and if so, how? Thank you very much!

olliestephenson commented 2 years ago

Hi, thanks for your interest in the code. This depends on how the data has been processed. Which software did you use to process your data? Do you have a single .tif file that contains a series of Sentinel-1 coherence images, all coregistered to the same grid? If so, the most straightforward approach would be to open the .tif file in Python, save it as a numpy array, and then use the code as described in the documentation. You could also make a small modification to the code so that it loads a .tif file rather than a .npy file, which would save you from duplicating the data.

If the Sentinel-1 data are in separate .tif files that aren't registered to the same grid, then this is less straightforward. You first need to resample everything so that each pixel corresponds to the same geographic region on the ground, for example with gdal.Warp (see the sketch below).
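A minimal sketch of that resampling step using gdal.Warp; the filenames, coordinate system, extent, and pixel size here are hypothetical placeholders that you'd replace with values for your own area of interest:

```python
from osgeo import gdal

# Hypothetical filenames and grid parameters -- substitute your own.
# Running this for every image with the same outputBounds and pixel
# size puts them all on an identical grid.
gdal.Warp(
    "coherence_01_aligned.tif",             # output on the common grid
    "coherence_01.tif",                     # input to be resampled
    dstSRS="EPSG:4326",                     # target coordinate system
    outputBounds=(35.0, 34.0, 36.0, 35.0),  # (minX, minY, maxX, maxY)
    xRes=0.0001,                            # common pixel size
    yRes=0.0001,
    resampleAlg="bilinear",
)
```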

More generally, gdal is probably the best way to interact with .tif files in Python. See: https://www.geeksforgeeks.org/opening-tif-file-using-gdal-in-python/
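As a rough sketch of the single-file case (the filenames here are hypothetical), opening a coregistered multi-band .tif and saving it as a .npy file might look like this:

```python
import numpy as np
from osgeo import gdal

# Hypothetical filename: one GeoTIFF whose bands are the coregistered
# coherence images in time order.
ds = gdal.Open("coherence_stack.tif")

# For a multi-band file, ReadAsArray returns (bands, rows, cols)
arr = ds.ReadAsArray()

# Put time on the last axis, i.e. (space, space, time)
arr = np.transpose(arr, (1, 2, 0))

np.save("my_dataset.npy", arr)
```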

yukun80 commented 2 years ago

Thank you very much for your reply! My .tif data are already on the same grid, as shown below. [screenshot of the coregistered .tif stack] Sorry, maybe I'm not understanding the code correctly: when running it with the original test_dataset.npy file, it seems to use a single 100×100 matrix. Do I need to replace the data with the next time step each time I train? Another question: my .tif data are about 5000×3000 pixels, so do I need to crop them to 100×100? Thanks again for your reply!

yukun80 commented 2 years ago

I'm not sure whether my question was clear, so I wanted to add to it. My main confusion with the code at the moment is how my .tif dataset needs to be arranged so that training reflects the time series structure of the data. Thanks a lot!

olliestephenson commented 2 years ago

The training data should be a 3-dimensional matrix with dimensions (space, space, number of time steps). So if you have 100 coherence pairs, your training data would have dimensions (5000, 3000, 100). In the code we refer to the spatial dimensions as the shape and the temporal dimension as the length, so in the dataset.json file you would put "shape": [5000, 3000] and "length": 100. You can see that the test_dataset.npy file has dimensions (100, 100, 10), so the shape is (100, 100) and the length is 10.
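For concreteness, the relevant entries in dataset.json for your data would look something like the snippet below (your file may need other fields as well; only the two mentioned above are shown):

```json
{
  "shape": [5000, 3000],
  "length": 100
}
```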

In order to use the data that you have, you need to take each coherence image and stack them into a single file, with the stacking in time along the third axis. The data are then loaded into the code here: https://github.com/olliestephenson/dpm-rnn-public/blob/bc4247fd7126eac66dd07c50149a62be72a316ec/coherence_timeseries.py#L30, so you could also rewrite that part of the code to load multiple separate images directly.
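A minimal sketch of that stacking step, assuming the coherence images are single-band GeoTIFFs already on the same grid (the file pattern is hypothetical):

```python
import glob
import numpy as np
from osgeo import gdal

# Hypothetical file pattern; sorting keeps the images in time order
paths = sorted(glob.glob("coherence_*.tif"))

# Each single-band image loads as a 2D (rows, cols) array
images = [gdal.Open(p).ReadAsArray() for p in paths]

# Stack along the third (time) axis: (rows, cols, length)
stack = np.dstack(images)
print(stack.shape)  # e.g. (5000, 3000, 100)

np.save("my_dataset.npy", stack)
```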

Data can have any shape. 100×100 is probably much too small for good training; we just use that so you can try running the code.