Open mikeskaug opened 5 months ago
@blumenstiel I was wondering if you have any input on these questions. I can still make progress on other things, but I want to confirm that I haven't made a bad assumption about these before getting to large-scale training.
- You will probably get better results if you keep the high-res data. You can either change the input size of the model to 1024, since the positional embedding is simply computed and you have to delete the pos embedding weights before loading anyway (note that this increases your memory requirements during training), or split each image into 16 samples of 256x256.
- I got the best results by simply dropping the IR channels. Filling them with zeros often leads to worse results. Be aware that during weight loading you have to change the order of the first three channels, as Prithvi was trained with the channel order BGR.
- The band names differ between S2 and Landsat. We used the Landsat names, and B05, B06 and B07 have similar wavelengths to B8A, B11 and B12 from S2.
- I don't know which scaling would work best. By default, I would use the values from the pre-training (the values of the pre-training data). That data is in reflectance, so you need to scale them to the uint8 range for the RGB data. But I am not sure whether scaling with value / 10000 * 255 works well or whether the xBD values would work better. Maybe try both.
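For illustration, here is a rough NumPy sketch of the three preparation steps mentioned above: tiling a 1024x1024 image into 16 samples of 256x256, rescaling reflectance-unit values with value / 10000 * 255, and picking the first three input channels of a patch-embedding weight in reversed order. All function names are hypothetical, the weight layout (input channels on axis 1) is an assumption about the checkpoint rather than something confirmed here, and the numbers are placeholders, not the real pre-training statistics.

```python
import numpy as np


def split_into_tiles(image: np.ndarray, tile: int = 256) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping (tile, tile, C) samples."""
    h, w, c = image.shape
    if h % tile or w % tile:
        raise ValueError("image height and width must be multiples of the tile size")
    rows, cols = h // tile, w // tile
    # (rows, tile, cols, tile, C) -> (rows, cols, tile, tile, C)
    tiles = image.reshape(rows, tile, cols, tile, c).swapaxes(1, 2)
    # Flatten the tile grid in row-major order.
    return tiles.reshape(rows * cols, tile, tile, c)


def reflectance_to_uint8_range(values, reflectance_scale: float = 10000.0) -> np.ndarray:
    """Rescale reflectance-unit values to the 0-255 range: value / 10000 * 255."""
    return np.asarray(values, dtype=np.float64) / reflectance_scale * 255.0


def select_rgb_input_weights(weight: np.ndarray) -> np.ndarray:
    """Keep the first three input channels and reverse them (BGR -> RGB).

    Assumption: axis 1 of `weight` is the input-channel axis and its first
    three entries are ordered B, G, R. The remaining (IR) channels are dropped.
    """
    return weight[:, [2, 1, 0], ...]


# Placeholder data, only to show the shapes involved.
image = np.zeros((1024, 1024, 3), dtype=np.uint8)
tiles = split_into_tiles(image)          # shape (16, 256, 256, 3)

dummy_weight = np.zeros((8, 6, 1, 16, 16), dtype=np.float32)  # hypothetical layout
rgb_weight = select_rgb_input_weights(dummy_weight)           # shape (8, 3, 1, 16, 16)
```

Whether the rescaled pre-training values or the xBD values work better is still something to test empirically, as noted above.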
Interesting, thanks. I will try your suggestions.
I'm still surprised by how well the model can adapt to input so different from the pre-training data. I guess I need to dig into the model more, and into transformers in general!
With high-res data it does not perform as well as with low-res data, compared to general vision models. But I assume that for this use case it can benefit from the temporal pre-training, which is missing in other models.
Hi all. I'm still in the beginning stages of exploring the data and getting things set up to train a segmentation head on top of the pre-trained Prithvi encoder. There is still a lot I need to do before I'm ready to start training and evaluating, but I have a few questions about preparing the input data that you might be able to help me with.