sunpy / sunpy

SunPy - Python for Solar Physics
http://www.sunpy.org
BSD 2-Clause "Simplified" License

Working with .npz files #4447

Closed Flock1 closed 4 years ago

Flock1 commented 4 years ago

Description

A lot of data provided by SDO is in .npz format. From what I have explored so far, there doesn't seem to be a way to work with these files in sunpy. Is there some way I can import this data using sunpy? I didn't know where else to post this question, since it's not a bug or an error.

Please let me know.

nabobalis commented 4 years ago

Do you have an example of these files? I was unaware that SDO data came in npz files.

You should be able to load them using numpy.load and then create a sunpy map directly but I am unsure about the header information for these files.
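As a minimal sketch of the loading step: the snippet below writes a small synthetic .npz archive and reads it back with numpy.load. The key name "x", the file name, and the 512-by-512 shape are assumptions made for illustration, not the layout of any real SDO-derived file; listing archive.files on a real file shows which keys it actually contains.

```python
import os
import tempfile

import numpy as np

# Write a small synthetic archive so the example runs anywhere.
# The key name "x" and the 512x512 shape are assumptions, not the
# layout of the real dataset.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.npz")
np.savez_compressed(path, x=np.zeros((512, 512), dtype=np.float32))

# numpy.load returns a dict-like archive for .npz files.
with np.load(path) as archive:
    print(archive.files)   # names of the arrays stored in the archive
    image = archive["x"]   # read one array fully into memory

print(image.shape, image.dtype)
```

The resulting array carries pixel values only. Nothing about the observation time, pointing, or plate scale is stored alongside it.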

wtbarnes commented 4 years ago

Are you referring to the ML dataset (e.g. https://purl.stanford.edu/vk217bh4910) as described in Galvez et al. (2019)?

Note that these data are not provided by any of the instrument teams and are meant as a resource specifically for machine learning. They shouldn't be regarded as official instrument data products. In the case of AIA, they have been heavily downsampled from their original resolution (4096-by-4096 to 512-by-512). Additionally, they contain no header information, so I do not think we could ever support these as valid inputs to Map directly.

As Nabil points out above, you can of course load these using NumPy and then create a Map object "by hand." If you knew the original image from which your reduced image was created, you could grab the metadata from that file and then create a new header for the reduced image using make_fitswcs_header.

PaulJWright commented 4 years ago

To follow on from Will and Nabil, you can find instructions on how to use this dataset at https://github.com/dfouhey/sdodemo.

Flock1 commented 4 years ago

@nabobalis, yeah. Header information is definitely missing from that dataset. And I guess, as @wtbarnes mentioned, the data I have is different from the original data. I have images of shape 1024x1024 and I am trying to use them for machine learning.

So if I am interested in using ML on this dataset, how would you recommend I use the sunpy library? One major application of this library is the grid functionality, which would definitely help in giving a sense of a sphere instead of a circle. My astronomy knowledge is very limited, so please excuse the naivety of my questions.

nabobalis commented 4 years ago

@Flock1 sunpy maps need metadata to be created. This normally comes from the data files that are used in astronomy. If it is missing, you can create it manually (following https://docs.sunpy.org/en/stable/generated/gallery/map/map_from_numpy_array.html#sphx-glr-generated-gallery-map-map-from-numpy-array-py) and create sunpy maps, but I am not sure how useful that will be.

If you need to do ML (I do not have any experience in ML), don't you just want the raw data?

PaulJWright commented 4 years ago

@Flock1, we excluded this header information by design.

I don't fully understand why these need to be converted into sunpy maps, but you can learn more about various coordinate systems here: https://fits.gsfc.nasa.gov/wcs/coordinates.pdf. Understanding what problem you're trying to tackle would be helpful, I think.

Flock1 commented 4 years ago

@nabobalis, thank you for the link. I will have a look at it.

Raw data is fine, but it hasn't been of much use to me so far, especially when it comes to solar flares. I think I'll need to read more about this field to get a good grasp of it.

@PaulJWright, thank you for replying. One problem I was thinking of is to detect solar flares through some form of unsupervised learning and to predict future solar flares. What do you suggest?

PaulJWright commented 4 years ago

The data set you're referring to has co-aligned, co-temporal coronal (AIA), magnetic field (HMI) and integrated spectra (EVE). This data set does not include a flare catalog.

You would first need to query a catalog such as the Heliophysics Event Knowledgebase (HEK, https://docs.sunpy.org/en/stable/guide/acquiring_data/hek.html) to locate flares in this ML data set. Because there is 20 PB of SDO data, we reduced the cadence significantly. The reduced cadence may not be appropriate for flare prediction, but the code that created this ML data set is on GitHub (https://github.com/SDOML/SDOML), and in theory you can recreate it yourself at a higher cadence.

Personally, if you are interested in learning more about the field, or want to quickly apply ML to the problem, I would recommend trying one of the example cases listed in Galvez et al. (2019). For example, you could try to infer coronal images (AIA) from just the magnetic field data (obtained by HMI). The data in the ML data set is in the right format to be fed straight into a CNN, and would just need to be loaded with NumPy.
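To illustrate what "fed straight into a CNN" could mean in practice, here is a small sketch that arranges HMI inputs and AIA targets as batches with an explicit channel axis. The arrays are synthetic stand-ins, and the batch size, the 512-by-512 shape, and the channels-first (N, C, H, W) layout are assumptions. Some frameworks expect channels last instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for arrays loaded from .npz files (assumed shapes).
hmi = rng.normal(size=(4, 512, 512)).astype(np.float32)  # 4 magnetograms
aia = rng.random((4, 512, 512), dtype=np.float32)        # 4 matching AIA images


def to_batch(images):
    """Add a channel axis: (N, H, W) -> (N, 1, H, W)."""
    return images[:, np.newaxis, :, :]


inputs = to_batch(hmi)    # network input: magnetic field
targets = to_batch(aia)   # network target: coronal images

print(inputs.shape, targets.shape)
```

Each input must be paired with the target taken at the same time, so the two arrays need to be kept in the same order when shuffling or splitting into training and test sets.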

Flock1 commented 4 years ago

@PaulJWright, this is great. Thank you for the detailed response. I will try the example you mentioned.