How to load the IllustrisTNG snapshot in HDF5 format using yt?

yt-project / yt

How to load the IllustrisTNG snapshot in HDF5 format using yt? #3049

Status: Closed (opened by WangYun1995; closed 2 years ago)

WangYun1995 commented 3 years ago

Hi, @jzuhone

I learned that yt is a powerful tool for analyzing cosmological simulations, and IllustrisTNG is one of the state-of-the-art hydrodynamical simulations.

However, I didn't see any example of how to load a TNG snapshot or other TNG data with yt on the yt website (https://yt-project.org/doc/index.html).

So is there any way to load a TNG snapshot using yt?

welcome[bot] commented 3 years ago

Hi, and welcome to yt! Thanks for opening your first issue. We have an issue template that helps us gather the information needed to diagnose and fix issues.

jzuhone commented 3 years ago

IllustrisTNG was run with the AREPO code, so you can load its data just like any other dataset in yt, using yt.load:

https://yt-project.org/docs/dev/examining/loading_data.html#arepo-data

The only catch is that this isn't possible with the latest stable release of yt, so you'll need to install yt from source:

https://yt-project.org/docs/dev/installing.html#installing-yt-from-source
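Since Arepo support depends on which yt you have installed, a quick version check before calling yt.load can save some confusion. The sketch below assumes Arepo support is available in yt 4.0 and later (still the development version when this comment was written, hence the install-from-source advice); `supports_arepo` is a hypothetical helper, not part of yt, and you would pass it `yt.__version__`.

```python
import re


def supports_arepo(version):
    """Return True if a yt version string is 4.0 or newer (hypothetical helper).

    Assumption: the Arepo frontend ships with yt >= 4.0, including dev builds
    such as "4.0.dev0" installed from source.
    """
    # Grab the leading major-version number, e.g. "4" from "4.0.dev0"
    match = re.match(r"(\d+)\.", version)
    if match is None:
        raise ValueError(f"unrecognized yt version string: {version!r}")
    return int(match.group(1)) >= 4
```

For example, `import yt` followed by `print(supports_arepo(yt.__version__))` tells you whether the installed yt should be able to open TNG snapshots under that assumption.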

If you download a full TNG snapshot, it should load without problems. But if you download a single halo (a cutout) in HDF5 format from TNG, you might have to add some metadata to the file, like this, for yt to read it:

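import h5py
import numpy as np

saved_filename = "cutout.hdf5"  # placeholder: path to the downloaded halo cutout

# Open in read/write mode so the file can be patched in place
with h5py.File(saved_filename, "r+") as f:
    # A cutout is a single file, so its total particle counts equal this file's counts
    f["Header"].attrs["NumPart_Total"] = np.array(f["Header"].attrs["NumPart_ThisFile"])
    # Add Config/VORONOI (require_group reuses the group if it already exists)
    f.require_group("Config").attrs["VORONOI"] = 1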
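If you are not sure whether a downloaded cutout already carries this metadata, a small read-only check like the one below can be run first. It tests only the two items the patch above adds; `needs_cutout_patch` is a hypothetical helper, and whether yt needs anything else for a particular cutout is not covered here.

```python
def needs_cutout_patch(f):
    """Return True if an open HDF5 snapshot lacks NumPart_Total or Config/VORONOI.

    `f` is expected to behave like an h5py.File: it supports `"Config" in f`
    and exposes attributes through `f["Header"].attrs`. (Hypothetical helper.)
    """
    # Cutouts from the TNG web API may only carry NumPart_ThisFile
    if "NumPart_Total" not in f["Header"].attrs:
        return True
    # The Config group and its VORONOI flag are the other piece the patch adds
    if "Config" not in f:
        return True
    return "VORONOI" not in f["Config"].attrs
```

Typical use would be to open the file with `h5py.File(saved_filename, "r")` inside a `with` block, call `needs_cutout_patch(f)`, and only run the patch when it returns True.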

WangYun1995 commented 3 years ago

Thank you so much for your reply!

WangYun1995 commented 3 years ago

Hello, @jzuhone

IllustrisTNG splits the snapshot data at a given redshift into many HDF5 files. Does yt support loading these HDF5 files in parallel using mpi4py?

jzuhone commented 3 years ago

Data in yt is lazily evaluated: it is read from the files in chunks only when it is needed, and only the specific data that is needed is read (e.g., if one makes a spatial selection). For datasets like these that are split across multiple files, one simply calls yt.load on the "0" file and the rest are brought in automatically.
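As a sketch of the "0"-file convention, the helper below builds the path of the first chunk of a full TNG snapshot, assuming the standard TNG directory layout (`snapdir_NNN/snap_NNN.0.hdf5` inside the simulation's output directory). Both function names are hypothetical, not part of yt.

```python
from pathlib import Path


def first_chunk_path(output_dir, snap_num):
    """Path to the "0" file of a multi-file TNG snapshot (hypothetical helper).

    Assumes the usual TNG layout, e.g. <output_dir>/snapdir_099/snap_099.0.hdf5
    """
    tag = f"{snap_num:03d}"
    return Path(output_dir) / f"snapdir_{tag}" / f"snap_{tag}.0.hdf5"


def load_tng_snapshot(output_dir, snap_num):
    """Open a full TNG snapshot; yt picks up the other chunk files itself."""
    import yt  # deferred so the path helper works without yt installed

    # yt.load reads header metadata; particle data is read later, on access
    return yt.load(str(first_chunk_path(output_dir, snap_num)))
```

With a hypothetical local copy of the data, `ds = load_tng_snapshot("TNG100-1/output", 99)` would open snapshot 99 through its `snap_099.0.hdf5` file.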

yt supports parallelism with mpi4py:

https://yt-project.org/doc/analyzing/parallel_computation.html

But parallelism is not required to work with datasets split across multiple files.
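A minimal sketch of the mpi4py route described in the parallel-computation docs: call yt.enable_parallelism() early in the script, then launch it under MPI (for example `mpirun -np 4 python project_tng.py`). The script name, the `run_projection` helper, and the choice of a gas-density projection are illustrative assumptions; which operations actually run in parallel is decided by yt.

```python
def run_projection(snapshot_path, field=("gas", "density")):
    """Make and save a z-axis projection of `field` (hypothetical helper).

    Intended to be called from a script launched with, e.g.,
    `mpirun -np 4 python project_tng.py`.
    """
    import yt  # deferred import so this module can be imported without yt

    # Switch yt into parallel mode; call this before loading the dataset
    yt.enable_parallelism()
    ds = yt.load(snapshot_path)  # pass the "0" file of a multi-file snapshot
    prj = yt.ProjectionPlot(ds, "z", field)
    # Write the image from a single rank only
    if yt.is_root():
        prj.save()
    return prj
```

Without MPI the same function should still run serially, which matches the point above that parallelism is optional for multi-file datasets.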

Most of these topics are covered in the docs; I encourage you to have a look.

WangYun1995 commented 3 years ago

The snapshot data of IllustrisTNG at a given redshift contains many different particle types, such as dark matter, gas, and stars. How can I load only one particle type with yt.load()?

jzuhone commented 3 years ago

@WangYun1995 as I mentioned above, yt doesn't read all of the data when you run yt.load. It only loads basic metadata. Data for a given particle type is read only when you explicitly access it. I highly recommend checking out the docs for more on yt basics such as this.
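To illustrate this lazy access: in TNG snapshot files each particle type lives in its own HDF5 group (PartType0 for gas, PartType1 for dark matter, PartType4 for stars and wind particles, PartType5 for black holes), and yt exposes the on-disk fields under those names, e.g. ("PartType4", "Masses"). In the sketch below, the mapping dict and both helper names are hypothetical conveniences, not yt API.

```python
# HDF5 group names used for each particle type in IllustrisTNG snapshots
TNG_PARTICLE_TYPES = {
    "gas": "PartType0",
    "dm": "PartType1",
    "stars": "PartType4",  # also contains stellar wind particles
    "bh": "PartType5",
}


def particle_field(kind, name):
    """Build a yt field tuple such as ("PartType4", "Masses") (hypothetical helper)."""
    return (TNG_PARTICLE_TYPES[kind], name)


def read_particle_field(ds, kind, name, region=None):
    """Read one field of one particle type, optionally inside a selected region.

    Indexing the data object with the field tuple is what triggers disk I/O;
    other particle types are not read.
    """
    data = ds.all_data() if region is None else region
    return data[particle_field(kind, name)]
```

For example, `sp = ds.sphere(ds.domain_center, (1, "Mpc"))` followed by `read_particle_field(ds, "stars", "Masses", region=sp)` would read only stellar masses inside that sphere.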

munkm commented 3 years ago

+1 to what @jzuhone just said! I also think the quickstart notebooks are a nice place to get used to loading and visualizing data; they explain in narrative form what yt does with each function as you go through them.

jzuhone commented 2 years ago

Closing this issue because I believe all of the questions have been answered. Feel free to re-open or start a new one if necessary.