sbird / fake_spectra

A code for generating fake spectra from a cosmological simulation
MIT License
12 stars 13 forks source link

Feature/arepo #17

Closed sbird closed 5 years ago

sbird commented 5 years ago

A pull request to discuss the changes Xiaohan made for supporting Arepo's moving mesh better

sbird commented 5 years ago

I see. I'll try 0.5 kpc/h and see what happens. But there are two things I don't quite understand:

  1. if the forest is sensitive to gas with overdensities of ~0.1-10, then the smallest cell size in this density range is 20 ckpc/h. I'm not sure how much the densest cells contribute to the spectrum though. Especially at redshift ~5, gas with overdensities of a few already have pretty saturated absorption.
  2. the comoving softening length in the simulations is 0.5 kpc/h, so I guess 1 kpc/h is small enough?

This is the cell size distribution in my simulations. The cell sizes are simply calculated by (Mass/Volume)^1/3. screen shot 2019-02-01 at 1 40 20 pm https://user-images.githubusercontent.com/30164085/52143435-7205ad00-2629-11e9-8765-d1b06f1b3389.png

(Replying here so I can cc Matt)

The 0.1 - 10 overdensity range is true for the forest, but the three spectra you sent to me are all borderline Lyman Limit Systems, which are from higher overdensities (and physically smaller systems). It's also more true on largish scales, like k < 0.02 s/km. There are cells in your plot all the way down to 0.4 ckpc/h, and remember that if the cell is offset a bit from the sightline the chord subtended by the cell can be much less than the total size.

Because these small cells are neutral and highly clustered they can have an outsized effect on the flux power. So a smaller grid size is definitely necessary.

By the way, when you compare to observational data you need to be careful to remove the LLS and DLAs from the dataset just as the observer does.

The best argument for why this matters is found here: https://arxiv.org/abs/1504.00366

In particular: "We simulate the IGM using smoothed particle hydrodynamics, and find that, at z < 6, the gas density power spectrum does not exhibit the expected Jeans filtering cutoff, because dense gas in collapsed halos dominates the small-scale power masking pressure smoothing effects."

Simeon

xiaohanzai commented 5 years ago

The 0.1 - 10 overdensity range is true for the forest, but the three spectra you sent to me are all borderline Lyman Limit Systems, which are from higher overdensities (and physically smaller systems). It's also more true on largish scales, like k < 0.02 s/km. There are cells in your plot all the way down to 0.4 ckpc/h, and remember that if the cell is offset a bit from the sightline the chord subtended by the cell can be much less than the total size.

I checked the size distribution of cells near sightlines (selected from nearby_array, and the size is just h[ipart]). I randomly selected 10 sightlines, and the size distribution is shown below. The minimum cell size is ~10 ckpc/h, and the fraction of cells smaller than ~30 ckpc/h is less than 10%. Since 20-30 ckpc/h is about the size of a gas cell with an overdensity of 10, I’m not sure how much contribution the halo gas can have on the simulated spectra.

And I’m wondering, if we don’t sample cells with small enough sizes, does it mean the simulated power spectrum should be truncated somewhere at high k? I’ve been showing the power spectrum to k=0.2 s/km, but I’m not sure if there are enough small cells to sample this wavenumber. (Hope I made the question clear...)

screen shot 2019-02-02 at 9 29 16 pm

There was a bug in my code and I've pushed a bug fix to the branch. I tried decreasing RESO to 0.1, and the power spectrum actually didn’t change much, possibly due to the cell size distribution shown above. Please see the blue and red lines in the figure below. They use RESO=1.0 and 0.1 respectively.

screen shot 2019-02-02 at 10 20 19 am

By the way, when you compare to observational data you need to be careful to remove the LLS and DLAs from the dataset just as the observer does.

I haven’t gone through the observational paper carefully, but will it be ok to do something like Fig.1 in this paper: https://arxiv.org/pdf/0910.0256.pdf https://arxiv.org/pdf/0910.0256.pdf. Then the halo gas won’t contribute to the forest.

Xiaohan

Sorry it seems the image uploading wasn't quite successful. Let me know if the figures can't open.

On Feb 1, 2019, at 4:23 PM, Simeon Bird notifications@github.com wrote:

I see. I'll try 0.5 kpc/h and see what happens. But there are two things I don't quite understand:

  1. if the forest is sensitive to gas with overdensities of ~0.1-10, then the smallest cell size in this density range is 20 ckpc/h. I'm not sure how much the densest cells contribute to the spectrum though. Especially at redshift ~5, gas with overdensities of a few already have pretty saturated absorption.
  2. the comoving softening length in the simulations is 0.5 kpc/h, so I guess 1 kpc/h is small enough?

This is the cell size distribution in my simulations. The cell sizes are simply calculated by (Mass/Volume)^1/3. screen shot 2019-02-01 at 1 40 20 pm https://user-images.githubusercontent.com/30164085/52143435-7205ad00-2629-11e9-8765-d1b06f1b3389.png

(Replying here so I can cc Matt)

The 0.1 - 10 overdensity range is true for the forest, but the three spectra you sent to me are all borderline Lyman Limit Systems, which are from higher overdensities (and physically smaller systems). It's also more true on largish scales, like k < 0.02 s/km. There are cells in your plot all the way down to 0.4 ckpc/h, and remember that if the cell is offset a bit from the sightline the chord subtended by the cell can be much less than the total size.

Because these small cells are neutral and highly clustered they can have an outsized effect on the flux power. So a smaller grid size is definitely necessary.

By the way, when you compare to observational data you need to be careful to remove the LLS and DLAs from the dataset just as the observer does.

The best argument for why this matters is found here: https://arxiv.org/abs/1504.00366

In particular: "We simulate the IGM using smoothed particle hydrodynamics, and find that, at z < 6, the gas density power spectrum does not exhibit the expected Jeans filtering cutoff, because dense gas in collapsed halos dominates the small-scale power masking pressure smoothing effects."

Simeon — You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/sbird/fake_spectra/pull/17#issuecomment-459872243, or mute the thread https://github.com/notifications/unsubscribe-auth/AcxEdT0M6ShnQAmN4s0D8LvVNYGvXPhAks5vJLA1gaJpZM4adHjA.

sbird commented 5 years ago

I checked the size distribution of cells near sightlines (selected from nearby_array, and the size is just h[ipart]). I randomly selected 10 sightlines, and the size distribution is shown below. The minimum cell size is ~10 ckpc/h, and the fraction of cells smaller than ~30 ckpc/h is less than 10%. Since 20-30 ckpc/h is about the size of a gas cell with an overdensity of 10, I’m not sure how much contribution the halo gas can have on the simulated spectra.

Ok, so it's not that. Hmm. I'm still concerned by all this. I'm going to check with an old arepo simulation I have lying around.

And I’m wondering, if we don’t sample cells with small enough sizes, does it mean the simulated power spectrum should be truncated somewhere at high k? I’ve been showing the power spectrum to k=0.2 s/km, but I’m not sure if there are enough small cells to sample this wavenumber.

There will be a k above which you won't have enough resolution. The best way to do this is to explicitly check for which k values you get the same answer to within, eg, 5% if you increase the number of sightlines and if you increase the particle load (so 256 vs 512). By the way, I should have asked this before: is this a full-physics simulation, with feedback? Or is it a quick lyman alpha simulation?

By the way, when you compare to observational data you need to be careful to remove the LLS and DLAs from the dataset just as the observer does. I haven’t gone through the observational paper carefully, but will it be ok to do something like Fig.1 in this paper: https://arxiv.org/pdf/0910.0256.pdf https://arxiv.org/pdf/0910.0256.pdf. Then the halo gas won’t contribute to the forest.

Definitely don't do that! You'll be throwing away a lot of the fine detail of the simulation, and there's actually an easier way: to compare to observations you want to generate sightlines at random positions and then use only those sightlines with a total column density < 10^17 (or optical depth < ~1). So you just throw out the DLAs and LLS. Note that the specific observations you're comparing to may use a slightly different threshold. You also want to rescale to the mean flux using only the sightlines that pass the column density cut, so that you aren't being biased by DLAs. By the way, DLAs are halo gas, but LLS can be uncollapsed IGM at z=5.

sbird commented 5 years ago

And thanks for being so quick with the checks! Also good that the bug fix didn't make a big difference.

xiaohanzai commented 5 years ago

By the way, I should have asked this before: is this a full-physics simulation, with feedback? Or is it a quick lyman alpha simulation?

This is a full physics sim. We have star formation and stellar feedback, but no AGN feedback. Definitely don't do that! You'll be throwing away a lot of the fine detail of the simulation, and there's actually an easier way: to compare to observations you want to generate sightlines at random positions and then use only those sightlines with a total column density < 10^17 (or optical depth < ~1). So you just throw out the DLAs and LLS. Note that the specific observations you're comparing to may use a slightly different threshold. You also want to rescale to the mean flux using only the sightlines that pass the column density cut, so that you aren't being biased by DLAs. By the way, DLAs are halo gas, but LLS can be uncollapsed IGM at z=5.

Yeah that sounds like a good idea! But should I throw away the high column density sightlines before rescaling? In some of my simulations the original mean fluxes can be a factor of 2-3 smaller than the observed mean flux. Should I do the cut and rescaling iteratively, or perhaps it doesn’t really matter if I generate a large enough number of sightlines?

Xiaohan

On Feb 3, 2019, at 4:43 PM, Simeon Bird notifications@github.com wrote:

I checked the size distribution of cells near sightlines (selected from nearby_array, and the size is just h[ipart]). I randomly selected 10 sightlines, and the size distribution is shown below. The minimum cell size is ~10 ckpc/h, and the fraction of cells smaller than ~30 ckpc/h is less than 10%. Since 20-30 ckpc/h is about the size of a gas cell with an overdensity of 10, I’m not sure how much contribution the halo gas can have on the simulated spectra.

Ok, so it's not that. Hmm. I'm still concerned by all this. I'm going to check with an old arepo simulation I have lying around.

And I’m wondering, if we don’t sample cells with small enough sizes, does it mean the simulated power spectrum should be truncated somewhere at high k? I’ve been showing the power spectrum to k=0.2 s/km, but I’m not sure if there are enough small cells to sample this wavenumber.

There will be a k above which you won't have enough resolution. The best way to do this is to explicitly check for which k values you get the same answer to within, eg, 5% if you increase the number of sightlines and if you increase the particle load (so 256 vs 512). By the way, I should have asked this before: is this a full-physics simulation, with feedback? Or is it a quick lyman alpha simulation?

By the way, when you compare to observational data you need to be careful to remove the LLS and DLAs from the dataset just as the observer does. I haven’t gone through the observational paper carefully, but will it be ok to do something like Fig.1 in this paper: https://arxiv.org/pdf/0910.0256.pdf https://arxiv.org/pdf/0910.0256.pdf https://arxiv.org/pdf/0910.0256.pdf https://arxiv.org/pdf/0910.0256.pdf. Then the halo gas won’t contribute to the forest.

Definitely don't do that! You'll be throwing away a lot of the fine detail of the simulation, and there's actually an easier way: to compare to observations you want to generate sightlines at random positions and then use only those sightlines with a total column density < 10^17 (or optical depth < ~1). So you just throw out the DLAs and LLS. Note that the specific observations you're comparing to may use a slightly different threshold. You also want to rescale to the mean flux using only the sightlines that pass the column density cut, so that you aren't being biased by DLAs. By the way, DLAs are halo gas, but LLS can be uncollapsed IGM at z=5.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/sbird/fake_spectra/pull/17#issuecomment-460091495, or mute the thread https://github.com/notifications/unsubscribe-auth/AcxEdSmzpybI1eaIItA9cSUxqngNXJn7ks5vJ1f-gaJpZM4adHjA.

xiaohanzai commented 5 years ago

Another thing I find a bit strange is that in some cases the cells seem pretty elongated. This figure shows the histogram of the offset between the position of the cell and the center of the sightline segment that belongs to the cell. I'm not sure if an offset of larger than ~20 ckpc/h is reasonable, although most cells are larger than ~40 ckpc/h. But given that the code does make the cells rounder (Section 4.1 of Springel 2010), I would naively expect almost every cell has an offset close to 0. But please correct me if I'm wrong here.

screen shot 2019-02-02 at 9 31 44 pm
sbird commented 5 years ago

That is interesting - and could explain why the mesh is giving such different answers to the SPH kernel. I think the code does make the cells rounder, so that is a bit surprising, but still possible.

sbird commented 5 years ago

This is a full physics sim. We have star formation and stellar feedback, but no AGN feedback.

Thanks

Yeah that sounds like a good idea! But should I throw away the high column density sightlines before rescaling? In some of my simulations the original mean fluxes can be a factor of 2-3 smaller than the observed mean flux. Should I do the cut and rescaling iteratively, or perhaps it doesn’t really matter if I generate a large enough number of sightlines? Xiaohan

Yes, this is tricky, because rescaling is kind of an unphysical hack in some ways (the better thing to do is to know the true UVB so that you don't have to rescale). You're trying to match the mean flux of the Lyman-alpha forest, excluding the high column density systems, so doing a cut and iteratively rescaling is ideal. That said, it shouldn't matter too much: the things that are on the margins are LLS, and because a LLS is close to saturated the flux is almost insensitive to the column density. So I would expect it to converge pretty well after just zero or one iterations.

sbird commented 5 years ago

Hi Xiaohan,

I fixed the top hat kernel in the latest git master of fake_spectra. Can you please test it and see if it is closer to your pseudo-Voronoi than the SPH kernel?

I'm having trouble reproducing your results on my snapshot: when I run this branch I get zeros for the column density and optical depth. Can you confirm that this does not happen for you?

sbird commented 5 years ago

Ok. I can't merge this as-is, because of the hard assumption of sightlines through the x-axis in index_table.cpp (and because of the merge conflicts).

But I have fixed the top hat kernel in the master branch and compared to the fixed version of this code and they agree extremely well in terms of both the flux power spectrum and the actual spectra. I made the top hat kernel the default for new arepo as well. So I think the master branch should work for you and we can consider this resolved, if you can rebase your other code on top of it.

Please confirm that master works for your snapshots as well.

sbird commented 5 years ago

I also considered reconstructing the Voronoi mesh completely in python and generating artificial spherical particles that would give the same answer to pass to the C code - but I couldn't figure out what scipy.spatial.Voronoi was doing in 3D well enough to make face intersections work and lost interest. I can share the code if you feel like doing that.

xiaohanzai commented 5 years ago

Hi Simeon,

Thank you so much for your help! Sorry for my late reply. I think the top hat kernel works very well. Here is a comparison of the spectra and the power spectrum:

screen shot 2019-02-19 at 9 11 53 am

Looks like the top hat kernel still generates slightly more small scale structure in n_HI, and the power spectrum is higher than "mesh" by 20-30% at k~0.1-0.2. But I guess the agreement is good enough.

If the scipy package is difficult to use then it's probably better for me to stick to the original pseudo method. I'll update my branch with your latest changes. Would you like to merge it or just keep it a separate branch for future reference?

Xiaohan

sbird commented 5 years ago

That looks much more reasonable! I don't object to merging this in principle - 20% is not zero. But it is a rather intrusive change to the codebase at the moment.

Could you please make a branch rebased cleanly on top of the master branch and drop the TDR stuff?

I'd also ask that you let assign_cells() work for sightlines through the y and z-axis, rather than just the x axis, and we should make changes so that the top hat and mesh kernels can both be used from the same codebase (probably by introducing a new kernel type, maybe Spectra.spec_int = 2).

Thanks for the help

sbird commented 5 years ago

One interesting thing is that the differences are overall much larger in your snapshots than in mine.

I suspect this is because of the elliptical cells you found, and I wonder if there has been some change in the Arepo codebase that makes the cell circularization less efficient? Perhaps the shift to a momentum conserving hydro solver detailed in Pakmor et al?

xiaohanzai commented 5 years ago

Could you please make a branch rebased cleanly on top of the master branch and drop the TDR stuff?

Sure, I'll try to deal with this in the next several days. I saw there is a "xiaoarepo" branch so I guess you already merged updates from master?

I'd also ask that you let assign_cells() work for sightlines through the y and z-axis, rather than just the x axis, and we should make changes so that the top hat and mesh kernels can both be used from the same codebase (probably by introducing a new kernel type, maybe Spectra.spec_int = 2).

I thought I did include cases for sightlines through y and z axis, because there is a %3 operation when extracting from the position array to calculate dy and dz in assign_cells(). Maybe I have a misunderstanding and there's something wrong in this calculation?

One interesting thing is that the differences are overall much larger in your snapshots than in mine.

I suspect this is because of the elliptical cells you found, and I wonder if there has been some change in the Arepo codebase that makes the cell circularization less efficient? Perhaps the shift to a momentum conserving hydro solver detailed in Pakmor et al?

That's interesting. We do have the mesh regularization options turned on. Let me try asking a collaborator and see if he knows about this. BTW what redshifts are your snapshots at?

sbird commented 5 years ago

Sure, I'll try to deal with this in the next several days. I saw there is a "xiaoarepo" branch so I guess you already merged updates from master?

I did, but that branch is buggy: I was trying to figure out the loading stuff at the same time and screwed up the merge through making too many changes. Better to start with a clean branch.

I'd also ask that you let assign_cells() work for sightlines through the y and z-axis, rather than just the x axis, and we should make changes so that the top hat and mesh kernels can both be used from the same codebase (probably by introducing a new kernel type, maybe Spectra.spec_int = 2).

I thought I did include cases for sightlines through y and z axis, because there is a %3 operation when extracting from the position array to calculate dy and dz in assign_cells(). Maybe I have a misunderstanding and there's something wrong in this calculation?

There is an implementation, but in this line: // take into account periodicity dx = fabs(pos[3*ipart+axis_i-1]-xp); there is a bare axis_i that worried me. Thinking it over, since axis_i = (1,2,3), not (0,1,2) this will actually be fine. But could you please test it with axis != 1 and check that it does really work before we do the merge?

One interesting thing is that the differences are overall much larger in your snapshots than in mine. I suspect this is because of the elliptical cells you found, and I wonder if there has been some change in the Arepo codebase that makes the cell circularization less efficient? Perhaps the shift to a momentum conserving hydro solver detailed in Pakmor et al?

That's interesting. We do have the mesh regularization options turned on. Let me try asking a collaborator and see if he knows about this. BTW what redshifts are your snapshots at?

These were at z ~ 3. So it might just be more important at higher redshift because of the smaller scales the 1D flux power probes. These are also very old snapshots from 2013. I don't anticipate any specific problem with the mesh being non-circular but it might be worth looking into whether it is and why. Not urgent or even important, just a thing if you have some free time and curiosity.

xiaohanzai commented 5 years ago

Hi Simeon,

I tested the y and z directions and the implementation works fine. This is the power spectrum at z=5:

screen shot 2019-02-25 at 8 51 05 am

I created a new "arepo" branch: https://github.com/xiaohanzai/fake_spectra/tree/arepo. This is based on master and should be all clean. I changed the kernel return value in abstractshapshot.py to 2 for this branch, although I guess a better way may be to add a kernel argument to the Spectra class. Please let me know if there are additional issues.

Xiaohan

sbird commented 5 years ago

Thanks! That looks really nice. I've opened a new PR so let's merge it there. A few comments wouldn't hurt but otherwise I think it looks fine to merge, thanks for your work on this feature.