Open RyanConway91 opened 2 years ago
I suddenly have the same issue: the npf package takes a few minutes to load, but the drn package takes forever. Previously it took around 1 minute to load the same model, and I'm not sure what has changed. I didn't update my flopy or mf6 installation during this time.
For calibration I avoid loading the whole model; I just modify the parameters for the package and write the model files in the right format, which saves tons of time. But while I am still at the stage of building up the model, loading is such a pain.
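That direct-write shortcut can be sketched with plain numpy. Everything here (file name, dimensions, format string) is a hypothetical stand-in for whatever the package's OPEN/CLOSE entry actually points at:

```python
import numpy as np

# Hypothetical calibration update: overwrite the external array file that a
# package's OPEN/CLOSE entry points to, without loading the model in flopy.
nrow, ncol = 100, 120                    # assumed layer dimensions
hk = np.full((nrow, ncol), 25.0)         # updated parameter field

# Free-format text, one grid row per line.
np.savetxt("hk_layer1.txt", hk, fmt="%15.6e")

# Round-trip check: the file reads back to the same values.
assert np.allclose(np.loadtxt("hk_layer1.txt"), hk)
```

MODFLOW picks the new values up on the next run, as long as the package file already references that external file.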
This is not a bug but a performance issue, since the model does load. We are working to address model load performance issues.
Some of these performance issues were addressed in a PR last week. Download the latest version of FloPy from the develop branch and follow the best practices described in this tutorial:
Note that this PR does not solve all performance problems, but can solve many of the most common ones. We are still looking into more general solutions to improve performance.
@spaulins-usgs are there any future plans for migrating to hdf5 format instead of text files for input? :)
Work is underway to explore support for netCDF and alternative modflow6 input and output data formats. Flopy would then presumably have facilities to work with these files, however this effort is still early stage.
@sharon92 - Have you tried binary array input rather than text? Admittedly not compressed, but I see big speed increases for big arrays if reading/writing binary form.
@cnicol-gwlogic ohh does flopy support it? For the moment, my projects are based on modflow 2005 so I can't test them anyway but thanks I will keep that in mind for my new project in Modflow 6!
Both mf6 and earlier versions support binary. See here for mf6 (search page for binary): https://flopy.readthedocs.io/en/3.3.3/_notebooks/tutorial02_mf6.html
And here for earlier versions, see the example notebook on external files (sorry, I can't find it right now).
@cnicol-gwlogic thanks for the tip! I found it in docs now! This is amazing 🚀
@cnicol-gwlogic the doc regarding binary data seems to cover output packages only. For MODFLOW 2005 I couldn't find a way to save input packages in binary.
@sharon92 - take a look here for example. So you could do something along these lines:
import flopy

# Load an existing model, flag the arrays as external binary, then rewrite.
m = flopy.modflow.Modflow.load("freyberg.nam")
m.model_ws = "test"
m.lpf.hk.how = "external"
for k in range(m.nlay):
    m.lpf.hk[k].format.binary = True
m.rch.rech.how = "external"
for kper in range(m.nper):
    m.rch.rech[kper].format.binary = True
m.write_input()
@cnicol-gwlogic Thank you so much! This works great for my project! I could reduce the size from 3GB to 1GB!
I wanted to test the impacts of using binary files on read/write times and model size for an MF6 model I am working with currently. As a test, I converted some of my npf arrays to binary:
import flopy as fp

sim_ws = r'C:\offline\mcwhpp\current'
sim_ws_tmp = r'C:\offline\mcwhpp\temp'
sim = fp.mf6.MFSimulation.load(sim_ws=sim_ws)
sim.set_sim_path(sim_ws_tmp)
m_rg = sim.get_model('regional')
m_rg.npf.k.store_as_external_file(sim_ws_tmp + r'\npf_reg.bnr', binary=True)
m_rg.write()
The binary files were written, but they are about 10% larger than the ascii files. Why would this be? I thought binary was supposed to drastically reduce file size. My ascii file is written at not-super-high precision (1.000000e+00), so if the binary file is written at much higher precision, maybe that would make it larger?
@RyanConway91 that sounds a little weird to me, but I guess mf6 writes its binary arrays in double precision (I think), so a low-precision text file can end up smaller.
Regardless, the main saving I was talking about in this post is in read/write speed (not disk space): formatting numbers as text incurs overhead compared to binary dumps, and reading back into memory is faster for the same reason - there is no encoding/decoding going on, and no deciphering of numeric formats or parsing of array elements.
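A rough, self-contained illustration of that overhead with numpy (array size, file names, and formats are arbitrary; absolute timings will vary by machine):

```python
import time
import numpy as np

# Compare text vs binary round trips for a moderately large array.
arr = np.random.rand(500, 500)

t0 = time.perf_counter()
np.savetxt("arr.txt", arr, fmt="%15.6e")   # format each value as text
txt_back = np.loadtxt("arr.txt")           # parse each value back
t_text = time.perf_counter() - t0

t0 = time.perf_counter()
arr.astype(np.float64).tofile("arr.bin")   # raw dump, no formatting
bin_back = np.fromfile("arr.bin", dtype=np.float64).reshape(arr.shape)
t_binary = time.perf_counter() - t0

assert np.allclose(txt_back, arr) and np.array_equal(bin_back, arr)
print(f"text: {t_text:.3f}s, binary: {t_binary:.3f}s")
```

With the %15.6e format, text costs about 16 bytes per value versus 8 for a float64 binary dump, so binary usually also wins on disk space unless the text format is much more compact, which is one way the size comparison above could go the other way.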
@cnicol-gwlogic I encountered several issues running the new binary outputs from flopy in MODFLOW-2005. One was np.nan values ending up in the binary files; I added a check in Util2d so that np.nans are replaced by -999 if np.issubdtype(dtype, np.floating).
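The guard described above might look roughly like this (a sketch of the idea, not the actual flopy patch):

```python
import numpy as np

def replace_nans(arr, nodata=-999.0):
    """Replace np.nan with a no-data flag in floating-point arrays
    before they are dumped to a binary file."""
    if np.issubdtype(arr.dtype, np.floating):
        return np.where(np.isnan(arr), nodata, arr)
    return arr  # integer arrays cannot hold nan, pass through unchanged

a = np.array([[1.0, np.nan], [np.nan, 4.0]])
clean = replace_nans(a)
assert not np.isnan(clean).any()
assert clean[0, 1] == -999.0
```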
MODFLOW-2005
U.S. GEOLOGICAL SURVEY MODULAR FINITE-DIFFERENCE GROUND-WATER FLOW MODEL
Version 1.12.00 2/3/2017
Using NAME file: HL130513.nam
Run start date and time (yyyy/mm/dd hh:mm:ss): 2023/08/07 13:27:26
forrtl: severe (157): Program Exception - access violation
Image PC Routine Line Source
mf2005.exe 00007FF679ABF4BB ULSTRD 377 utl7.f
mf2005.exe 00007FF6795EDCF3 GWF2DRN7RP 174 gwf2drn7.f
mf2005.exe 00007FF679A433BB MAIN__ 157 mf2005.f
mf2005.exe 00007FF679B39A3E Unknown Unknown Unknown
mf2005.exe 00007FF679ACA6D3 Unknown Unknown Unknown
KERNEL32.DLL 00007FFAE7D826AD Unknown Unknown Unknown
ntdll.dll 00007FFAE8DEAA68 Unknown Unknown Unknown
I couldn't figure out a solution to this one. I checked them against nan values:
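One way to run such a check on a written file, assuming it is a raw float32 dump with no header record (flopy's actual binary layout may include one):

```python
import numpy as np

# Write a small demonstration file containing a nan, then screen it.
demo = np.array([1.0, 2.0, np.nan, 4.0], dtype=np.float32)
demo.tofile("demo_array.bin")

data = np.fromfile("demo_array.bin", dtype=np.float32)
bad = np.flatnonzero(np.isnan(data))
if bad.size:
    print(f"{bad.size} nan value(s) at flat indices {bad.tolist()}")
# → 1 nan value(s) at flat indices [2]
```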
@sharon92 - sounds a bit buggy, doesn't it? I haven't encountered this, and I use it regularly with evt and rch arrays. The drn file issue seems weird, assuming you are not using external/binary files for it. I'd have to take a look at your files to be of any help, sorry.
@cnicol-gwlogic I am translating a mf6 model into a mfusg model. I used external binary files for most packages (NPF, STO, RIV, GHB, RCH, EVT...) and external text files for TVK and TVS, and all worked fine. However, when I set up the packages in the mfusg model, it seems that binary files are not supported in the boundary packages that use list data, such as RIV and GHB?
# RIV package for MODFLOW-USG generated by Flopy 3.5.0.dev0
6424 50
6424 0 # stress period 1
OPEN/CLOSE riv\riv_sp_1.bin (BINARY)
-1 0 # stress period 2
-1 0 # stress period 3
-1 0 # stress period 4
-1 0 # stress period 5
-1 0 # stress period 6
...
I got an error in the model list file; apparently mfusg tries to read the binary file as a text file.
If binary is not supported for list data, I would just use text files for these packages. I do find binary files are faster.
Hi @hjia1005,
I'm not sure re: list-type datasets, to be honest - you're probably best off asking Sorab on that one. But it looks to me like, at least when using the OPEN/CLOSE method, it is not supported (maybe you could try the EXTERNAL method instead and add the file to your nam file as binary - but if you are doing that per stress period, the nam file could be a bit of a beast).
Quick glance at the usg code and I do not think binary list files are supported. Could be wrong though - check with Sorab.
Chris.
Thank you. I tried EXTERNAL and got the file unit numbers messed up. Text files are fine for these packages. As long as RCH and EVT are treated as arrays in mfusg, I can use binary files for them; those are the large datasets.
BTW, the translated mfusg model runs much faster than the mf6 model in my case. But there are large discrepancies in both heads and water budget.
Yep, list stuff too small / not worth the hassle. That's a worry re: those differences...there must be something fundamentally different in the settings for that to occur, I would have thought.
Yes I believe there should be something very different that I haven't captured yet. I have been digging into it, so far no clue. But I find that even using different numbers of timesteps in stress periods while keeping other conditions unchanged can cause a difference in the final heads as large as 45 meters in this model. I tried this with mf6, not sure if mfusg would be the same.
A few more details about the translation are given here: https://github.com/modflowpy/flopy/discussions/1997
@cnicol-gwlogic Hi Chris, I tested the code you posted earlier in this thread (see below); I just changed how = "external" to how = "openclose".
m = flopy.modflow.Modflow.load("freyberg.nam")
m.model_ws = "test"
m.lpf.hk.how = "openclose"
for k in range(m.nlay):
    m.lpf.hk[k].format.binary = True
m.rch.rech.how = "openclose"
for kper in range(m.nper):
    m.rch.rech[kper].format.binary = True
m.write_input()
It worked well for the RCH package but doesn't work for the LPF package. Not sure if it is because the mfusg model adds some complications to the array shape; I used the freyberg.usg model to test. Alternatively, to avoid loading the model with flopy in the case of a big model, Util2d.write_bin() can be used to write the binary file, but again it only worked for RCH; mfusg won't accept the generated binary file for arrays in LPF. It doesn't cause an issue for me, since those arrays will only be written once and text files are totally fine. Just curious.
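For anyone inspecting these files by hand, a header-plus-data layout along the lines of MODFLOW's binary array records (kstp, kper, pertim, totim, a 16-character text label, ncol, nrow, ilay, then the array values) can be round-tripped with a numpy structured dtype. This is a sketch of an assumed layout, not a statement of what Util2d.write_bin emits for every package:

```python
import numpy as np

# Assumed header layout: int32 kstp/kper, float32 pertim/totim, 16-char text,
# int32 ncol/nrow/ilay, then an nrow*ncol float32 array (stream access,
# no Fortran record markers).
header_dtype = np.dtype([
    ("kstp", "<i4"), ("kper", "<i4"),
    ("pertim", "<f4"), ("totim", "<f4"),
    ("text", "S16"),
    ("ncol", "<i4"), ("nrow", "<i4"), ("ilay", "<i4"),
])

def write_array(path, arr, text=b"HEAD"):
    header = np.zeros(1, dtype=header_dtype)
    header["kstp"], header["kper"], header["ilay"] = 1, 1, 1
    header["text"] = text.ljust(16)
    header["nrow"], header["ncol"] = arr.shape
    with open(path, "wb") as f:
        header.tofile(f)
        arr.astype("<f4").tofile(f)

def read_array(path):
    with open(path, "rb") as f:
        header = np.fromfile(f, dtype=header_dtype, count=1)[0]
        shape = (int(header["nrow"]), int(header["ncol"]))
        return np.fromfile(f, dtype="<f4").reshape(shape)

a = np.arange(12, dtype=np.float32).reshape(3, 4)
write_array("hk.bin", a)
assert np.array_equal(read_array("hk.bin"), a)
```

Comparing a reader like this against a file MODFLOW actually accepts is a quick way to see whether the header fields or precision differ between packages.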
I'll take a look and get back to you. I've not used binary for lpf, but it should work as it's the same utils reading them.
Describe the bug Might not be a bug, just an FYI in case it is. I have a model that uses tvk. The tvk input data is rather large (32 SPs, about 200,000 nodes changing per SP). When I load the model using:
it takes around 20 minutes. If I just use numpy or pandas to load all the arrays into a database, it takes about 30 seconds. This is annoying during calibration, as loading the model in flopy takes longer than the rest of the run combined.
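The direct numpy route might look like this; the file naming scheme and the two-column (node, k) layout are assumptions about how the external tvk files were written:

```python
import numpy as np

# Load per-stress-period TVK change lists directly, skipping the flopy
# model load. Hypothetical naming: tvk_sp_<kper>.txt, columns: node, new k.
tvk = {}
for kper in range(1, 33):                  # 32 stress periods, as above
    path = f"tvk_sp_{kper}.txt"
    try:
        tvk[kper] = np.loadtxt(path, ndmin=2)
    except OSError:
        continue                           # stress period with no changes
print(f"loaded {len(tvk)} stress periods")
```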
To Reproduce Make a model with a large TVK and load the npf
Expected behavior I would expect loading the tvk into a flopy class not to take orders of magnitude longer than just reading the data another way.