Closed zjwegert closed 9 months ago
Is there a reason we are using CSV instead of an HDF5-compatible format like JLD2? With the later we might be able to save both the array itself and the inidices, which would allow for the read to be bang-less. Would we prefer that?
I hadn't considered going with JLD2 (only used it a few times), I think this might be a nice solution. I assume that we'd still have to generate several files?
I assume that we'd still have to generate several files?
Yes, the MPI communicator cannot be saved, I'm afraid.
I've also been thinking we might not want to save the indices... Not because we cannot, but rather because a lot of times we are using the ===
to check if indices are the same. I am not sure the object id is the same if you save then read multiple copies of it, even in binary format. I have to think about it.
In any case HDF5 might still be a better format for saving/loading data fast.
@JordiManyer, I've adjusted to use JLD2.jl
and added load_object!
to copy file data to a PVector
or PSparseMatrix
. I haven't had a chance to test PSparseMatrix
, but functions as expected for PVector
.
Let me know what you decide with indices etc.
@zjwegert I've kinda started implemented what I would imagine the IO should look like. The idea is that the functions psave
, pload
and pload!
are totally generic, and one can implement specific versions of to_local_storage
and from_local_storage
for any distributed structure we might want to save.
Two main benefits of this approach:
This is also general enough that it could potentially be moved to GridapDistributed and even PartitionedArrays.
The interface is based on the FileIO.jl one, with a p
in front (which is similar to PartitionedArrays's naming scheme). I prefer not to overwrite any specific package's API, since we are going to change some of the signatures. Also, this would allow some dispatching on the file extension (like FileIO.jl does), so that the user can choose the file format he wants. For instance, I believe other formats like BSON/JSON can easily extended from what we have. We might even be able to leverage the machinery currently available on FileIO.jl. I'll have to think about it.
Awesome @JordiManyer, I think this would be a great addition to GridapDistributed and PartitionedArrays.
I've added a serial test as well. I think we should be able to do this as well. Not sure how it fits in with the current implementation.
I think in serial we just rely on the regular functions... It's a tiny change, I don't think it matters too much. Also in serial you don't have ranks...
I think in serial we just rely on the regular functions... It's a tiny change, I don't think it matters too much. Also in serial you don't have ranks...
Agreed and also agree that serial doesn't need ranks of course. My point was more about people having to switch between psave
/pload
/pload!
and functions save
/load
/load!
. I don't think it's a big issue.
Also, I haven't read up on FileIO.jl but save
and load
seem to require some extra care.
I believe they work with Dict{String,Any}
, which is a bit annoying... I wish they worked for structs, but I guess JDL2 has it's own function for that.
Yeah... We could just use save_object
and load_object
for a serial implementation, and implement a load_object!
. Then we could rename psave
, pload
, and pload!
to save_pobject
, load_pobject
, and load_pobject!
.
If we could also remove FileIO from our dependencies and just create our own save
and load
functions, that default to JDL2. Then if someone wants to use other libraries they just explicitly call XXX.save
FileIO actually isn't in our dependencies JDL2 actually exports save
and load
from FileIO... Anyway, that is fine, I think we can just do as you say and call XXX.save etc.
Mmm ok... Anyway, I would say both options are fine.
@JordiManyer, see that last commit. I'll add some documentation and then merge if you're happy.
Does the LSTO_Distributed.XXX
compile? I don't think it's needed (it's implied) and I'm not sure it compiles (because it doesn't know what that is, since it's itself). I could be wrong...
Ok it does work, but it is indeed not necessary. I did some cleanup. We now only import save_object
and load_object
from JLD2. I don´t think it matters, but it is a little cleaner since it's the only things we use (I think?)
All good from my end.
All looks good to me!
I'll review, add some docs, then merge.
A small PR to add PVector IO. @JordiManyer, let me know if you're happy with this or want to adjust.