Closed Rickasaurus closed 8 years ago
Hi Rick,
I had a look at this issue a while back but gave up on it soon because of the difficulty of the undertaking. At the moment caching is too ingrained in the library logic to remove without substantial refactoring. I'll come back to this as soon as I find the time to do it.
In the meantime, here are some alternative courses of action:
let fsp = FsPickler.CreateBinary()
let chunkSize = 100000

let serialize (inputs : seq<'T>) (target : Stream) =
    let count = ref 0
    // Seq.chunkBySize partitions the input into non-overlapping chunks;
    // Seq.windowed would emit overlapping windows and blow up the output size.
    for chunk in Seq.chunkBySize chunkSize inputs do
        incr count
        fsp.Serialize(target, chunk, leaveOpen = true)
    !count

let deserialize count (source : Stream) = seq {
    for _ in 1 .. count do
        yield! fsp.Deserialize<'T[]>(source, leaveOpen = true)
}
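For reference, a round-trip sanity check of the chunked helpers might look like the sketch below (this assumes the `serialize`/`deserialize` functions and `fsp` from the snippet above, and uses an in-memory stream):

```fsharp
open System.IO
open Nessos.FsPickler

// assumes the chunked `serialize` / `deserialize` helpers defined above
let roundTrip (xs : int list) =
    use ms = new MemoryStream()
    let chunks = serialize (List.toSeq xs) ms
    ms.Position <- 0L
    deserialize chunks ms |> Seq.toList

// for 250000 inputs and chunkSize = 100000 this writes three chunks
// and should read back the original sequence unchanged
```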
Alternatively, you could roll your own implementation of ObjectIdGenerator that avoids the issues, or contribute a fix to ObjectIdGenerator proper. Thoughts?
Btw, have you tried using the .SerializeSequence methods? I would be interested to see how they behave in your case.
SerializeSequence gives exactly the same error.
I added a few adjustments as of yesterday (nuget version >= 1.0.9). Is this behaviour occurring there?
On Thursday, February 5, 2015, Rick Minerich notifications@github.com wrote:
SerializeSequence gives exactly the same error.
— Reply to this email directly or view it on GitHub https://github.com/nessos/FsPickler/issues/38#issuecomment-73116797.
w.r.t. SerializeSequence
I'm trying the chunked version now, but it seems to have stopped writing (or gotten extremely slow) after it wrote about 900MB (the same place where the non-chunked version gives exceptions).
It's going to take some time before I can move a new build into our locked-down environment to test. Maybe a tool to just auto-generate some random data would be worth it here.
I'm convinced that the chunked version is somehow just spinning its wheels now. It's been using a full core for 10 minutes and still hasn't written anything more out. The other possibility is that it's catching a lot of exceptions internally somewhere.
Good point, there should probably be a test of this, writing large sequences to '/dev/null' or something.
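Such a test could be as simple as streaming generated records into Stream.Null (the managed analogue of '/dev/null') and watching the timing; a sketch, assuming the FsPickler binary serializer and ignoring SerializeSequence's return value:

```fsharp
open System.Diagnostics
open System.IO
open Nessos.FsPickler

let fsp = FsPickler.CreateBinary()

// lazily generated records: nothing is retained between elements
let records n = seq { for i in 1 .. n -> sprintf "record-%d" i }

let sw = Stopwatch.StartNew()
fsp.SerializeSequence(Stream.Null, records 1000000) |> ignore
printfn "serialized 1M records in %O" sw.Elapsed
```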
By the way, what is the element type of your data set? String? F# ADT? class? struct?
It's a 2-3 level deep record tree with some structs, char arrays and strings in it.
You could probably reproduce this with just a one-member class with random strings though. It seems to be all about the number of objects.
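A minimal repro along those lines might look like the sketch below (the record type name and string length are made up for illustration):

```fsharp
open System
open System.IO
open Nessos.FsPickler

type Record1 = { Payload : string }   // hypothetical one-member type

let fsp = FsPickler.CreateBinary()
let rng = Random(42)

let randomString len =
    String(Array.init len (fun _ -> char (rng.Next(int 'a', int 'z' + 1))))

// millions of distinct reference-typed objects is what stresses the
// ObjectIDGenerator-based caching, regardless of the payload's shape
let items n = seq { for _ in 1 .. n -> { Payload = randomString 32 } }

fsp.SerializeSequence(Stream.Null, items 30000000) |> ignore
```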
Oh, and I didn't mention before and it probably doesn't matter because the issue is in the generic code, but I've been trying to use the binary serializer.
It looks like the chunked method is working (it just spit out another 900MB), but it seems like it's taking increasingly more time per record as it goes on.
Ok, so here's what I tried out:
type Tree<'T> = Leaf | Branch of 'T * Tree<'T> * Tree<'T>

let rec mkTree (f : int -> 'T) n =
    if n = 0 then Leaf
    else Branch(f n, mkTree f (n - 1), mkTree f (n - 1))

let large N = seq { for i in 1 .. N -> mkTree (fun i -> "textfield" + string i) 3 }

open System.IO
let fsp = FsPickler.CreateBinary()

// test 1 : quickly ate up all my memory
let eagerSeqPickler = Pickler.seq Pickler.auto<Tree<string>>
fsp.Serialize(eagerSeqPickler, Stream.Null, large 30000000)

// test 2 : time scales proportionally to input size, memory usage remains constant.
// Real: 00:08:22.967, CPU: 00:08:55.718, GC gen0: 5620, gen1: 2384, gen2: 792
fsp.SerializeSequence(Stream.Null, large 30000000)
The tests were run on my machine (a Windows 8 VM running on a Core i5 laptop with 4 GB of RAM).
I think the problem here is clearly with ObjectIdGenerator.Rehash(), which becomes ridiculously expensive as the number of objects increases. Have a look at its implementation:
http://referencesource.microsoft.com/#mscorlib/system/runtime/serialization/objectidgenerator.cs,145
This would explain I think both the devouring of memory and the intermittent stalls in IO.
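For intuition, here is a toy model of the cost (this is not the actual ObjectIDGenerator code, just an illustration of grow-and-rehash behaviour): each rehash re-inserts every entry seen so far, and because the generator never evicts, a long serialization keeps paying for every object it has ever tracked.

```fsharp
// Toy model: count how many entry re-insertions a grow-only table performs.
let rehashWork (insertions : int) =
    let mutable capacity = 8
    let mutable entries = 0
    let mutable reinsertions = 0L
    for _ in 1 .. insertions do
        entries <- entries + 1
        if entries > capacity then
            // a rehash must re-insert every live entry into the larger table
            reinsertions <- reinsertions + int64 entries
            capacity <- capacity * 2
    reinsertions
```

The point is not the asymptotics of growth (doubling amortizes fine) but that the table never shrinks and keeps every object id alive, so both memory and rehash cost scale with the total number of objects ever serialized.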
Ahh, it all makes sense now. Certainly it would be ideal to have something improved, but I understand that can be a lot of work. I'd be pretty happy with just a way to bypass it.
SerializeSequence should be a safe bet. I actually just pushed a package update (1.0.11) that fine-tunes performance with respect to ObjectIdGenerator after the benchmarks I just ran.
Awesome! I'll submit it for scanning and hopefully I'll have it in our environment in a week or so.
I was trying to use FsPickler to pull a largish data set (~27 million records) to disk and found that about 30 minutes in it failed with:
Nessos.FsPickler.FsPicklerException: Error serializing instance of type System.String[] ---> System.Runtime.Serialization.SerializationException: The internal array cannot expand to greater than Int32.MaxValue elements.
This occurs in a call to ObjectIDGenerator.Rehash() from ObjectIDGenerator.GetID(Object obj, Boolean& firstTime), which FsPickler calls in CompositePickler`1.Write(WriteState state, String tag, T value) in CompositePickler.fs on line 189.
It turns out this is a common problem in .NET, due to the internal use of ObjectIDGenerator to look for cycles. However, as this is a straight pull from a SQL database, I can guarantee there are no cycles in this case.
On a side note, the behavior of ObjectIDGenerator is poor in the failure case: it does not release the memory from its table, thus leaving a big chunk of RAM in use.