lipinggm / tlb

Automatically exported from code.google.com/p/tlb

Fix file IO across caching files for local data backup and Entry Repo Factory #93

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
The mechanism for caching smoothing data locally appends lines to a tmp file 
one at a time. This is inefficient because it opens and closes the file 
repeatedly and writes in very small chunks. We should use a buffered writer 
and write all the content in one shot.
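
A minimal sketch of the buffered approach proposed here, not the code that was eventually merged. `BufferedCacheWriter` and `writeAll` are hypothetical names used only for illustration; the point is that the file is opened once, written through a buffer, and closed once.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for illustration; not a class from the TLB code base.
final class BufferedCacheWriter {
    private BufferedCacheWriter() {}

    // Opens the file once, pushes every line through a buffer, and closes it once,
    // instead of reopening the file for each appended line.
    static void writeAll(Path file, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }
}
```

With the default open options the target file is created if missing and truncated if present, so this replaces the cached content rather than appending to it.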

Original issue reported on code.google.com by singh.janmejay on 7 Nov 2011 at 5:58

GoogleCodeExporter commented 9 years ago
This is probably not the root cause, but allocation in this call ran out of 
memory while working with buildr. That may have happened because it allocates 
a lot of strings (char arrays and other encoding-related objects) for each 
row. Doing a one-time conversion may reduce garbage generation as well as 
helping IO.

Original comment by singh.janmejay on 7 Nov 2011 at 6:01

GoogleCodeExporter commented 9 years ago

Original comment by itspa...@gmail.com on 20 Nov 2011 at 7:25

GoogleCodeExporter commented 9 years ago

Original comment by singh.janmejay on 21 Dec 2011 at 1:05

GoogleCodeExporter commented 9 years ago
Will also try to handle the 'make one big string, then write the big string to 
the file in one shot' kind of scenarios. This currently happens in the ERF 
sync-repo-to-disk call.

Will also handle the load side of it, where the entire file is loaded as one 
string and data objects are then parsed out of it. There should be no reason 
to load all the data in a file into one big string.
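
A rough sketch of what loading without one big string could look like. The `name: number` line format, the `Entry` type, and `StreamingEntryLoader` are assumptions made for this example, not the actual ERF types; each line is parsed as soon as it is read, so only one line is held as a string at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader for illustration; the entry format is an assumption.
final class StreamingEntryLoader {
    private StreamingEntryLoader() {}

    // Minimal stand-in for a data object parsed from one line.
    static final class Entry {
        final String name;
        final long value;

        Entry(String name, long value) {
            this.name = name;
            this.value = value;
        }
    }

    // Reads the file line by line and parses each line immediately,
    // so the whole file content never exists as a single string.
    static List<Entry> load(Path file) throws IOException {
        List<Entry> entries = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                entries.add(parse(line));
            }
        }
        return entries;
    }

    // Splits on the last ':' so names that contain ':' still parse.
    static Entry parse(String line) {
        int sep = line.lastIndexOf(':');
        if (sep < 0) {
            throw new IllegalArgumentException("Malformed entry: " + line);
        }
        String name = line.substring(0, sep).trim();
        long value = Long.parseLong(line.substring(sep + 1).trim());
        return new Entry(name, value);
    }
}
```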

Original comment by singh.janmejay on 21 Dec 2011 at 1:14

GoogleCodeExporter commented 9 years ago

Original comment by singh.janmejay on 21 Dec 2011 at 1:15

GoogleCodeExporter commented 9 years ago
Merged in 132a4bbe2f18a7391664cfa7a88e389fe7d1077d

ERF file IO no longer builds big strings; it now uses a reader and a writer.

The caching repo on the balancer side uses a 4-byte header to keep the number 
of lines available, so reading the whole file and making strings out of it is 
no longer necessary; the file is only read when reporting to the server. 
Writes are made using random-access-file seek. Reads and writes now take a 
lock around the absolute file path.
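
One way the header-plus-seek scheme described here could be sketched. This is an illustration under assumptions, not the merged code: `HeaderedLineFile` is a hypothetical name, the header is assumed to be a big-endian `int` line count at offset 0, and the lock is assumed to be an in-JVM monitor keyed by absolute path (it does not guard against other processes).

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; layout assumed: [4-byte line count][line\n][line\n]...
final class HeaderedLineFile {
    private static final int HEADER_BYTES = 4;
    // One lock object per absolute path, shared by all instances in this JVM.
    private static final ConcurrentMap<String, Object> LOCKS = new ConcurrentHashMap<>();

    private final File file;

    HeaderedLineFile(File file) {
        this.file = file;
    }

    private Object lock() {
        return LOCKS.computeIfAbsent(file.getAbsolutePath(), path -> new Object());
    }

    // Appends one line at the end of the file, then seeks back to update the header.
    void append(String line) throws IOException {
        synchronized (lock()) {
            try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
                int count = 0;
                if (raf.length() >= HEADER_BYTES) {
                    count = raf.readInt();
                } else {
                    raf.setLength(0);
                    raf.writeInt(0); // initialise the header on a new file
                }
                raf.seek(raf.length());
                raf.write((line + "\n").getBytes(StandardCharsets.UTF_8));
                raf.seek(0);
                raf.writeInt(count + 1);
            }
        }
    }

    // Reads only the 4-byte header; the body of the file is not touched.
    int lineCount() throws IOException {
        synchronized (lock()) {
            if (file.length() < HEADER_BYTES) {
                return 0;
            }
            try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
                return raf.readInt();
            }
        }
    }

    // Streams the body line by line, e.g. when reporting to the server.
    List<String> readLines() throws IOException {
        synchronized (lock()) {
            List<String> lines = new ArrayList<>();
            if (file.length() < HEADER_BYTES) {
                return lines;
            }
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(file)))) {
                int count = in.readInt();
                BufferedReader reader =
                        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
                String line;
                while (lines.size() < count && (line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            return lines;
        }
    }
}
```

Keeping the count in a fixed-size header means callers that only need the number of cached lines pay for a 4-byte read, and the full body is streamed only on the reporting path.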

Original comment by singh.janmejay on 24 Dec 2011 at 2:20