Closed: pb-cdunn closed this issue 8 years ago.
This is what Jason had posted in google-chat:
Change 10000 to 20000 in the following code inside align.c:
1766 if (hgh-low >= 7500)
1767 wsize = VectorEl*(hgh-low+1);
1768 else
1769 wsize = VectorEl*20000;
I think some "single-molecule weirdness" causes some big gaps. wsize (I guess the memory used for the band) needs to be bigger. However, this fix will use more memory.
Christopher,
I think we may have fixed this one a few days ago (one of my postdocs reported a bug where the alignment routine had a realloc check that was off by one). The character of the error matches what you are reporting. Moreover, I found a pretty bad bug involving masks that I introduced in the Oct. 31 commit, and I've fixed a few other small things that were not quite right.
So could you please take the very most recent version (committed 10 minutes ago) and merge that into your working version? Also, I don't know who works with the code at DNAnexus, but you may want to propagate the news to them.
Also, did you guys notice the new DBdump and LAdump routines that should make it easy for you to get info out of my framework (including things like trace points)?
Cheers,
Gene
On 12/3/15, 5:25 PM, Christopher Dunn wrote:
Unfortunately, the old test-case got deleted. (I should have copied it.) But I can post what I learned about it:
This crashes repeatably, in only ~3 minutes: daligner -v -k18 -w8 -h480 -t16 -H15000 -e0.7 -l4800 -M32 raw_reads.202 raw_reads.112
I'll try to get a stack-trace ...
align.c:159

  155  void Free_Work_Data(Work_Data *ework)
  156  { _Work_Data *work = (_Work_Data *) ework;
  157    if (work->vector != NULL)
  158      free(work->vector);
  159    if (work->cells != NULL)
  160      free(work->cells);
  161    if (work->trace != NULL)
  162      free(work->trace);
  163    if (work->points != NULL)
  164      free(work->points);
  165    free(work);
  166  }
filter.c:1834

  1500  void Match_Filter(char *aname, HITS_DB *ablock, char *bname, HITS_DB *bblock,
  1501                    int self, int comp, Align_Spec *aspec)
  ...
  1833    for (i = 0; i < NTHREADS; i++)
  1834      Free_Work_Data(parmr[i].work);
daligner.c:695

  519  int main(int argc, char *argv[])
  ...
  695    Match_Filter(aroot,&cblock,broot,&bblock,0,1,asettings);
  696    Close_DB(&bblock);
  697    free(broot);
Memory is too large for valgrind (24GB). I've added some memory diagnostics, but nothing looks bad. The problem occurs near the end of Match_Filter, between New_Work_Data() and Free_Work_Data(). When I have time, I will try to link with google-tcmalloc, which has a nice memory-checker.

Hi, Chris:
For some sequence comparisons, the function Local_Alignment() in align.c does not reserve enough memory. (It is hard to predict the right amount of memory for "arbitrary sequences".) If you change line 1769 from

wsize = VectorEl*10000;

to

wsize = VectorEl*20000;

the task will finish without a segfault. (See where wsize is used for allocating memory, etc.) I hope this helps.
Cheers,
Jason

Yes, that "fix" works for this example, but it doesn't explain the crash. Memory is being freed twice somewhere. It's probably caused by another buffer over-run somehow. We need to know where, and we need to detect the problem at runtime to avoid these annoying, uninformative crashes. That's all I have right now.
— Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DALIGNER/issues/32.
Great! We will rebase onto your latest code this week. Since I lost the test-case, I won't be able to verify the fix, but no news is good news.
Thanks for telling us about DBdump and LAdump. We'll look into that. Might be helpful.
Maybe you would be interested in some of the changes in our fork. I can think of a couple at least:
A) I added |-u#| to DBsplit, to set the unit for the |-s#| option. The reason is that I like to set a tiny block-size in order to force a very large number of blocks, to test distributed behavior. Also, I have a tiny, super-fast test-case with error-free reads from a synthesized template, which serves as an excellent regression-test (since the result is 100% verifiable) and I like to force that tiny input to be split into 2 blocks to increase the coverage of the integration-test.
B) Someone submitted a change to fasta2DB which accepts long lines. That's really helpful because I like to eyeball Sequel data by viewing the start of each line via less without line-wrapping. Also, with error-free in silico sampling, it's helpful to select a long string of letters and search for it elsewhere, to analyze our algorithms. Unfortunately, this submitted change was mixed with code to accept compressed fasta (which we'll drop someday because that could just as easily be handled via standard UNIX pipes), so you won't want to merge this code upstream. But maybe you could come up with a better way. The main idea is that it's ok to rely on the fasta header to indicate when the buffer needs to be increased.
Chris,
Could you please try out the following versions of DBsplit.c and fasta2DB.c? The changes are as follows:
fasta2DB: Should now work with arbitrarily long sequence lines. The header lines still need to be of length less than MAX_NAME (set to 10,000 in DB.h)
DBsplit: Rather than add another option, I simply made the -s option a real number parameter. So anyone that uses it with an int, e.g. -s200, will get the behavior that they expect, but if some sneaky person :-) wanted, say, extra small blocks, then they could say -s.01 and that would give them a block size of 1,000,000 * .01 = 10,000. OK with you?
Let me know if the codes are all good in your hands, and then I will commit if I hear back that all is good.
-- Gene
On 12/8/15, 5:30 PM, Christopher Dunn wrote:
Great! We will rebase onto your latest code this week. Since I lost the test-case, I won't be able to verify the fix, but no news is good news.
Thanks for telling us about DBdump and LAdump. We'll look into that. Might be helpful.
Maybe you would be interested in some of the changes in our fork. I can think of a couple at least:
A) I added |-u#| to DBsplit, to set the unit for the |-s#| option. The reason is that I like to set a tiny block-size in order to force a very large number of blocks, to test distributed behavior. Also, I have a tiny, super-fast test-case with error-free reads from a synthesized template, which serves as an excellent regression-test (since the result is 100% verifiable) and I like to force that tiny input to be split into 2 blocks to increase the coverage of the integration-test.
B) Someone submitted a change to fasta2DB which accepts long lines. That's really helpful because I like to eyeball Sequel data by viewing the start of each line via less without line-wrapping. Also, with error-free /in silico/ sampling, it's helpful to select a long string of letters and search for it elsewhere, to analyze our algorithms. Unfortunately, this submitted change was mixed with code to accept compressed fasta (which we'll drop someday because that could just as easily be handled via standard UNIX pipes), so you won't want to merge this code upstream. But maybe you could come up with a better way. The main idea is that it's ok to rely on the fasta header to indicate when the buffer needs to be increased.
— Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DALIGNER/issues/32#issuecomment-162936834.
/* DBsplit.c */

static char *Usage = "[-a] [-x<int>] [-s<double(200.)>] <path:db|dam>";
int main(int argc, char *argv[])
{ HITS_DB    db, dbs;
  int64      dbpos;
  FILE      *dbfile, *ixfile;
  int        status;

  int        ALL;
  int        CUTOFF;
  int64      SIZE;

  { int   i, j, k;
    int   flags[128];
    char *eptr;
    float size;
ARG_INIT("DBsplit")
CUTOFF = 0;
size = 200;
j = 1;
for (i = 1; i < argc; i++)
if (argv[i][0] == '-')
switch (argv[i][1])
{ default:
ARG_FLAGS("a")
break;
case 'x':
ARG_NON_NEGATIVE(CUTOFF,"Min read length cutoff")
break;
case 's':
ARG_REAL(size)
if (size <= 0.)
{ fprintf(stderr,"%s: Block size must be a positive number\n",Prog_Name);
exit (1);
}
break;
}
else
argv[j++] = argv[i];
argc = j;
SIZE = size*1000000ll;
ALL = flags['a'];
if (argc != 2)
{ fprintf(stderr,"Usage: %s %s\n",Prog_Name,Usage);
exit (1);
}
}
// Open db
  status = Open_DB(argv[1],&db);
  if (status < 0)
    exit (1);
  if (db.part > 0)
    { fprintf(stderr,"%s: Cannot be called on a block: %s\n",Prog_Name,argv[1]);
      exit (1);
    }
  { char *pwd, *root;
    char  buffer[2*MAX_NAME+100];
    int   nfiles;
    int   i;
pwd = PathTo(argv[1]);
if (status)
{ root = Root(argv[1],".dam");
dbfile = Fopen(Catenate(pwd,"/",root,".dam"),"r+");
}
else
{ root = Root(argv[1],".db");
dbfile = Fopen(Catenate(pwd,"/",root,".db"),"r+");
}
ixfile = Fopen(Catenate(pwd,PATHSEP,root,".idx"),"r+");
if (dbfile == NULL || ixfile == NULL)
exit (1);
free(pwd);
free(root);
if (fscanf(dbfile,DB_NFILE,&nfiles) != 1)
SYSTEM_ERROR
for (i = 0; i < nfiles; i++)
if (fgets(buffer,2*MAX_NAME+100,dbfile) == NULL)
SYSTEM_ERROR
if (fread(&dbs,sizeof(HITS_DB),1,ixfile) != 1)
SYSTEM_ERROR
if (dbs.cutoff >= 0)
{ printf("You are about to overwrite the current partition settings. This\n");
printf("will invalidate any tracks, overlaps, and other derivative files.\n");
printf("Are you sure you want to proceed? [Y/N] ");
fflush(stdout);
if (fgets(buffer,100,stdin) == NULL)
SYSTEM_ERROR
if (index(buffer,'n') != NULL || index(buffer,'N') != NULL)
{ printf("Aborted\n");
fflush(stdout);
fclose(dbfile);
exit (1);
}
}
dbpos = ftello(dbfile);
fseeko(dbfile,dbpos,SEEK_SET);
fprintf(dbfile,DB_NBLOCK,0);
fprintf(dbfile,DB_PARAMS,SIZE,CUTOFF,ALL);
}
  { HITS_READ *reads  = db.reads;
    int        nreads = db.ureads;
    int64      totlen;
    int        nblock, ireads, treads, rlen, fno;
    int        i;
nblock = 0;
totlen = 0;
ireads = 0;
treads = 0;
fprintf(dbfile,DB_BDATA,0,0);
if (ALL)
for (i = 0; i < nreads; i++)
{ rlen = reads[i].rlen;
if (rlen >= CUTOFF)
{ ireads += 1;
treads += 1;
totlen += rlen;
if (totlen >= SIZE)
{ fprintf(dbfile,DB_BDATA,i+1,treads);
totlen = 0;
ireads = 0;
nblock += 1;
}
}
}
else
for (i = 0; i < nreads; i++)
{ rlen = reads[i].rlen;
if (rlen >= CUTOFF && (reads[i].flags & DB_BEST) != 0)
{ ireads += 1;
treads += 1;
totlen += rlen;
if (totlen >= SIZE)
{ fprintf(dbfile,DB_BDATA,i+1,treads);
totlen = 0;
ireads = 0;
nblock += 1;
}
}
}
if (ireads > 0)
{ fprintf(dbfile,DB_BDATA,nreads,treads);
nblock += 1;
}
fno = fileno(dbfile);
if (ftruncate(fno,ftello(dbfile)) < 0)
SYSTEM_ERROR
fseeko(dbfile,dbpos,SEEK_SET);
fprintf(dbfile,DB_NBLOCK,nblock);
dbs.cutoff = CUTOFF;
dbs.all = ALL;
dbs.treads = treads;
rewind(ixfile);
fwrite(&dbs,sizeof(HITS_DB),1,ixfile);
}
  fclose(ixfile);
  fclose(dbfile);
  Close_DB(&db);

  exit (0);
}
/* fasta2DB.c */

static char *Usage = "[-v] [-p<string>] <path:string> ( -f<file> | <input:fasta> ... )";
static char number[128] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };
typedef struct
{ int    argc;
  char **argv;
  FILE  *input;
  int    count;
  char  *name;
} File_Iterator;
File_Iterator *init_file_iterator(int argc, char **argv, FILE *input, int first)
{ File_Iterator *it;

  it = Malloc(sizeof(File_Iterator),"Allocating file iterator");
  it->argc  = argc;
  it->argv  = argv;
  it->input = input;
  if (input == NULL)
    it->count = first;
  else
    { it->count = 1;
      rewind(input);
    }
  return (it);
}
int next_file(File_Iterator *it)
{ static char nbuffer[MAX_NAME+8];

  if (it->input == NULL)
    { if (it->count >= it->argc)
        return (0);
      it->name = it->argv[it->count++];
    }
  else
    { char *eol;
if (fgets(nbuffer,MAX_NAME+8,it->input) == NULL)
{ if (feof(it->input))
return (0);
SYSTEM_ERROR;
}
      if ((eol = index(nbuffer,'\n')) == NULL)
        { fprintf(stderr,"%s: Line %d in file list is longer than %d chars!\n",
                         Prog_Name,it->count,MAX_NAME+7);
          it->name = NULL;
          return (1);   //  caller checks for name == NULL and bails
        }
      *eol = '\0';
it->count += 1;
it->name = nbuffer;
}
  return (1);
}
int main(int argc, char *argv[])
{ FILE  *istub, *ostub;
  char  *dbname;
  char  *root, *pwd;

  FILE  *bases, *indx;
  int64  boff, ioff;

  int    ifiles, ofiles;
  char **flist;

  HITS_DB db;
  int     ureads;
  int64   offset;

  FILE *IFILE;
  char *PNAME;
  int   VERBOSE;
  // Usage: [-v] [-p<string>] <path:string> ( -f<file> | <input:fasta> ... )
{ int i, j, k; int flags[128];
ARG_INIT("fasta2DB")
IFILE = NULL;
PNAME = NULL;
j = 1;
for (i = 1; i < argc; i++)
if (argv[i][0] == '-')
switch (argv[i][1])
{ default:
ARG_FLAGS("v")
break;
case 'f':
IFILE = fopen(argv[i]+2,"r");
if (IFILE == NULL)
{ fprintf(stderr,"%s: Cannot open file of inputs '%s'\n",Prog_Name,argv[i]+2);
exit (1);
}
break;
case 'p':
PNAME = argv[i]+2;
break;
}
else
argv[j++] = argv[i];
argc = j;
VERBOSE = flags['v'];
if ((IFILE == NULL && argc <= 2) || (IFILE != NULL && argc != 2))
{ fprintf(stderr,"Usage: %s %s\n",Prog_Name,Usage);
exit (1);
}
}
// Try to open DB file, if present then adding to DB, otherwise creating new DB. Set up
// variables as follows:
  //   dbname = full name of db = <pwd>/<root>.db
{ int i;
root = Root(argv[1],".db");
pwd = PathTo(argv[1]);
dbname = Strdup(Catenate(pwd,"/",root,".db"),"Allocating db name");
if (dbname == NULL)
exit (1);
if (IFILE == NULL)
ifiles = argc-2;
else
{ File_Iterator *ng;
ifiles = 0;
ng = init_file_iterator(argc,argv,IFILE,2);
while (next_file(ng))
ifiles += 1;
free(ng);
}
istub = fopen(dbname,"r");
if (istub == NULL)
{ ofiles = 0;
bases = Fopen(Catenate(pwd,PATHSEP,root,".bps"),"w+");
indx = Fopen(Catenate(pwd,PATHSEP,root,".idx"),"w+");
if (bases == NULL || indx == NULL)
exit (1);
fwrite(&db,sizeof(HITS_DB),1,indx);
ureads = 0;
offset = 0;
boff = 0;
ioff = 0;
}
else
{ if (fscanf(istub,DB_NFILE,&ofiles) != 1)
SYSTEM_ERROR
bases = Fopen(Catenate(pwd,PATHSEP,root,".bps"),"r+");
indx = Fopen(Catenate(pwd,PATHSEP,root,".idx"),"r+");
if (bases == NULL || indx == NULL)
exit (1);
if (fread(&db,sizeof(HITS_DB),1,indx) != 1)
SYSTEM_ERROR
fseeko(bases,0,SEEK_END);
fseeko(indx, 0,SEEK_END);
ureads = db.ureads;
offset = ftello(bases);
boff = offset;
ioff = ftello(indx);
}
flist = (char **) Malloc(sizeof(char *)*(ofiles+ifiles),"Allocating file list");
ostub = Fopen(Catenate(pwd,"/",root,".dbx"),"w+");
if (ostub == NULL || flist == NULL)
exit (1);
fprintf(ostub,DB_NFILE,ofiles+ifiles);
for (i = 0; i < ofiles; i++)
{ int last;
char prolog[MAX_NAME], fname[MAX_NAME];
if (fscanf(istub,DB_FDATA,&last,fname,prolog) != 3)
SYSTEM_ERROR
if ((flist[i] = Strdup(fname,"Adding to file list")) == NULL)
goto error;
fprintf(ostub,DB_FDATA,last,fname,prolog);
}
}
  { int            maxlen;
    int64          totlen, count[4];
    int            pmax, rmax;
    HITS_READ     *prec;
    char          *read;
    int            c;
    File_Iterator *ng;
// Buffer for reads all in the same well
pmax = 100;
prec = (HITS_READ *) Malloc(sizeof(HITS_READ)*pmax,"Allocating record buffer");
if (prec == NULL)
goto error;
// Buffer for accumulating .fasta sequence over multiple lines
rmax = MAX_NAME + 60000;
read = (char *) Malloc(rmax+1,"Allocating line buffer");
if (read == NULL)
goto error;
totlen = 0; // total # of bases in new .fasta files
maxlen = 0; // longest read in new .fasta files
for (c = 0; c < 4; c++) // count of acgt in new .fasta files
count[c] = 0;
// For each new .fasta file do:
ng = init_file_iterator(argc,argv,IFILE,2);
while (next_file(ng))
{ FILE *input;
char *path, *core, *prolog;
int nline, eof, rlen, pcnt;
int pwell;
if (ng->name == NULL) goto error;
// Open it: <path>/<core>.fasta, check that core is not too long,
// and checking that it is not already in flist.
path = PathTo(ng->name);
core = Root(ng->name,".fasta");
if ((input = Fopen(Catenate(path,"/",core,".fasta"),"r")) == NULL)
goto error;
free(path);
if (strlen(core) >= MAX_NAME)
{ fprintf(stderr,"%s: File name over %d chars: '%.200s'\n",
Prog_Name,MAX_NAME,core);
goto error;
}
{ int j;
for (j = 0; j < ofiles; j++)
if (strcmp(core,flist[j]) == 0)
{ fprintf(stderr,"%s: File %s.fasta is already in database %s.db\n",
Prog_Name,core,Root(argv[1],".db"));
goto error;
}
}
// Get the header of the first line. If the file is empty skip.
pcnt = 0;
rlen = 0;
nline = 1;
eof = (fgets(read,MAX_NAME,input) == NULL);
if (eof || strlen(read) < 1)
{ fprintf(stderr,"Skipping '%s', file is empty!\n",core);
fclose(input);
free(core);
continue;
}
// Add the file name to flist
if (VERBOSE)
{ fprintf(stderr,"Adding '%s' ...\n",core);
fflush(stderr);
}
flist[ofiles++] = core;
// Check that the first line has PACBIO format, and record prolog in 'prolog'.
if (read[strlen(read)-1] != '\n')
{ fprintf(stderr,"File %s.fasta, Line 1: Fasta line is too long (> %d chars)\n",
core,MAX_NAME-2);
goto error;
}
if (!eof && read[0] != '>')
{ fprintf(stderr,"File %s.fasta, Line 1: First header in fasta file is missing\n",core);
goto error;
}
{ char *find;
int well, beg, end, qv;
find = index(read+1,'/');
if (find != NULL && sscanf(find+1,"%d/%d_%d RQ=0.%d\n",&well,&beg,&end,&qv) >= 3)
{ *find = '\0';
if (PNAME != NULL)
prolog = Strdup(PNAME,"Extracting prolog");
else
prolog = Strdup(read+1,"Extracting prolog");
*find = '/';
if (prolog == NULL) goto error;
}
else
{ fprintf(stderr,"File %s.fasta, Line %d: Pacbio header line format error\n",
core,nline);
goto error;
}
}
// Read in all the sequences until end-of-file
{ int i, x;
pwell = -1;
while (!eof)
{ int beg, end, clen;
int well, qv;
char *find;
find = index(read+(rlen+1),'/');
if (find == NULL)
{ fprintf(stderr,"File %s.fasta, Line %d: Pacbio header line format error\n",
core,nline);
goto error;
}
if (PNAME == NULL)
{ *find = '\0';
if (strcmp(read+(rlen+1),prolog) != 0)
{ fprintf(stderr,
"File %s.fasta, Line %d: Pacbio header line name inconsistent\n",
core,nline);
goto error;
}
*find = '/';
}
x = sscanf(find+1,"%d/%d_%d RQ=0.%d\n",&well,&beg,&end,&qv);
if (x < 3)
{ fprintf(stderr,"File %s.fasta, Line %d: Pacbio header line format error\n",
core,nline);
goto error;
}
else if (x == 3)
qv = 0;
rlen = 0;
while (1)
{ eof = (fgets(read+rlen,MAX_NAME,input) == NULL);
nline += 1;
x = strlen(read+rlen)-1;
if (read[rlen+x] != '\n')
{ if (read[rlen] == '>')
{ fprintf(stderr,"File %s.fasta, Line %d:",core,nline);
fprintf(stderr," Fasta header line is too long (> %d chars)\n",
MAX_NAME-2);
goto error;
}
else
x += 1;
}
if (eof || read[rlen] == '>')
break;
rlen += x;
if (rlen + MAX_NAME > rmax)
{ rmax = ((int) (1.2 * rmax)) + 1000 + MAX_NAME;
read = (char *) realloc(read,rmax+1);
if (read == NULL)
{ fprintf(stderr,"File %s.fasta, Line %d:",core,nline);
fprintf(stderr," Out of memory (Allocating line buffer)\n");
goto error;
}
}
}
read[rlen] = '\0';
for (i = 0; i < rlen; i++)
{ x = number[(int) read[i]];
count[x] += 1;
read[i] = (char) x;
}
ureads += 1;
totlen += rlen;
if (rlen > maxlen)
maxlen = rlen;
prec[pcnt].origin = well;
prec[pcnt].fpulse = beg;
prec[pcnt].rlen = rlen;
prec[pcnt].boff = offset;
prec[pcnt].coff = -1;
prec[pcnt].flags = qv;
Compress_Read(rlen,read);
clen = COMPRESSED_LEN(rlen);
fwrite(read,1,clen,bases);
offset += clen;
if (pwell == well)
{ prec[pcnt].flags |= DB_CSS;
pcnt += 1;
if (pcnt >= pmax)
{ pmax = ((int) (pcnt*1.2)) + 100;
prec = (HITS_READ *) realloc(prec,sizeof(HITS_READ)*pmax);
if (prec == NULL)
{ fprintf(stderr,"File %s.fasta, Line %d: Out of memory",core,nline);
fprintf(stderr," (Allocating read records)\n");
goto error;
}
}
}
else if (pcnt == 0)
pcnt += 1;
else
{ x = 0;
for (i = 1; i < pcnt; i++)
if (prec[i].rlen > prec[x].rlen)
x = i;
prec[x].flags |= DB_BEST;
fwrite(prec,sizeof(HITS_READ),pcnt,indx);
prec[0] = prec[pcnt];
pcnt = 1;
}
pwell = well;
}
// Complete processing of .fasta file: flush last well group, write file line
// in db image, free prolog, and close file
x = 0;
for (i = 1; i < pcnt; i++)
if (prec[i].rlen > prec[x].rlen)
x = i;
prec[x].flags |= DB_BEST;
fwrite(prec,sizeof(HITS_READ),pcnt,indx);
fprintf(ostub,DB_FDATA,ureads,core,prolog);
}
free(prolog);
fclose(input);
}
// Finished loading all sequences: update relevant fields in db record
db.ureads = ureads;
if (istub == NULL)
{ for (c = 0; c < 4; c++)
db.freq[c] = (float) ((1.*count[c])/totlen);
db.totlen = totlen;
db.maxlen = maxlen;
db.cutoff = -1;
}
else
{ for (c = 0; c < 4; c++)
db.freq[c] = (float) ((db.freq[c]*db.totlen + (1.*count[c]))/(db.totlen + totlen));
db.totlen += totlen;
if (maxlen > db.maxlen)
db.maxlen = maxlen;
}
}
  //  If db has been previously partitioned then calculate additional partition points and
  //    write to new db file image

  if (db.cutoff >= 0)
    { int64     totlen, dbpos, size;
      int       nblock, ireads, tfirst, rlen;
      int       ufirst, cutoff, allflag;
      HITS_READ record;
      int       i;
if (VERBOSE)
{ fprintf(stderr,"Updating block partition ...\n");
fflush(stderr);
}
// Read the block portion of the existing db image getting the indices of the first
// read in the last block of the exisiting db as well as the partition parameters.
// Copy the old image block information to the new block information (except for
// the indices of the last partial block)
if (fscanf(istub,DB_NBLOCK,&nblock) != 1)
SYSTEM_ERROR
dbpos = ftello(ostub);
fprintf(ostub,DB_NBLOCK,0);
if (fscanf(istub,DB_PARAMS,&size,&cutoff,&allflag) != 3)
SYSTEM_ERROR
fprintf(ostub,DB_PARAMS,size,cutoff,allflag);
if (allflag)
allflag = 0;
else
allflag = DB_BEST;
nblock -= 1;
for (i = 0; i <= nblock; i++)
{ if (fscanf(istub,DB_BDATA,&ufirst,&tfirst) != 2)
SYSTEM_ERROR
fprintf(ostub,DB_BDATA,ufirst,tfirst);
}
// Seek the first record of the last block of the existing db in .idx, and then
// compute and record partition indices for the rest of the db from this point
// forward.
fseeko(indx,sizeof(HITS_DB)+sizeof(HITS_READ)*ufirst,SEEK_SET);
totlen = 0;
ireads = 0;
for (i = ufirst; i < ureads; i++)
{ if (fread(&record,sizeof(HITS_READ),1,indx) != 1)
SYSTEM_ERROR
rlen = record.rlen;
if (rlen >= cutoff && (record.flags & DB_BEST) >= allflag)
{ ireads += 1;
tfirst += 1;
totlen += rlen;
if (totlen >= size)
{ fprintf(ostub," %9d %9d\n",i+1,tfirst);
totlen = 0;
ireads = 0;
nblock += 1;
}
}
}
if (ireads > 0)
{ fprintf(ostub,DB_BDATA,ureads,tfirst);
nblock += 1;
}
db.treads = tfirst;
fseeko(ostub,dbpos,SEEK_SET);
fprintf(ostub,DB_NBLOCK,nblock); // Rewind and record the new number of blocks
}
else db.treads = ureads;
  rewind(indx);
  fwrite(&db,sizeof(HITS_DB),1,indx);   //  Write the finalized db record into .idx

  rewind(ostub);                        //  Rewrite the number of files actually added
  fprintf(ostub,DB_NFILE,ofiles);

  if (istub != NULL)
    fclose(istub);
  fclose(ostub);
  fclose(indx);
  fclose(bases);
rename(Catenate(pwd,"/",root,".dbx"),dbname); // New image replaces old image
exit (0);
// Error exit: Either truncate or remove the .idx and .bps files as appropriate.
// Remove the new image file
error:
  if (ioff != 0)
    { fseeko(indx,0,SEEK_SET);
      if (ftruncate(fileno(indx),ioff) < 0)
        SYSTEM_ERROR
    }
  if (boff != 0)
    { fseeko(bases,0,SEEK_SET);
      if (ftruncate(fileno(bases),boff) < 0)
        SYSTEM_ERROR
    }
  fclose(indx);
  fclose(bases);
  if (ioff == 0)
    unlink(Catenate(pwd,PATHSEP,root,".idx"));
  if (boff == 0)
    unlink(Catenate(pwd,PATHSEP,root,".bps"));

  if (istub != NULL)
    fclose(istub);
  fclose(ostub);
  unlink(Catenate(pwd,"/",root,".dbx"));

  exit (1);
}
P.S. The fasta2DB changes do not use the header to determine the length of the sequence line; the code still dynamically keeps expanding to fit whatever comes. I want fasta2DB to work even when the fasta comes from somewhere other than a Pacbio machine. -- Gene
On 12/13/15, 1:29 PM, Gene Myers wrote:
Chris,
Could you please try out the following version of DBsplit.c and fasta2DB.c The changes are as follows:
fasta2DB: Should now work with arbitrarily long sequence lines. The header lines still need to be of length less than MAX_NAME (set to 10,000 in DB.h)
DBsplit: Rather than add another option, I simply made the -s option a real number parameter. So anyone that uses it with an int, e.g. -s200, will get the behavior that they expect, but if some sneaky person :-), wanted say extra small blocks, then they could say -s.01 and that would give them a block size of 1,000,000 * .01 = 10,000. OK with you?
Let me know if the codes are all good in your hands, and then I will commit if I hear back that all is good.
-- Gene
On 12/8/15, 5:30 PM, Christopher Dunn wrote:
Great! We will rebase onto your latest code this week. Since I lost the test-case, I won't be able to verify the fix, but no news is good news.
Thanks for telling us about DBdump and LAdump. We'll look into that. Might be helpful.
Maybe you would be interested in some of the changes in our fork. I can think of a couple at least:
A) I added |-u#| to DBsplit, to set the unit for the |-s#| option. The reason is that I like to set a tiny block-size in order to force a very large number of blocks, to test distributed behavior. Also, I have a tiny, super-fast test-case with error-free reads from a synthesized template, which serves as an excellent regression-test (since the result is 100% verifiable) and I like to force that tiny input to be split into 2 blocks to increase the coverage of the integration-test.
B) Someone submitted a change to fasta2DB which accepts long lines. That's really helpful because I like to eyeball Sequel data by viewing the start of each line via less without line-wrapping. Also, with error-free /in silico/ sampling, it's helpful to select a long string of letters and search for it elsewhere, to analyze our algorithms. Unfortunately, this submitted change was mixed with code to accept compressed fasta (which we'll drop someday because that could just as easily be handled via standard UNIX pipes), so you won't want to merge this code upstream. But maybe you could come up with a better way. The main idea is that it's ok to rely on the fasta header to indicate when the buffer needs to be increased.
— Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DALIGNER/issues/32#issuecomment-162936834.
Could you please try out the following version of DBsplit.c and fasta2DB.c The changes are as follows: fasta2DB: Should now work with arbitrarily long sequence lines.
Wunderbar!
The header lines still need to be of length less than MAX_NAME (set to 10,000 in DB.h)
Gut!
The fasta2DB changes does not use the header to determine the length of the sequence line, it still dynamically keeps expanding to fit whatever comes.
Sehr gut!
DBsplit: Rather than add another option, I simply made the -s option a real number parameter. So anyone that uses it with an int, e.g. -s200, will get the behavior that they expect, but if some sneaky person :-), wanted say extra small blocks, then they could say -s.01 and that would give them a block size of 1,000,000 * .01 = 10,000. OK with you?
Nifty. That's fine for my use-case.
Instead of posting code into comments, you can use a GitHub gist. And if you want to delete the code from your current comment, as owner of this repo you can Edit any comment.
Let me know if the codes work. I did a bit of testing, but I would feel more comfortable if you beat on these versions a bit and then got back to me before I commit them. Thanks, Gene
On 12/13/15, 3:52 PM, Christopher Dunn wrote:
Could you please try out the following version of DBsplit.c and fasta2DB.c The changes are as follows: fasta2DB: Should now work with arbitrarily long sequence lines.
Wunderbar!
The header lines still need to be of length less than MAX_NAME (set to 10,000 in DB.h)
Gut!
The fasta2DB changes does not use the header to determine the length of the sequence line, it still dynamically keeps expanding to fit whatever comes.
Sehr gut!
DBsplit: Rather than add another option, I simply made the -s option a *real* number parameter. So anyone that uses it with an int, e.g. -s200, will get the behavior that they expect, but if some sneaky person :-), wanted say extra small blocks, then they could say -s.01 and that would give them a block size of 1,000,000 * .01 = 10,000. OK with you?
Nifty. That's fine for my use-case.
— Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DALIGNER/issues/32#issuecomment-164266384.
Tested. Both work perfectly.
Btw, the convention on GitHub is to have a file called LICENSE, which applies to the whole repo, so you don't need the copyright in every file.
Thanks,
I'll commit both in a new version shortly. I appreciate the LICENSE file idea; it sure will make things easier.
-- Gene
On 12/14/15, 4:15 PM, Christopher Dunn wrote:
Tested. Both work perfectly.
- https://gist.github.com/pb-cdunn/41da6632b0cc4ff8b059
- https://gist.github.com/pb-cdunn/a71046d8520c195e74e4
Btw, the convention on GitHub is to have a file called |LICENSE| which applies to the whole repo, so you don't need the copyright in every file.
— Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DALIGNER/issues/32#issuecomment-164463277.
Chris,
The mods have been commited.
-- Gene
On 12/14/15, 4:15 PM, Christopher Dunn wrote:
Tested. Both work perfectly.
- https://gist.github.com/pb-cdunn/41da6632b0cc4ff8b059
- https://gist.github.com/pb-cdunn/a71046d8520c195e74e4
Btw, the convention on GitHub is to have a file called |LICENSE| which applies to the whole repo, so you don't need the copyright in every file.
— Reply to this email directly or view it on GitHub https://github.com/thegenemyers/DALIGNER/issues/32#issuecomment-164463277.
I also ran valgrind. No significant errors. In case you're interested ...
valgrind --track-origins=yes --leak-check=full fasta2DB -v raw_reads -f0-rawreads/input.fofn
==9905== Syscall param write(buf) points to uninitialised byte(s)
==9905== at 0x4F223B0: __write_nocancel (syscall-template.S:81)
==9905== by 0x4EAFA82: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1261)
==9905== by 0x4EB0F5B: new_do_write (fileops.c:538)
==9905== by 0x4EB0F5B: _IO_do_write@@GLIBC_2.2.5 (fileops.c:511)
==9905== by 0x4EB1C4E: _IO_switch_to_get_mode (genops.c:184)
==9905== by 0x4EAF43D: _IO_file_seekoff@@GLIBC_2.2.5 (fileops.c:969)
==9905== by 0x4EA8B8C: rewind (rewind.c:36)
==9905== by 0x40312C: main (fasta2DB.c:639)
==9905== Address 0x403807c is not stack'd, malloc'd or (recently) free'd
==9905== Uninitialised value was created by a heap allocation
==9905== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==9905== by 0x40357D: Malloc (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DAZZ_DB/fasta2DB)
==9905== by 0x401CCF: main (fasta2DB.c:308)
==9905==
==9905==
==9905== HEAP SUMMARY:
==9905== in use at exit: 74,777 bytes in 11 blocks
==9905== total heap usage: 19 allocs, 8 frees, 77,744 bytes allocated
==9905==
==9905== LEAK SUMMARY:
==9905== definitely lost: 0 bytes in 0 blocks
==9905== indirectly lost: 0 bytes in 0 blocks
==9905== possibly lost: 0 bytes in 0 blocks
==9905== still reachable: 74,777 bytes in 11 blocks
==9905== suppressed: 0 bytes in 0 blocks
Possibly the .idx array was not initialized beyond its last used element.
valgrind --track-origins=yes --leak-check=full --show-leak-kinds=all DBsplit -x5 -s.065536 -a raw_reads
==10075== HEAP SUMMARY:
==10075== in use at exit: 125 bytes in 2 blocks
==10075== total heap usage: 12 allocs, 10 frees, 4,514 bytes allocated
==10075==
==10075== 8 bytes in 1 blocks are still reachable in loss record 1 of 2
==10075== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10075== by 0x4EBF2B9: strdup (strdup.c:42)
==10075== by 0x402232: Strdup (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DAZZ_DB/DBsplit)
==10075== by 0x4012A2: main (DBsplit.c:89)
==10075==
==10075== 117 bytes in 1 blocks are still reachable in loss record 2 of 2
==10075== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10075== by 0x4C2CA8B: realloc (vg_replace_malloc.c:692)
==10075== by 0x4026DA: Catenate (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DAZZ_DB/DBsplit)
==10075== by 0x40353A: Open_DB (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DAZZ_DB/DBsplit)
==10075== by 0x401785: main (DBsplit.c:127)
==10075==
==10075== LEAK SUMMARY:
==10075== definitely lost: 0 bytes in 0 blocks
==10075== indirectly lost: 0 bytes in 0 blocks
==10075== possibly lost: 0 bytes in 0 blocks
==10075== still reachable: 125 bytes in 2 blocks
==10075== suppressed: 0 bytes in 0 blocks
valgrind --leak-check=full --show-leak-kinds=all HPCdaligner -v -dal4 -t16 -h1 -e.70 -l1 -s1000 -H1 raw_reads 1-2
==10170== HEAP SUMMARY:
==10170== in use at exit: 934 bytes in 5 blocks
==10170== total heap usage: 9 allocs, 4 frees, 2,082 bytes allocated
==10170==
==10170== 12 bytes in 1 blocks are still reachable in loss record 1 of 5
==10170== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10170== by 0x4EBF2B9: strdup (strdup.c:42)
==10170== by 0x41A092: Strdup (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/HPCdaligner)
==10170== by 0x4011C8: main (HPCdaligner.c:102)
==10170==
==10170== 80 bytes in 1 blocks are still reachable in loss record 2 of 5
==10170== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10170== by 0x419F8D: Malloc (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/HPCdaligner)
==10170== by 0x40129B: main (HPCdaligner.c:118)
==10170==
==10170== 119 bytes in 1 blocks are still reachable in loss record 3 of 5
==10170== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10170== by 0x4C2CA8B: realloc (vg_replace_malloc.c:692)
==10170== by 0x41A53A: Catenate (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/HPCdaligner)
==10170== by 0x4023E2: main (HPCdaligner.c:234)
==10170==
==10170== 155 bytes in 1 blocks are still reachable in loss record 4 of 5
==10170== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10170== by 0x4C2CA8B: realloc (vg_replace_malloc.c:692)
==10170== by 0x41A63A: Numbered_Suffix (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/HPCdaligner)
==10170== by 0x40278E: main (HPCdaligner.c:299)
==10170==
==10170== 568 bytes in 1 blocks are still reachable in loss record 5 of 5
==10170== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10170== by 0x4EA4ECC: __fopen_internal (iofopen.c:73)
==10170== by 0x41A14E: Fopen (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/HPCdaligner)
==10170== by 0x40242D: main (HPCdaligner.c:236)
==10170==
==10170== LEAK SUMMARY:
==10170== definitely lost: 0 bytes in 0 blocks
==10170== indirectly lost: 0 bytes in 0 blocks
==10170== possibly lost: 0 bytes in 0 blocks
==10170== still reachable: 934 bytes in 5 blocks
==10170== suppressed: 0 bytes in 0 blocks
valgrind --leak-check=full --show-leak-kinds=all daligner -v -h1 -t16 -H1 -e0.7 -l1 -s1000 raw_reads.1 raw_reads.1
==10550== HEAP SUMMARY:
==10550== in use at exit: 2,297,275 bytes in 12 blocks
==10550== total heap usage: 65 allocs, 53 frees, 14,537,012 bytes allocated
==10550==
==10550== Thread 1:
==10550== 9 bytes in 1 blocks are still reachable in loss record 1 of 12
==10550== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10550== by 0x53E32B9: strdup (strdup.c:42)
==10550== by 0x4225B2: Strdup (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/daligner)
==10550== by 0x402ACC: main (daligner.c:577)
==10550==
==10550== 12 bytes in 1 blocks are still reachable in loss record 2 of 12
==10550== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10550== by 0x53E32B9: strdup (strdup.c:42)
==10550== by 0x422823: Root (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/daligner)
==10550== by 0x403A98: main (daligner.c:686)
==10550==
==10550== 40 bytes in 1 blocks are still reachable in loss record 3 of 12
==10550== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10550== by 0x4224AD: Malloc (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/daligner)
==10550== by 0x402BFE: main (daligner.c:598)
==10550==
==10550== 48 bytes in 1 blocks are still reachable in loss record 4 of 12
==10550== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10550== by 0x4224AD: Malloc (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/daligner)
==10550== by 0x40BF36: New_Align_Spec (align.c:242)
==10550== by 0x403AD0: main (daligner.c:688)
==10550==
==10550== 80 bytes in 1 blocks are still reachable in loss record 5 of 12
==10550== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10550== by 0x4224AD: Malloc (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/daligner)
==10550== by 0x402BDE: main (daligner.c:597)
==10550==
==10550== 117 bytes in 1 blocks are still reachable in loss record 6 of 12
==10550== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10550== by 0x4C2CA8B: realloc (vg_replace_malloc.c:692)
==10550== by 0x422A5A: Catenate (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/daligner)
==10550== by 0x4238BA: Open_DB (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/daligner)
==10550== by 0x4021DD: read_DB (daligner.c:381)
==10550== by 0x403A58: main (daligner.c:682)
==10550==
==10550== 156 bytes in 1 blocks are still reachable in loss record 7 of 12
==10550== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10550== by 0x4C2CA8B: realloc (vg_replace_malloc.c:692)
==10550== by 0x422B5A: Numbered_Suffix (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/daligner)
==10550== by 0x40B537: Match_Filter (filter.c:2188)
==10550== by 0x403F2F: main (daligner.c:736)
==10550==
==10550== 1,400 bytes in 1 blocks are possibly lost in loss record 8 of 12
==10550== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10550== by 0x423D8E: Open_DB (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/daligner)
==10550== by 0x4021DD: read_DB (daligner.c:381)
==10550== by 0x403A58: main (daligner.c:682)
==10550==
==10550== 66,037 bytes in 1 blocks are possibly lost in loss record 9 of 12
==10550== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10550== by 0x427459: Read_All_Sequences (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/daligner)
==10550== by 0x4024BB: read_DB (daligner.c:439)
==10550== by 0x403A58: main (daligner.c:682)
==10550==
==10550== 131,072 bytes in 1 blocks are still reachable in loss record 10 of 12
==10550== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10550== by 0x4224AD: Malloc (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/daligner)
==10550== by 0x40C0EE: New_Align_Spec (align.c:267)
==10550== by 0x403AD0: main (daligner.c:688)
==10550==
==10550== 1,049,152 bytes in 1 blocks are still reachable in loss record 11 of 12
==10550== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10550== by 0x4224AD: Malloc (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/daligner)
==10550== by 0x406153: Sort_Kmers (filter.c:762)
==10550== by 0x403D38: main (daligner.c:718)
==10550==
==10550== 1,049,152 bytes in 1 blocks are still reachable in loss record 12 of 12
==10550== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10550== by 0x4224AD: Malloc (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/daligner)
==10550== by 0x406153: Sort_Kmers (filter.c:762)
==10550== by 0x403F86: main (daligner.c:741)
==10550==
==10550== LEAK SUMMARY:
==10550== definitely lost: 0 bytes in 0 blocks
==10550== indirectly lost: 0 bytes in 0 blocks
==10550== possibly lost: 67,437 bytes in 2 blocks
==10550== still reachable: 2,229,838 bytes in 10 blocks
==10550== suppressed: 0 bytes in 0 blocks
valgrind --leak-check=full --show-leak-kinds=all LAsort -v raw_reads.1.raw_reads.1.C0 raw_reads.1.raw_reads.1.N0 raw_reads.1.raw_reads.1.C1 raw_reads.1.raw_reads.1.N1 raw_reads.1.raw_reads.1.C2 raw_reads.1.raw_reads.1.N2 raw_reads.1.raw_reads.1.C3 raw_reads.1.raw_reads.1.N3
==10692== Warning: set address range perms: large range [0x3a048040, 0x1286fa844) (undefined)
==10692== Warning: set address range perms: large range [0x3a048028, 0x759f4a58) (noaccess)
==10692==
==10692== HEAP SUMMARY:
==10692== in use at exit: 146 bytes in 2 blocks
==10692== total heap usage: 50 allocs, 48 frees, 1,000,032,146 bytes allocated
==10692==
==10692== 7 bytes in 1 blocks are still reachable in loss record 1 of 2
==10692== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10692== by 0x4EBF2B9: strdup (strdup.c:42)
==10692== by 0x418842: Strdup (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/LAsort)
==10692== by 0x401425: main (LAsort.c:134)
==10692==
==10692== 139 bytes in 1 blocks are still reachable in loss record 2 of 2
==10692== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10692== by 0x4C2CA8B: realloc (vg_replace_malloc.c:692)
==10692== by 0x418CEA: Catenate (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/LAsort)
==10692== by 0x401759: main (LAsort.c:175)
==10692==
==10692== LEAK SUMMARY:
==10692== definitely lost: 0 bytes in 0 blocks
==10692== indirectly lost: 0 bytes in 0 blocks
==10692== possibly lost: 0 bytes in 0 blocks
==10692== still reachable: 146 bytes in 2 blocks
==10692== suppressed: 0 bytes in 0 blocks
valgrind --leak-check=full --show-leak-kinds=all LAmerge -v raw_reads.1 raw_reads.1.raw_reads.1.C0.S raw_reads.1.raw_reads.1.N0.S raw_reads.1.raw_reads.1.C1.S raw_reads.1.raw_reads.1.N1.S raw_reads.1.raw_reads.1.C2.S raw_reads.1.raw_reads.1.N2.S raw_reads.1.raw_reads.1.C3.S raw_reads.1.raw_reads.1.N3.S
==10724== Warning: set address range perms: large range [0x3a048040, 0x1286fa844) (undefined)
==10724== Warning: set address range perms: large range [0x3a048028, 0x1286fa85c) (noaccess)
==10724==
==10724== HEAP SUMMARY:
==10724== in use at exit: 149 bytes in 2 blocks
==10724== total heap usage: 33 allocs, 31 frees, 4,000,006,303 bytes allocated
==10724==
==10724== 8 bytes in 1 blocks are still reachable in loss record 1 of 2
==10724== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10724== by 0x4EBF2B9: strdup (strdup.c:42)
==10724== by 0x4192C2: Strdup (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/LAmerge)
==10724== by 0x4018CF: main (LAmerge.c:229)
==10724==
==10724== 141 bytes in 1 blocks are still reachable in loss record 2 of 2
==10724== at 0x4C2ABBD: malloc (vg_replace_malloc.c:296)
==10724== by 0x4C2CA8B: realloc (vg_replace_malloc.c:692)
==10724== by 0x41976A: Catenate (in /home/UNIXHOME/cdunn/repo/gh/FALCON-integrate/DALIGNER/LAmerge)
==10724== by 0x401CDC: main (LAmerge.c:277)
==10724==
==10724== LEAK SUMMARY:
==10724== definitely lost: 0 bytes in 0 blocks
==10724== indirectly lost: 0 bytes in 0 blocks
==10724== possibly lost: 0 bytes in 0 blocks
==10724== still reachable: 149 bytes in 2 blocks
==10724== suppressed: 0 bytes in 0 blocks
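Note that every run above reports only "still reachable" (or "possibly lost") blocks, never "definitely lost": the `Strdup` of the program name and the `Catenate`/`Numbered_Suffix` buffers are one-time allocations that are deliberately kept for the life of the process. A sketch of the idiom (my reconstruction, not the actual library source) shows why valgrind classifies them as reachable at exit rather than leaked:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of the static-growable-buffer idiom behind the
 * "still reachable ... realloc ... Catenate" records: the buffer is
 * realloc'd on demand and intentionally never freed, so a pointer to
 * it still exists at exit and valgrind reports it as reachable. */
static char *cat_buffer = NULL;
static int   cat_max    = 0;

char *catenate(const char *a, const char *b)
  { int len = (int) (strlen(a) + strlen(b) + 1);
    if (len > cat_max)
      { cat_max    = len + 100;
        cat_buffer = realloc(cat_buffer, (size_t) cat_max);
        if (cat_buffer == NULL)
          return NULL;
      }
    strcpy(cat_buffer, a);
    strcat(cat_buffer, b);
    return cat_buffer;   /* caller must not free: buffer is reused */
  }
```

Under this pattern the "leaks" are benign: the memory is bounded and reclaimed by the OS at exit, which is consistent with the clean "definitely lost: 0 bytes" summaries in each run.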
That's all I have right now.