gui11aume / starcode

All pairs search and sequence clustering
GNU General Public License v3.0
90 stars 21 forks

Input file memory not freed. #1

Closed: ezorita closed 8 years ago

ezorita commented 10 years ago

To speed up input file preprocessing, I've decided not to free the memory allocated for barcode duplicates.

Each input barcode is malloc'ed as an independent string while the input file is read. In previous versions, the memory allocated to barcode duplicates was freed individually after sorting. However, the repeated calls to free() seemed to take up to 25% of the preprocessing time.
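For illustration, here is a minimal sketch of the two behaviours being compared. It is not starcode's actual code: `dedup_sorted`, `cmp_str` and the `free_dups` flag are hypothetical names, and the input is assumed to be an array of individually malloc'ed, NUL-terminated strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of char* (hypothetical helper). */
static int cmp_str(const void *a, const void *b) {
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/*
 * Hypothetical sketch: sorts `seqs` and compacts the unique strings to
 * the front of the array.
 *   free_dups != 0: each duplicate is freed individually (old behaviour).
 *   free_dups == 0: duplicate pointers are dropped without calling
 *                   free(), so their memory stays allocated until the
 *                   process exits (behaviour proposed in this issue).
 * Returns the number of unique strings.
 */
size_t dedup_sorted(char **seqs, size_t n, int free_dups) {
    assert(seqs != NULL || n == 0);
    if (n == 0) return 0;

    qsort(seqs, n, sizeof *seqs, cmp_str);

    size_t u = 0;  /* index of the last unique string kept so far */
    for (size_t i = 1; i < n; i++) {
        if (strcmp(seqs[u], seqs[i]) == 0) {
            /* Duplicate of the string at seqs[u]. */
            if (free_dups) free(seqs[i]);
        } else {
            /* New unique string: move it next to the previous one. */
            seqs[++u] = seqs[i];
        }
    }
    return u + 1;
}
```

Calling `dedup_sorted(seqs, n, 0)` skips one free() call per duplicate, at the price of keeping the duplicates' memory allocated.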

-- Issues: We now hold on to unused memory equal to the size of the input file minus the aggregate size of the unique barcodes.

-- Notes: Other options, such as preprocessing the input file in a separate process, have been discarded because the unique barcodes would need to be strcpy'ed into a shared memory segment to return them to the parent process. The strcpy calls would probably take longer than just freeing each duplicate individually in the conventional way. A possible solution is a dedicated thread that frees the memory concurrently with the main execution thread and exits once all duplicates are freed.
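The dedicated-thread idea could look roughly like the sketch below, using POSIX threads. This is only an assumption about how it might be done: `dup_batch_t`, `free_dups_worker` and `free_dups_async` are hypothetical names, and the sketch assumes the duplicate pointers were collected into a malloc'ed array during deduplication instead of being dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical batch of duplicate pointers handed to the freeing thread. */
typedef struct {
    char **ptrs;   /* malloc'ed array of malloc'ed duplicate strings */
    size_t count;  /* number of entries in ptrs */
} dup_batch_t;

/* Thread entry point: frees every duplicate, then the batch itself. */
static void *free_dups_worker(void *arg) {
    dup_batch_t *batch = arg;
    for (size_t i = 0; i < batch->count; i++)
        free(batch->ptrs[i]);
    free(batch->ptrs);
    free(batch);
    return NULL;  /* the thread exits once all duplicates are freed */
}

/*
 * Starts a thread that frees `ptrs` in the background. On success (0)
 * the thread owns `ptrs` and the caller must not touch it again; the
 * thread id is stored in *tid so the caller can pthread_join() it.
 * On failure a nonzero value is returned and the caller keeps
 * ownership of `ptrs`.
 */
int free_dups_async(char **ptrs, size_t count, pthread_t *tid) {
    assert(tid != NULL);

    dup_batch_t *batch = malloc(sizeof *batch);
    if (batch == NULL) return -1;
    batch->ptrs = ptrs;
    batch->count = count;

    int err = pthread_create(tid, NULL, free_dups_worker, batch);
    if (err != 0) {
        free(batch);  /* ptrs itself is left to the caller */
        return err;
    }
    return 0;
}
```

The main thread would call `free_dups_async` right after deduplication and `pthread_join` the returned thread before exiting. Whether this actually recovers the lost time would need measuring, since free() in the worker may contend with malloc() calls in the main thread, depending on the allocator.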