jmbreda / Sanity

Filtering of Poisson noise on a single-cell RNA-seq UMI count matrix
GNU General Public License v3.0
69 stars, 11 forks

better read/write interface? #22

Open: stanleyjs opened this issue 9 months ago

stanleyjs commented 9 months ago

Hello,

We are running Sanity in a pipeline that is implemented primarily in Python. Our datasets can be quite large, and Sanity's I/O interface is severely limiting our performance. As I understand it, Sanity expects its input as a Matrix Market file and writes its output as CSV.
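Until there is an in-memory interface, one thing that helps on the input side is making sure the Matrix Market file contains only the nonzero entries. The sketch below assumes SciPy's `scipy.io.mmwrite` and an integer genes × cells count matrix; the orientation and the helper name `write_counts_mtx` are assumptions for illustration, not taken from Sanity's documentation. A dense NumPy array is written in Matrix Market "array" format with every zero spelled out, whereas a sparse matrix is written in "coordinate" format, one line per nonzero count.

```python
import numpy as np
import scipy.sparse as sp
from scipy.io import mmwrite


def write_counts_mtx(counts, path: str) -> None:
    """Write a count matrix as a sparse (coordinate) Matrix Market file.

    Hypothetical helper. Assumes `counts` holds integral UMI counts, either as
    a dense ndarray or as any SciPy sparse matrix.
    """
    # Converting to COO means only nonzero entries are written; a dense
    # ndarray would be written in "array" format, zeros included.
    coo = sp.coo_matrix(counts)
    # Keep an integer dtype so values are written as integers ("1", not
    # "1.000000e+00"), which keeps the file smaller. This cast truncates,
    # so it is only safe if the counts are already whole numbers.
    coo = coo.astype(np.int64)
    mmwrite(path, coo)
```

This only shrinks the input side; the output file written by Sanity is unaffected.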

Our data is already held in memory in a Python parent process, and we launch Sanity as a subprocess. Is there a faster way to send data to Sanity and receive its results? Right now we wait for Sanity to write an enormous CSV file, and then we have to read that file back into memory in the parent process.
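If Sanity has to keep its file-based interface, a low-effort workaround on Linux is to put the exchange files under /dev/shm, which is a RAM-backed tmpfs on most distributions. This removes the physical disk writes, but not the cost of formatting and parsing the text. The sketch below is generic: `run_via_ram_files` and its callbacks are hypothetical names, and no Sanity command-line flags are assumed; `build_cmd` must produce the real Sanity invocation according to its own documentation.

```python
import os
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def ram_tmpdir_root() -> Optional[str]:
    # /dev/shm is RAM-backed on most Linux systems. Elsewhere, fall back to
    # the default temp directory, which may well be on disk.
    if os.path.isdir("/dev/shm") and os.access("/dev/shm", os.W_OK):
        return "/dev/shm"
    return None


def run_via_ram_files(
    write_input: Callable[[Path], None],
    build_cmd: Callable[[Path, Path], List[str]],
    read_output: Callable[[Path], T],
) -> T:
    """Run a file-based tool with its input/output files kept in RAM (hypothetical helper)."""
    with tempfile.TemporaryDirectory(dir=ram_tmpdir_root()) as tmp:
        tmp_path = Path(tmp)
        in_path = tmp_path / "input.mtx"  # input file name is an assumption
        out_dir = tmp_path / "out"
        out_dir.mkdir()
        write_input(in_path)
        # check=True raises CalledProcessError if the tool exits non-zero.
        subprocess.run(build_cmd(in_path, out_dir), check=True)
        # Read results before the directory (and its RAM) is released.
        return read_output(out_dir)
```

Any callables can be plugged in. For a quick check without Sanity, a small Python command that copies the input into `out_dir` can stand in for the real tool.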

The most obvious solution I see is to write a Python-to-C interface for Sanity using ctypes (CDLL). Do you have any plans for something like this, or any tips for speeding up the exchange of data with Sanity without hitting the disk so much?
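The ctypes route would, at minimum, require Sanity to be compiled as a shared library that exposes a C-callable entry point taking pointers to the input counts and to pre-allocated output buffers. No such library or entry point is assumed to exist today. The sketch below shows only the Python half of that pattern, with libc's `memcpy` standing in for the hypothetical Sanity function: NumPy buffers are passed to C by pointer through `ctypes.CDLL`, and C writes its result directly into an array that NumPy already owns, so nothing is serialized to text. `call_c_on_buffers` is a hypothetical name.

```python
import ctypes
import ctypes.util

import numpy as np

# Stand-in for a hypothetical shared-library build of Sanity: load the C
# standard library and use memcpy, which has the same "output pointer, input
# pointer, size" shape an in-process entry point would likely need.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))
_libc.memcpy.restype = ctypes.c_void_p
_libc.memcpy.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t]


def call_c_on_buffers(counts: np.ndarray) -> np.ndarray:
    # C code needs a contiguous buffer of a known dtype; this copies only if
    # the array is not already C-contiguous float64.
    src = np.ascontiguousarray(counts, dtype=np.float64)
    # Allocate the output on the Python side so C writes into memory NumPy
    # owns: no file, no parsing, and nothing for Python to free later.
    out = np.empty_like(src)
    _libc.memcpy(out.ctypes.data, src.ctypes.data, src.nbytes)
    return out
```

Allocating the output in Python keeps memory ownership on one side of the boundary, which avoids having to free C-allocated memory from Python.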