statgen / demuxlet

Genetic multiplexing of barcoded single cell RNA-seq
Apache License 2.0

Fix memory issues #59

Open VincentGardeux opened 4 years ago

VincentGardeux commented 4 years ago

This fixes memory issues that occur when there are too many SNPs or samples to process.

The error comes from the gpAB array, which can become enormous (billions of values) when there are too many SNPs or samples.

In this fix, I don't create these arrays at all and instead compute the values on the fly, since each value is used only once anyway.

In our tests, it generates exactly the same results on small datasets, but it can also be run on millions of SNPs and hundreds of samples.

We did not see any major change in the computing time either.
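
As a rough illustration of the idea (not the actual demuxlet code, which is C++), the sketch below contrasts storing every per-SNP, per-sample-pair value up front with computing each value at the point of use. The names `pair_term`, `score_precomputed`, `score_on_the_fly` and `estimated_entries` are hypothetical, and the real layout and contents of gpAB are not described in this thread; the product of two genotype probabilities is only a stand-in for whatever is actually stored.

```python
from itertools import combinations


def pair_term(gp_a, gp_b):
    """Hypothetical per-SNP value for a sample pair (stand-in for a gpAB entry)."""
    return gp_a * gp_b


def score_precomputed(gp):
    """Eager variant: store every (pair, SNP) value first, then consume it once."""
    n_samples, n_snps = len(gp), len(gp[0])
    table = {}  # grows as n_snps * n_pairs, which is what exhausts memory
    for a, b in combinations(range(n_samples), 2):
        table[(a, b)] = [pair_term(gp[a][s], gp[b][s]) for s in range(n_snps)]
    return {pair: sum(values) for pair, values in table.items()}


def score_on_the_fly(gp):
    """Lazy variant: compute each value where it is used and store nothing."""
    n_samples, n_snps = len(gp), len(gp[0])
    scores = {}
    for a, b in combinations(range(n_samples), 2):
        total = 0.0
        for s in range(n_snps):
            total += pair_term(gp[a][s], gp[b][s])  # used once, then discarded
        scores[(a, b)] = total
    return scores


def estimated_entries(n_snps, n_samples, values_per_pair=1):
    """Entries needed if one value is stored per SNP and unordered sample pair."""
    n_pairs = n_samples * (n_samples - 1) // 2
    return n_snps * n_pairs * values_per_pair


if __name__ == "__main__":
    # Toy input: gp[sample][snp] is a made-up genotype probability.
    gp = [[0.1, 0.9, 0.5], [0.2, 0.8, 0.5], [0.3, 0.3, 0.4]]
    print(score_precomputed(gp) == score_on_the_fly(gp))
    print(estimated_entries(1_000_000, 200))
```

Under the assumption of one value per SNP and unordered sample pair, 1,000,000 SNPs and 200 samples already give 19,900,000,000 entries, which would be roughly 159 GB at 8 bytes each. Both variants visit the values in the same order, so the results are identical while the lazy one keeps only a running total per pair.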

Cheers

cramirezs commented 4 years ago

Dear @VincentGardeux,

Thank you for this much-needed modification. I am trying to install this branch, but I keep running into an error about zlib not being found. Do you know how I can disable this, or is it essential? The error reads:

configure:5356: error: libz.{so,a} was not found. Please install zlib at http://www.zlib.net/ first

I have already installed it, but I cannot find how to tell configure where it is. Do you have any suggestions, or would it be appropriate to get the binary instead (and if so, where from)?

Thank you, Ciro

cramirezs commented 4 years ago

Hi Vincent,

I managed to solve it. I just needed to set the following flags:

export CPPFLAGS='-I/mnt/BioHome/ciro/bin/zlib-1.2.11/include'
export LDFLAGS='-L/mnt/BioHome/ciro/bin/zlib-1.2.11/lib'

However, I ran into another issue:

/mnt/BioHome/ciro/bin/miniconda3/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.2.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld: ../htslib/libhts.a(cram_codecs.o): relocation R_X86_64_32S against symbol `cram_byte_array_stop_encode_free' can not be used when making a shared object; recompile with -fPIC
/mnt/BioHome/ciro/bin/miniconda3/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.2.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:451: demuxlet] Error 1
make[1]: Leaving directory '/home/ciro/bin/demuxlet_vg'
make: *** [Makefile:346: all] Error 2

Could you please let me know whether this is a problem you have already come across, and how to solve it?

Thank you, Ciro

VincentGardeux commented 4 years ago

Hey @cramirezs

I think any install issue you encounter would be the same as when installing the base demuxlet. I did not change much code, so the procedure should be exactly the same.

Your build output says "recompile with -fPIC"; maybe that is the solution? It may be worth checking the base issue tracker, where this should have come up as well (maybe https://github.com/statgen/demuxlet/issues/40 ?).

Cheers