nolanlab / vortex

VorteX Clustering Environment - Java graphical tool for single-cell analysis, clustering and visualization. Latest release:
https://github.com/nolanlab/vortex/releases

Large cohort #15

Closed sararselitsky closed 6 years ago

sararselitsky commented 6 years ago

I tried running X-shift from the command line on 112 samples (>100K cells each) using 50 threads and 100G of RAM. It ran for 16 hours and then hit an out-of-memory error. I can keep tweaking the submission parameters for the cluster computer, but I was wondering what the maximum number of cells is that this has been successfully run on. Besides sub-sampling, is there a parameter I should use to decrease the computation? My cohort will soon increase to 350 samples and I need a method capable of handling 40 million cells.

Thanks!

Sara

nsamusik commented 6 years ago

Hi Sara,

Sorry to hear that. What operating system are you on? Can you please open your terminal and type “java -version” and tell me what response you are seeing?


-- Nikolay



sararselitsky commented 6 years ago

Sure! See below:

CentOS Linux release 7.3.1611 (Core)
openjdk version "1.8.0_102"
OpenJDK Runtime Environment (build 1.8.0_102-b14)
OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)

nsamusik commented 6 years ago

And what happens if you try running it with java -Xmx64G? Is it still running out of memory? How many FCS files do you have? How many cells do you sample from each FCS file? Do you have a record of the stack trace of the OutOfMemoryError? It could be in the Vortex.log.

It may help me understand at what stage the error is happening.

-- Nikolay
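For anyone checking whether the -Xmx setting actually reached the JVM (job schedulers and wrapper scripts sometimes drop it), here is a minimal sketch that prints the heap limit the running JVM sees. The class name HeapCheck is made up for illustration and is not part of VorteX; it only uses the standard Runtime.maxMemory() call.

```java
public class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in GiB. */
    static double maxHeapGiB() {
        return Runtime.getRuntime().maxMemory() / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Compare this figure with the value passed via -Xmx.
        System.out.printf("Max heap: %.1f GiB%n", maxHeapGiB());
    }
}
```

Compile it with javac HeapCheck.java and run it with the same options as the real job, e.g. java -Xmx64G HeapCheck. The reported figure should be close to the requested limit, though it can come out slightly lower depending on the garbage collector.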



sararselitsky commented 6 years ago

There are 112 FCS files, each with around 100K cells. The original error I got was from the job submission program, not from X-shift or Java, so I am rerunning it. It has currently been running for 19 hours (80 threads, maximum 200G of RAM). Since my cohort will soon more than double in size, I was wondering if you have experience with cohorts of a comparable size and what I can do to improve the speed, besides sub-sampling.

sararselitsky commented 6 years ago

I wanted to let you know that X-shift has been running 112 samples with around 100K cells per FCS file, for 3 days and 19 hours. It is running on 80 threads, 200G of RAM. Have you tested a cohort of a comparable size? If so, did you see these types of times? Thanks!

nsamusik commented 6 years ago

Yes, it makes sense that it's taking that long. I think it's too much for your system to handle. I suggest that you change the "limit of rows per file" in the data import config and set it to a maximum of 10000. You will then end up with 1.12M cells (112 files x 10000 rows), which should get clustered in about a day. It will compute the clustering based on that 'core' set of cells and then impute the cluster assignments for the rest of the cells using nearest-neighbour classification while writing the output FCS files.


-- Nikolay
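To illustrate the idea of clustering a 'core' subset and then imputing labels for the remaining cells, here is a rough sketch of nearest-neighbour label assignment. This is not VorteX's actual code: the class and method names are hypothetical, the data are made-up toy values, and it uses a brute-force Euclidean search, which a real tool would likely replace with something faster.

```java
import java.util.Arrays;

public class NearestNeighbourImpute {

    /** Squared Euclidean distance between two cells (marker vectors of equal length). */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the cluster label of the core cell closest to the query cell. */
    static int imputeLabel(double[][] coreCells, int[] coreLabels, double[] cell) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < coreCells.length; i++) {
            double dist = squaredDistance(coreCells[i], cell);
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return coreLabels[best]; // assumes a non-empty core set
    }

    public static void main(String[] args) {
        // Core-set size from the thread: 112 files x 10,000 rows per file.
        long coreSize = 112L * 10_000L;
        System.out.println("Core cells: " + coreSize);

        // Toy core set: two clusters in a 2-marker space (made-up values).
        double[][] core = {{0.1, 0.2}, {0.0, 0.3}, {5.0, 5.1}, {5.2, 4.9}};
        int[] labels = {1, 1, 2, 2};

        // Cells that were left out of the clustering step.
        double[][] rest = {{0.2, 0.1}, {4.8, 5.3}};
        int[] imputed = new int[rest.length];
        for (int i = 0; i < rest.length; i++) {
            imputed[i] = imputeLabel(core, labels, rest[i]);
        }
        System.out.println(Arrays.toString(imputed));
    }
}
```

The point of the design is that the expensive clustering step only sees the core cells, while every other cell costs just one nearest-neighbour lookup against that core set.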



sararselitsky commented 6 years ago

Ok, thanks!