psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

partis partition crash #299

Closed · wfs-lilly closed this 4 years ago

wfs-lilly commented 4 years ago

Log file attached. This is running on a machine with 6GB of RAM allocated to docker, so it's entirely possible that the subprocess ran out of memory. Note that sequences in the output have had all but the initial letter replaced with N. partis_failed_partition_aliva.seq_obs.log

psathyrella commented 4 years ago

Yes, it looks like you're running out of memory. In the log file you can see it dumping the progress file for one of the four subprocesses when it crashes, and the memory column shows that this one subprocess is using 10% of your machine's 6GB. Multiply by four for the other subprocesses, add the memory used by the python process that's driving them, and that's probably most of your available memory. Honestly, 6GB is just not going to be enough for this sort of thing. In the big picture, the various ways that general clustering algorithms avoid all-against-all comparison do not work in BCR rearrangement space (details in the clustering paper). In other words, default partis is vastly more accurate than methods that use more heuristic clustering algorithms, but there's a price to pay. Clustering 130k sequences with default partis on, say, my desktop with 32GB and 8 cores is typically not a problem.
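The arithmetic above can be sketched out explicitly. The 10% figure comes from the progress-file dump in the attached log; the 1GB for the driving python process is a rough assumption, not a measured number:

```python
# Back-of-envelope memory estimate for the failing run.
total_gb = 6.0        # RAM allocated to the docker container
per_proc_frac = 0.10  # memory column in the log: ~10% of total per subprocess
n_procs = 4           # the partition step was split across four subprocesses
driver_gb = 1.0       # assumed overhead of the driving python process

subproc_gb = per_proc_frac * total_gb * n_procs  # 2.4 GB across subprocesses
needed_gb = subproc_gb + driver_gb               # ~3.4 GB at the moment of the dump
print(round(needed_gb, 1))  # → 3.4
```

And since memory use grows as clustering proceeds (clusters get merged and cached), a snapshot that's already over half of 6GB partway through is a good sign the run will eventually exhaust the container's allocation.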

If there's really no other machine you can run on, you can try subsampling or vsearch clustering.

edit: or any of the other ways of reducing memory or compute time listed in the manual.
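For reference, runs along those lines might look something like the following. The flag names here are assumptions based on the partis manual's description of subsampling and vsearch clustering, so check `partis partition --help` in your installed version before using them:

```shell
# Option 1: subsample the input so the partition fits in memory
# (--n-random-queries is an assumed flag name; verify against --help).
partis partition --infname seqs.fa --outfname partition-sub.yaml \
    --n-random-queries 20000

# Option 2: faster, more approximate vsearch-based clustering
# (--naive-vsearch is likewise assumed; verify against --help).
partis partition --infname seqs.fa --outfname partition-vsearch.yaml \
    --naive-vsearch
```

Either trades some accuracy for a much smaller memory and CPU footprint, which is the price the comment above refers to running in the other direction.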
