prihoda / AbNumber

Convenience Python APIs for antibody numbering using ANARCI
MIT License

Running ANARCI (hmmer) in parallel for multiple sequences #13

Closed: y1zhou closed this issue 3 months ago

y1zhou commented 1 year ago

When we want to number multiple sequences, ANARCI takes an ncpu argument that allows running hmmer in parallel. Here it seems the Chain class can only take one sequence and send it to ANARCI. Would it be possible to have a wrapper class (e.g. Chains) that runs ANARCI in parallel?

Great work on this package! Its output is so much easier to interpret than ANARCI's.
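For illustration, the kind of wrapper requested here could be approximated outside the library with a small helper that maps a per-sequence numbering function over a dict of sequences. This is only a sketch under assumptions: `number_many` and `number_one` are hypothetical names, not part of abnumber or ANARCI, and whether threads give any speed-up depends on how ANARCI invokes hmmer on your installation.

```python
from concurrent.futures import ThreadPoolExecutor


def number_many(seq_dict, number_one, max_workers=4):
    """Apply number_one(seq) to every sequence concurrently.

    Hypothetical helper, not part of abnumber. Returns a tuple
    (results, errors): results maps name -> return value of number_one,
    errors maps name -> error message for sequences that raised.
    """

    def safe(item):
        # Catch per-sequence failures so one bad input does not abort the batch.
        name, seq = item
        try:
            return name, number_one(seq), None
        except Exception as exc:
            return name, None, str(exc)

    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for name, value, err in pool.map(safe, seq_dict.items()):
            if err is None:
                results[name] = value
            else:
                errors[name] = err
    return results, errors


# Possible usage with abnumber (requires abnumber and ANARCI to be installed):
#   from abnumber import Chain
#   chains, errors = number_many(seqs, lambda s: Chain(s, scheme="imgt"))
```

Passing the numbering function as an argument keeps the helper independent of abnumber, so it can be tried with any callable.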

prihoda commented 3 months ago

Hi @y1zhou, sorry for the late response. There has been an update: I released abnumber 0.3.3, which supports Chain.batch(seq_dict, ...) to process multiple input sequences at once.

By default, abnumber uses ncpu=None because it calls anarci() rather than run_anarci(): https://github.com/oxpig/ANARCI/blob/master/lib/python/anarci/anarci.py#L766-L767

So Chain.batch should now use multiprocessing as long as hmmer supports it. I haven't tested this extensively, and it might depend on your installation.
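A minimal sketch of how the batch call might be used. It assumes abnumber 0.3.3 or later, and it assumes that Chain.batch accepts a `scheme` keyword and returns a pair of dicts (numbered chains and error messages), which is worth checking against the documentation of your installed version. The `number_batch` wrapper and its `batch_fn` parameter are hypothetical, added only so the sketch can be tried without ANARCI installed.

```python
def number_batch(seq_dict, scheme="imgt", batch_fn=None):
    """Number all sequences in seq_dict with a single batch call.

    batch_fn is a hypothetical injection point; by default it resolves to
    abnumber's Chain.batch, imported lazily so that this module loads even
    when abnumber is not installed.
    """
    if batch_fn is None:
        from abnumber import Chain  # assumes abnumber >= 0.3.3
        batch_fn = Chain.batch
    # Assumption: the call returns (dict of chains, dict of error strings).
    chains, errors = batch_fn(seq_dict, scheme=scheme)
    return chains, errors


# Possible usage (requires abnumber, ANARCI and hmmer):
#   chains, errors = number_batch({"heavy1": "QVQLQQSG...", "light1": "DIQMTQSP..."})
#   for name, chain in chains.items():
#       print(name, chain.chain_type)
```

The sequences in the usage comment are truncated placeholders, not real inputs.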