neulab / awesome-align

A neural word aligner based on multilingual BERT
https://arxiv.org/abs/2101.08231
BSD 3-Clause "New" or "Revised" License

Speed up the extraction process #30

Closed shanyas10 closed 2 years ago

shanyas10 commented 2 years ago

Is there a way to speed up the alignment extraction process, such as a parameter to parallelize things? Right now I'm extracting alignments for a dataset of 25k samples and it takes more than 1.5 hours on CPU. If there's a way to run this on GPU, that would also be helpful. Below is the current set of parameters I'm using:

```shell
awesome-align \
    --output_file=$align_dest/$align_fn \
    --model_name_or_path=bert-base-multilingual-cased \
    --data_file=$trans_fn \
    --extraction 'softmax' \
    --cache_dir ../cache/ \
    --batch_size 32
```

zdou0830 commented 2 years ago

Hi, awesome-align runs on GPU by default when one is available (https://github.com/neulab/awesome-align/blob/master/awesome_align/run_align.py#L257). You can check whether torch.cuda.is_available() returns True.
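For reference, a minimal check from a Python shell looks like this (only `torch.cuda.is_available` and `torch.cuda.get_device_name` are used, both standard PyTorch API):

```python
import torch

# awesome-align falls back to CPU when this is False.
cuda_ok = torch.cuda.is_available()
print("CUDA available:", cuda_ok)

if cuda_ok:
    # Name of the first visible GPU, e.g. "Tesla T4".
    print("Device:", torch.cuda.get_device_name(0))
```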

You may also set num_workers to a larger number (the default value is 4) so that data loading keeps up with the model.
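A sketch of the adjusted invocation, assuming the option is exposed on the CLI as `--num_workers` (check `awesome-align --help` for the exact flag name):

```shell
awesome-align \
    --output_file=$align_dest/$align_fn \
    --model_name_or_path=bert-base-multilingual-cased \
    --data_file=$trans_fn \
    --extraction 'softmax' \
    --cache_dir ../cache/ \
    --batch_size 32 \
    --num_workers 8
```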

shanyas10 commented 2 years ago

Thanks, I didn't know about the GPU support. It's way faster now. Closing the issue :)