Open maali-mnasri opened 8 years ago
Thanks for catching this; what you have done is what was originally intended. The alignments should still be the same, because of the two continues on lines 1293 and 1297. I will update the source soon.
Great! Thank you.
Hi, I'm also running into performance issues. Could you please provide your adjusted code? Many thanks.
@eoehri
Hi, I just added these two lines to aligner.py:
sourceWordIndicesBeingConsidered=list(set(sourceWordIndicesBeingConsidered))
targetWordIndicesBeingConsidered=list(set(targetWordIndicesBeingConsidered))
They go between line 1282 and line 1285 (just before the loop). I hope this helps.
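For anyone who wants to see the effect in isolation, here is a minimal sketch of the deduplication step. It does not use aligner.py; the index lists and the variable names below are made up for illustration. It also shows an order-preserving alternative (dict.fromkeys), in case the order of the indices turns out to matter, since list(set(...)) does not guarantee the original order.

```python
# Hypothetical index lists with repeated entries; in aligner.py the real
# lists are built earlier in the function, these values are made up.
source_indices = [3, 1, 3, 2, 1, 3]
target_indices = [5, 5, 4, 5]

# list(set(...)) removes duplicates but does not guarantee the original order.
deduped_source = list(set(source_indices))
deduped_target = list(set(target_indices))

# dict.fromkeys keeps the first occurrence of each index in its original position.
ordered_source = list(dict.fromkeys(source_indices))
ordered_target = list(dict.fromkeys(target_indices))

# A nested loop over both lists visits len(source) * len(target) pairs,
# so removing duplicates shrinks the work multiplicatively.
pairs_before = len(source_indices) * len(target_indices)
pairs_after = len(ordered_source) * len(ordered_target)

print(pairs_before, pairs_after)  # prints "24 6"
```

In this toy case the nested loop goes from 24 pairs to 6. Whether the result is identical in the real aligner depends on how the loop treats repeated pairs, which is what the original question asks.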
In aligner.py, lines 1267 and 1268, each source/target word index may be appended many times to the sourceWordIndicesBeingConsidered/targetWordIndicesBeingConsidered lists, which makes these lists too big due to redundant elements. I do not see the point of including word indices many times, as this makes the next loop (line 1285) very time-consuming. To speed up execution, I converted both lists to sets to remove duplicates. It is far faster now and I get the same alignment in testalign.py; however, I want to be sure that this does not degrade the alignment quality in other cases. Can you please confirm that removing the redundancy is safe?