How to deal with copied words in source sentences

I am sorry that this issue is not directly related to the project.

In MT, some words/phrases, such as personal names and company names, are not translated but copied verbatim from the source sentence. It occurs to me that there could be two approaches:

1. Use a shared vocabulary for both the source and target languages. However, on the one hand, the vocabulary size could become very large; on the other hand, the MT system may not know which words/phrases should be left untranslated unless it has seen them in the training set.
2. Use preprocessing: for example, detect such words/phrases as named entities, rare words, etc., and replace them with special tokens. I have tried spaCy NER, but it is not accurate enough in practice.
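
The placeholder idea in the second approach can be sketched in a few lines of Python, assuming the placeholders are restored after translation. This is a minimal illustration, not a production pipeline: `is_copy_candidate` is a hypothetical heuristic standing in for a real NER or rare-word detector (such as spaCy), the `__ENT0__`-style placeholder format is an arbitrary choice, and `toy_translate` is a hypothetical word-by-word stand-in for a real MT system. Whitespace tokenization is also assumed, so punctuation attached to a name would end up inside the placeholder span.

```python
import re
from typing import Callable, Dict, List, Tuple

PLACEHOLDER = re.compile(r"__ENT\d+__")  # assumed placeholder format


def is_copy_candidate(token: str, position: int) -> bool:
    """Hypothetical stand-in for NER: digits, or capitalized non-initial tokens."""
    if any(ch.isdigit() for ch in token):
        return True
    return position > 0 and token[:1].isupper()


def mask(sentence: str) -> Tuple[str, Dict[str, str]]:
    """Replace each run of candidate tokens with a placeholder such as __ENT0__."""
    out: List[str] = []
    mapping: Dict[str, str] = {}
    span: List[str] = []

    def flush() -> None:
        # Close the current span (if any) and emit one placeholder for it.
        if span:
            placeholder = f"__ENT{len(mapping)}__"
            mapping[placeholder] = " ".join(span)
            out.append(placeholder)
            span.clear()

    for i, tok in enumerate(sentence.split()):
        if is_copy_candidate(tok, i):
            span.append(tok)  # multi-word names become a single placeholder
        else:
            flush()
            out.append(tok)
    flush()
    return " ".join(out), mapping


def unmask(translation: str, mapping: Dict[str, str]) -> str:
    """Copy the original source strings back in place of the placeholders."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), translation)


def translate_with_copies(sentence: str, translate: Callable[[str], str]) -> str:
    masked, mapping = mask(sentence)
    return unmask(translate(masked), mapping)


# Hypothetical toy "MT system": word-by-word lookup that leaves unknown
# tokens (including placeholders) unchanged.
LEXICON = {"We": "Wir", "met": "trafen", "at": "bei", "in": "im Jahr"}


def toy_translate(text: str) -> str:
    return " ".join(LEXICON.get(tok, tok) for tok in text.split())


print(mask("We met Anna Schmidt at Acme in 2019")[0])
# We met __ENT0__ at __ENT1__ in __ENT2__
print(translate_with_copies("We met Anna Schmidt at Acme in 2019", toy_translate))
# Wir trafen Anna Schmidt bei Acme im Jahr 2019
```

With a trained model rather than a lookup table, the training data would presumably need the same masking so that the model learns to pass placeholders through unchanged, and a real pipeline would also have to handle placeholders that get dropped, duplicated, or reordered.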

I tried Google Translate and other translation apps, and found that, to some extent, their systems can identify the words/phrases that should be copied, though not perfectly. Could someone advise what, in general, is the best solution to this problem? Thanks.
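
Whether a word was copied, by a system or in a reference translation, can be approximated with a simple heuristic: treat a source token as copied if the same string occurs verbatim in the target sentence. Aggregated over a parallel corpus, this gives a rough per-token copy rate, which is one way to see which words the training data actually shows being copied (the limitation of the shared-vocabulary approach mentioned above). The sketch below assumes whitespace tokenization and exact, case-sensitive matching; the function names and the tiny English-German corpus are made up for illustration. Words that happen to be spelled the same in both languages would also be counted as copies.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def copied_tokens(src: str, tgt: str) -> Set[str]:
    """Tokens that occur verbatim on both sides of one sentence pair."""
    return set(src.split()) & set(tgt.split())


def copy_rates(pairs: Iterable[Tuple[str, str]], min_count: int = 1) -> Dict[str, float]:
    """For each source token seen in at least `min_count` sentences, the
    fraction of those sentences whose target also contains it verbatim."""
    seen: Counter = Counter()
    copied: Counter = Counter()
    for src, tgt in pairs:
        seen.update(set(src.split()))  # count each token once per sentence
        copied.update(copied_tokens(src, tgt))
    return {tok: copied[tok] / n for tok, n in seen.items() if n >= min_count}


# Tiny made-up English-German corpus, for illustration only.
pairs = [
    ("Anna works at Acme", "Anna arbeitet bei Acme"),
    ("Acme is hiring", "Acme stellt ein"),
    ("Anna is here", "Anna ist hier"),
    ("the report is late", "der Bericht ist spät"),
]

rates = copy_rates(pairs, min_count=2)
print(sorted(rates.items()))
# [('Acme', 1.0), ('Anna', 1.0), ('is', 0.0)]
```

Tokens with a high rate could, for instance, feed a masking step like the one in the second approach, or serve as a spot check on a system's output. This is only a diagnostic heuristic; it is not a description of how Google Translate or any other product decides what to copy.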

marian-nmt / marian #267