machinetranslate / machinetranslate.org

Open information and community for machine translation
https://machinetranslate.org
Creative Commons Attribution Share Alike 4.0 International

PBMT : SMT...? #157

Open bittlingmayer opened 2 years ago

bittlingmayer commented 2 years ago

These are often used synonymously, because PBMT was the last major wave of SMT.

Does it deserve an article? Or a section in SMT? Or to be treated as a synonym?

bittlingmayer commented 2 years ago

@zouharvi @randyscansani

zouharvi commented 2 years ago

Technically, PBMT and NMT are both SMT, as opposed to RBMT. When people talk about SMT, they usually (citation needed) mean PBMT. That said, I took a class called SMT that covered PBMT and NMT equally.

zouharvi commented 2 years ago

I think it's still interesting (especially because it gave us word alignment, etc.), but I don't know whether it is still relevant in the industry. Maybe something with phrase-level translation memories?

cefoo commented 2 years ago

The Google Translate API for Trados 2019 allows you to choose the system you want to use: PBMT or NMT.

As @zouharvi mentions, this is something that can still be found in the translation industry to this day. For our audience of translators, I think it may be worth describing PBMT further. Maybe not in an article of its own (perhaps a redirect is possible?), but inside the SMT article.

randyscansani commented 2 years ago

I agree that PBMT can still be relevant. It's probably not used extensively in the industry, but neither are some other topics in the roadmap (like unsupervised MT). In general, anyone who wants to know about MT and has already read articles or manuals on the topic has probably stumbled upon the PBMT acronym, so in my opinion it might be worth adding a few words about it.

bittlingmayer commented 2 years ago

Google Cloud just uses "PBMT" to mean the last generation of SMT, e.g. https://stackoverflow.com/a/63787121/4486860.

zouharvi commented 2 years ago

Seeing this discussion, maybe this is a call for us to have a disambiguating paragraph on RBMT, SMT, PBMT & NMT.

zouharvi commented 2 years ago

@bittlingmayer @cefoo @randyscansani Was this issue resolved?

cefoo commented 2 years ago

@zouharvi We have a very short list describing the different approaches to SMT in the SMT article. Perhaps it's a good basis for expanding on the topic.

bittlingmayer commented 2 years ago

My concern is that people searching for PBMT will not find our resources, even though we cover it.

I think we can help a bit by modifying the Approaches text, from

By the 2010s, the top systems, like Google Translate, used statistical machine translation. By the 2020s, the top systems used neural machine translation.

to

By the 2010s, the top systems, like Google Translate, used statistical machine translation (SMT), specifically phrase-based machine translation (PBMT).

By the 2020s, the top systems used neural machine translation (NMT).

bittlingmayer commented 2 years ago

If we really want to be correct about NMT being a type of SMT, we can do something like this, which doesn't contradict that:

By the 2010s, the top systems, like Google Translate, used statistical machine translation (SMT).

By the 2020s, the top systems moved from phrase-based machine translation (PBMT) to neural machine translation (NMT).