puppeteer / puppeteer

JavaScript API for Chrome and Firefox
https://pptr.dev
Apache License 2.0
Puppeteer does not honor hyphens:auto #6840

Closed Friction-less-development closed 2 years ago

Friction-less-development commented 3 years ago

In Chromium 88, support for hyphens: auto was added.

https://www.chromestatus.com/feature/5672891947417600

However, when using a Puppeteer version that supports Chromium 88 or above, hyphens: auto does not behave as expected.

I have downloaded the current version of Chrome 88 locally, as well as the beta and canary versions. When viewing a test HTML file that uses hyphens: auto in these Chrome versions, the behavior works as expected and words are automatically hyphenated when the window becomes too narrow.

However, if I launch these SAME executables using Puppeteer and open the EXACT SAME test file, the hyphenation does NOT work as expected, and no hyphens appear.

My example hyphens test file is below.

<html lang="en">
    <div style="word-break: break-word; hyphens: auto;">
    reallylongwordwithnobreaksatall reallylongwordwithnobreaksatall reallylongwordwithnobreaksatall reallylongwordwithnobreaksatall reallylongwordwithnobreaksatall
    </div>
</html>

Steps to reproduce

What steps will reproduce the problem?

  1. Launch Puppeteer in non-headless (headful) mode.
  2. Open the above HTML file and notice that it does not render the way it does in the corresponding Chrome version.
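
The two steps above can be scripted roughly as follows. This is only a sketch: it assumes the puppeteer package is installed, loads the markup with page.setContent rather than from a file on disk, and uses a 150 px wide viewport and the output name hyphens.png, none of which come from the original report.

```javascript
// Reproduction sketch for the report above (names and sizes are illustrative).
const LONG_WORD = 'reallylongwordwithnobreaksatall';

// Rebuild the test page from the issue: a div styled with hyphens: auto.
function buildTestPage(word = LONG_WORD, repeat = 5) {
  const text = Array(repeat).fill(word).join(' ');
  return [
    '<html lang="en">',
    '  <div style="word-break: break-word; hyphens: auto;">',
    `  ${text}`,
    '  </div>',
    '</html>',
  ].join('\n');
}

// Launch a headful browser, render the page in a narrow viewport,
// and save a screenshot that can be compared with regular Chrome.
async function reproduce(outPath = 'hyphens.png') {
  // Required lazily so buildTestPage can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: false });
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 150, height: 600 }); // narrow enough to force breaks
    await page.setContent(buildTestPage());
    await page.screenshot({ path: outPath });
  } finally {
    await browser.close();
  }
}
```

Calling reproduce() and opening the screenshot next to the same page in a normally started Chrome should show whether hyphens appear in one and not the other.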

What is the expected result?

Puppeteer should match chrome behavior for hyphens: auto

What happens instead?

Puppeteer does not match chrome behavior for hyphens: auto

OmarHawk commented 3 years ago

It seems that a normal Chrome downloads hyphenation dictionaries after startup and installs them (on Windows) in the user's AppData directory so that hyphenation works:

[screenshot]

This is also explained in Chrome's tracking bug, which describes how the feature is implemented: https://bugs.chromium.org/p/chromium/issues/detail?id=652964#c70

I guess this is what is not happening when a fresh Chrome is started via Puppeteer...

Friction-less-development commented 3 years ago

That's a good guess. However, as noted in the OP, when I instruct Puppeteer to use a Chrome executable that I have verified honors hyphens: auto when launched normally, it still doesn't work.

What I'm saying is: say I have Chrome 88 downloaded. When I open Chrome 88 normally, hyphens: auto works.

But when I have puppeteer open that SAME chrome 88 executable, hyphens: auto does not work.

This leads me to believe there's something else going on.

redneb commented 3 years ago

That's a good guess. However, as noted in the OP, when I instruct Puppeteer to use a Chrome executable that I have verified honors hyphens: auto when launched normally, it still doesn't work.

@Friction-less-development This is not about the Chrome executable, it's about the user profile directory. As @OmarHawk pointed out, Chrome downloads the hyphenation data into the user profile directory after it is first launched with that profile. So the issue you are experiencing comes from Puppeteer using a new user profile directory every time it launches Chrome. If you disable that behavior by passing a persistent path to the userDataDir option of puppeteer.launch, it should work, provided that you give Chrome some time to download the hyphenation data (you might also have to restart Chrome after that). I have confirmed this on Linux (and according to the relevant Chrome bug, it should behave the same on Windows).
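
A minimal sketch of that suggestion, assuming the puppeteer package is installed. The profile location, the 60-second warm-up, and the helper names are hypothetical choices for illustration; the thread does not say how long the download actually takes.

```javascript
const os = require('node:os');
const path = require('node:path');

// Hypothetical profile location: any writable directory that survives between runs.
const PROFILE_DIR = path.join(os.homedir(), '.cache', 'puppeteer-hyphens-profile');

// Launch options that reuse one profile directory instead of a fresh temporary one.
function persistentLaunchOptions(profileDir = PROFILE_DIR) {
  return {
    headless: false,         // the thread reports hyphenation failing in headless mode
    userDataDir: profileDir, // persistent profile, as suggested above
  };
}

// First launch warms up the profile, second launch reuses it.
async function launchWithPersistentProfile(warmupMs = 60000) {
  // Required lazily so persistentLaunchOptions works without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const warmup = await puppeteer.launch(persistentLaunchOptions());
  // Give Chrome time to fetch the hyphenation data; the duration is a guess.
  await new Promise((resolve) => setTimeout(resolve, warmupMs));
  await warmup.close();
  // Restart with the same profile, which should now contain the data.
  return puppeteer.launch(persistentLaunchOptions());
}
```

One thing worth verifying rather than assuming: if your Puppeteer version includes --disable-component-update in the list returned by puppeteer.defaultArgs(), the download may never start, and the ignoreDefaultArgs launch option can be used to drop that flag. This thread does not confirm that this is the cause.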

redneb commented 3 years ago

BTW, there seems to be another issue: in my tests, hyphens: auto is not honored when running in headless mode, but I would guess this is a chromium bug not a puppeteer issue.

Prinzhorn commented 2 years ago

Does anyone know where Chromium pulls the hyphenation info from on Linux/Ubuntu? There is no hyphen-data directory and I cannot find any .hyb in the user data dir. If I knew where to put the files I'd just copy them in the right place before calling launch. I already do that with fonts (copy them into /tmp/.fonts in AWS lambda so they work for the PDF generation).
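
To check a given profile or install directory for hyphenation data, a small search for .hyb files can help. This is a sketch using only Node's built-in fs and path modules; the helper name is made up here, and it does not answer where Chromium actually stores the files.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Recursively collect files under `root` whose names end with `ext`.
function findFilesByExtension(root, ext = '.hyb') {
  const found = [];
  let entries;
  try {
    entries = fs.readdirSync(root, { withFileTypes: true });
  } catch {
    return found; // missing or unreadable directory: report nothing
  }
  for (const entry of entries) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      found.push(...findFilesByExtension(full, ext));
    } else if (entry.isFile() && entry.name.endsWith(ext)) {
      found.push(full);
    }
  }
  return found;
}
```

Running it against the directory passed as userDataDir, before and after a headful session, would show whether any dictionaries were downloaded into that profile.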

MurakamiShinyu commented 2 years ago

BTW, there seems to be another issue: in my tests, hyphens: auto is not honored when running in headless mode, but I would guess this is a chromium bug not a puppeteer issue.

I searched the Chromium bug tracker for this issue but did not find it, so I reported it.

MurakamiShinyu commented 2 years ago

I got an answer on the Chromium bug tracker: https://bugs.chromium.org/p/chromium/issues/detail?id=1261480#c5

Traditional headless mode is a completely different browser implementation that is launched if chrome is started with --headless switch. As such, it supports a very limited subset of components and I don't think component updater is one of them. We don't have plans to bring headless chrome feature support any closer to the one of the headful chrome.

Instead, we're working on a new "native" headless mode that uses the regular chrome browser running without a physical graphics device. It obviously supports everything a regular chrome browser supports.

Native headless mode is still in experimental phase, however, it is available in the stable chrome (only on Linux at this point) if headless mode is activated using the --headless switch specified with 'chrome' value (--headless=chrome) or if USE_HEADLESS_CHROME environment variable is set.

The option --headless=chrome (or USE_HEADLESS_CHROME environment variable) on Linux would be worth testing.
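
For anyone who wants to try this from Puppeteer, here is a rough sketch. It assumes that setting headless: false keeps Puppeteer from adding its own --headless switch, so the explicit one is the only one Chrome sees. The helper names are hypothetical.

```javascript
// Launch options that request the experimental "native" headless mode.
function nativeHeadlessLaunchOptions(extraArgs = []) {
  return {
    // Assumption: with headless: false Puppeteer adds no --headless switch itself.
    headless: false,
    args: ['--headless=chrome', ...extraArgs],
  };
}

async function launchNativeHeadless() {
  // Required lazily so nativeHeadlessLaunchOptions works without Puppeteer installed.
  const puppeteer = require('puppeteer');
  return puppeteer.launch(nativeHeadlessLaunchOptions());
}
```

Later Chrome releases renamed the switch value to --headless=new, so the value may need adjusting depending on the browser version in use.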

tassin-gauthier commented 2 years ago

If anyone needs this feature for their product, they can take a look at Hyphenopoly.

stale[bot] commented 2 years ago

We're marking this issue as unconfirmed because it has not had recent activity and we weren't able to confirm it yet. It will be closed if no further activity occurs within the next 30 days.

stale[bot] commented 2 years ago

We are closing this issue. If the issue still persists in the latest version of Puppeteer, please reopen the issue and update the description. We will try our best to accommodate it!

schaakverslaafd commented 11 months ago

It seems that Puppeteer still does not render hyphens correctly. I tried rendering with hyphens: auto in the new headless mode. When Chrome is started normally the hyphens are rendered, but with Puppeteer they are not shown, even with executablePath pointing to Chrome and userDataDir pointing to the Chrome user data directory.

nathanparrett commented 7 months ago

Does anyone know where Chromium pulls the hyphenation info from on Linux/Ubuntu? There is no hyphen-data directory and I cannot find any .hyb in the user data dir. If I knew where to put the files I'd just copy them in the right place before calling launch. I already do that with fonts (copy them into /tmp/.fonts in AWS lambda so they work for the PDF generation).

Did you ever get a solution to this problem?

nathanparrett commented 7 months ago

It seems that Puppeteer still does not render hyphens correctly. I tried rendering with hyphens: auto in the new headless mode. When Chrome is started normally the hyphens are rendered, but with Puppeteer they are not shown, even with executablePath pointing to Chrome and userDataDir pointing to the Chrome user data directory.

Did you find a solution to this?