hteumeuleu / email-bugs

Email quirks and bugs

Gmail clips emails at 102 KB or with special characters #41

Open hteumeuleu opened 6 years ago

hteumeuleu commented 6 years ago

Yesterday, an interesting conversation was started on the #emailgeeks Slack by @cossssmin regarding Gmail clipping limit.

Do we have a source for the magic 102KB message size Gmail clipping limit? I'm curious if someone actually tested this or if it was communicated from an official source, as I've been blindly following it (like most of us probably have), and I've recently seen emails that were 99.something KB getting clipped (before you ask, it was after ESP added stuff to it).

A few people, including myself, weighed in to share their experience and a few test results, but without a definitive answer yet. I loved the little collaborative investigation that went on for a few hours, but it seems Slack is not really appropriate for this (given the temporary nature of conversations there). So I thought here would be a good place to continue this together.

The problem

"Message clipped" screenshot in Gmail

When HTML emails are too large, Gmail clips them with a [Message clipped] notice and a View entire message link. It's been widely shared that this clipping occurs after 102 KB (example: Gmail is clipping my email on Mailchimp). But no one seems to know where this number comes from, and different people have experienced different results around the 100 KB mark.

So what is going on exactly? Can we figure out Gmail's clipping algorithm?

hteumeuleu commented 6 years ago

Here are a few tests I ran yesterday.

First tests

My first question was how any of this was calculated. Does Gmail consider ~100 KB after doing all its prefixing and filtering (removing styles and HTML tags it doesn't support, converting class names and such)? So first I ran this test with 400 tables and 400 style tags.

Test email clipped at the 181st table

In this test, the email is clipped at the 181st table. If we try to reproduce this locally by keeping the 400 style tags but only 181 tables, we obtain an HTML file that weighs exactly 100 KB (or 99 507 bytes / 102 KB on disk according to the macOS info dialog). Here's the result file of this first test.
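For anyone who wants to repeat this kind of test, here is a minimal Python sketch of a generator for such a file. It is not the script used for the tests above: the function names, the filler text and the CSS rule are assumptions made for illustration, and only the overall shape (numbered tables, one style tag per table, a styleN class on each cell) follows what is described in this thread.

```python
FILLER = "Lorem ipsum dolor sit amet."  # assumption: any dummy text will do


def build_test_email(table_count: int, with_styles: bool = True) -> str:
    """Return an HTML document with numbered tables and, optionally,
    one <style> tag per table (hypothetical layout)."""
    parts = ["<!DOCTYPE html>", "<html>", "<head>"]
    if with_styles:
        for i in range(1, table_count + 1):
            parts.append(f"<style>.style{i} {{ color: #000000; }}</style>")
    parts += ["</head>", "<body>"]
    for i in range(1, table_count + 1):
        parts.append(
            '<table border="0" cellpadding="0" cellspacing="0" width="100%">'
            f'<tr><td class="style{i}"><h1>{i}</h1><p>{FILLER}</p></td></tr></table>'
        )
    parts += ["</body>", "</html>"]
    return "\n".join(parts)


def size_in_bytes(html: str) -> int:
    """Size of the document once encoded as UTF-8."""
    return len(html.encode("utf-8"))


if __name__ == "__main__":
    for count in (181, 254, 400):
        print(count, "tables:", size_in_bytes(build_test_email(count)), "bytes")
```

Because each table carries its own number, the last number visible in Gmail tells you where the clipping happened.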

To confirm this, I ran a second test without the style tags this time, only the 400 tables. The result shows the email is clipped at the 254th table.

Test email clipped at the 254th table

By reproducing this locally (and only keeping 254 tables), I can measure the weight of the file to be 100 KB (or 99 906 bytes / 102 KB on disk according to the macOS info dialog). Here's the result file of this second test.

Finally, I ran a third test with 400 tables, each with an HTML data attribute on each <td> (<td data-a-very-long-attribute-that-gmail-will-filter="true">). This should confirm whether Gmail measures the weight before or after any filtering.

Test email clipped at the 223rd table

The result shows the email is clipped at the 223rd table. But interestingly, the dummy text is clipped right after the first word. Here's the code of this table as seen in Gmail in Chrome by inspecting the code.

<table border="0" cellpadding="0" cellspacing="0" width="100%">
    <tbody>
        <tr>
            <td class="m_320685387637992867style223">
                <h1>223</h1>
                <p>
                    Lorem </p>
            </td>
        </tr>
    </tbody>
</table>

If we reproduce that exact same email locally (with only 223 tables and the text clipped at the first word at the end), we obtain a file that is once again exactly 100 KB (or 100 217 bytes / 102 KB on disk according to the macOS info dialog). Here's the result file of the third test.
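The three byte counts can be converted into the usual units with a few lines of Python. This is only a sketch of the arithmetic: which unit Gmail uses is not known, and the 4,096-byte allocation block used for the "on disk" figure is an assumption about the file system, not something reported above.

```python
import math

# Byte counts reported for the three locally reproduced files.
MEASURED_BYTES = {
    "test 1 (181 tables, 400 style tags)": 99_507,
    "test 2 (254 tables, no style tags)": 99_906,
    "test 3 (223 tables, data attributes)": 100_217,
}

BLOCK_SIZE = 4096  # assumption: allocation block size of the local disk


def to_kib(size_bytes: int) -> float:
    """Binary kilobytes (1 KiB = 1024 bytes)."""
    return size_bytes / 1024


def to_kb(size_bytes: int) -> float:
    """Decimal kilobytes (1 kB = 1000 bytes)."""
    return size_bytes / 1000


def size_on_disk(size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes occupied on disk when space is allocated in whole blocks."""
    return math.ceil(size_bytes / block_size) * block_size


for label, size in MEASURED_BYTES.items():
    print(
        f"{label}: {size} bytes = {to_kb(size):.2f} kB = {to_kib(size):.2f} KiB, "
        f"{size_on_disk(size)} bytes on disk"
    )
```

Under that block-size assumption, all three files round up to 102,400 bytes on disk, which would explain why the info dialog shows the same 102 KB figure for each of them. All three are also below 100 × 1024 = 102,400 bytes in actual content.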

First observations

Here are the first observations that I draw from these three tests:

revelt commented 6 years ago

Thank you for sharing!

It's definitely not exactly 100KB; I've seen clipping happen at smaller sizes. My "rule of thumb" has always been to aim for file sizes under 80KB. 1 character = 1 byte (for ASCII characters, at least), so that's around 80,000 characters, which is easy to check in a code editor: select all and read the total character count in the status bar.

What's also interesting to consider: Gmail doesn't receive your HTML but what the ESP sent, which is what you see in the message's raw source. Various factors will bloat the served HTML code in there: ESP link URL scrambling, quoted-printable encoding, and sometimes ESPs serve the message Base64-encoded...

So 100KB may well be the threshold, but as an upper bound, and quite possibly in the form of 100x1024 characters in the raw source of the HTML part of the message, as received by the mail server. Still, I'd aim for less than 80,000 characters in the source HTML.
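One caveat on counting characters: the character count in an editor equals the byte count only while the file is pure ASCII. A short Python sketch (the function name is made up for this example) shows the difference for UTF-8:

```python
def char_and_byte_counts(html: str, encoding: str = "utf-8") -> tuple[int, int]:
    """Return (number of characters, number of bytes once encoded)."""
    return len(html), len(html.encode(encoding))


if __name__ == "__main__":
    for sample in ("Hello", "\u00a9 2019", "you\u2019re"):
        chars, size = char_and_byte_counts(sample)
        print(f"{sample!r}: {chars} characters, {size} bytes in UTF-8")
```

In UTF-8 the copyright sign takes 2 bytes and a curly apostrophe takes 3, so a file with a lot of typographic punctuation or accented text is heavier than its character count suggests.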

cossssmin commented 6 years ago

Here's the third test file in Windows:

image

Differences are to be expected, and I think neither is true to what Gmail actually counts server side, as @revelt pointed out.

On this note, I'd encourage taking tools that test your HTML file size against 'Gmail's limit' with a grain of salt. The point is 'we don't know exactly yet', so apparently useful tools might be a little misleading. It can happen they're just a tiny bit off, but that can be the difference between your tracking pixel being removed in all Gmail clients, or not.

Here's the third test file that was clipped, in the tool I linked to above:

image

revelt commented 6 years ago

Very very cheeky

hteumeuleu commented 6 years ago

@cossssmin You tested the file after clipping. So isn't the tool actually accurate to show that it will pass Gmail clipping? If I test the original file, it indeed says the email is too big (at 175.869140625 KB).

@revelt Good point on being careful with manipulations done on the ESP side. I used Putsmail for all my tests, which is pretty safe as far as I know.

cossssmin commented 6 years ago

Indeed Rémi, I've tested the same third result file. You mentioned macOS reported to be 102KB on disk.

So an HTML file that you'd think would very likely be clipped (according to the macOS report) was actually reported as 'safe'. Even if we don't consider 'on disk', and we take it to be 100KB, the difference is still large enough to be misleading: there's a 2.1318359375KB difference at the very least, so logically I can imagine a ~103.13KB file as also being reported 'safe'.

My point was we should take such file size weighing tools as estimates, and definitely not fully trust some JavaScript that calculates text file size as being the same thing Gmail does when deciding to clip :)

pbiolsi commented 6 years ago

From your testing/understanding, what do we know about how/if images impact the calculable email weight by Gmail?

Since we're all serving images through some external server or CDN, I've always assumed that their size only had an implication on bandwidth/load time... but perhaps that is incorrect and the Gmail image cache is partly to blame for clipping?

Our embedded styles run pretty lean (so not triggering the CSS characters limit) and we use built-in minification provided by our ESP (Listrak) at the time of send to remove whitespace. Yet, we still see emails as small as 32kb getting clipped by Gmail.

Any possibility they impose a pixel height-based limitation that could be triggering that? Like Outlook has been known to?

cossssmin commented 6 years ago

Never saw emails that small being clipped in Gmail, and I sent quite a few that were larger than 32KB. You sure your ESP isn't adding stuff to what Gmail receives?

revelt commented 6 years ago

@pbiolsi check the raw source. I'm pretty sure Gmail is measuring raw source. Now, if the message is multipart and there's Base64 plus some heavy link scrambling, sizes will bloat significantly.

1 character in your email's raw source = 1 byte. That includes escape characters (used in quoted-printable encoding for example). After you rule out the raw source, then move on to images.

Sizes as low as 32KB should not get clipped; something's wrong here.
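To get a feel for how much the transfer encoding alone can add, here is a small Python sketch using the standard quopri and base64 modules. Whether Gmail counts the encoded or the decoded size has not been established in this thread, so treat the numbers as a way to compare, not as what Gmail measures. The function name and the sample markup are made up for the example.

```python
import base64
import quopri


def encoded_sizes(html: str, charset: str = "utf-8") -> dict[str, int]:
    """Byte size of the HTML part as-is and under two transfer encodings."""
    raw = html.encode(charset)
    return {
        "raw": len(raw),
        "quoted-printable": len(quopri.encodestring(raw)),
        "base64": len(base64.encodebytes(raw)),
    }


if __name__ == "__main__":
    sample = '<td class="footer">\u00a9 Example</td>\n' * 200
    for name, size in encoded_sizes(sample).items():
        print(f"{name}: {size} bytes")
```

Quoted-printable turns every "=" into "=3D" and every non-ASCII byte into a three-character escape, while Base64 adds roughly a third plus line breaks, so markup with many attributes grows under either encoding.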

pbiolsi commented 6 years ago

@cossssmin @revelt great points about the ESP bloat, however I've actually seen this in Litmus previews using the Chrome extension (so serving local source pre-ESP... also un-minified in this case). Maybe something else is causing this from the Litmus side, but I believe those previews use PutsMail so should be pretty true to source.

I'll investigate more and post back.

hteumeuleu commented 6 years ago

Last week on the #emailgeeks Slack, @M-J-Robbins shared an interesting example that gets clipped because of a special character.

Just got an email through that Gmail said was clipped but the code is tiny and no obvious errors.

This is the character in question `` (not sure if slack will auto convert that) but I believe it’s this one https://unicode-table.com/en/0092/

Here’s an html example https://litmus.com/scope/ajulzityfxh6

<html>
  <p>’you’re</p>

A screenshot on Gmail showing an email getting clipped because of a special character

This only happens on the new Gmail redesign, not on the old one. This is very interesting because I also noticed a lot of emails getting clipped for no apparent reason since the redesign. I'll try to see if there are more characters triggering this.

ericlepetit commented 6 years ago

Funnily enough, the GitHub email notification for this message was clipped on Gmail :) I will bring this up to the Gmail team.

revelt commented 5 years ago

Hi all, just crossposting here from the email geeks Slack for posterity, because this keeps surfacing once in a while. If you want to check whether your email contains any non-ASCII characters (like the Unicode culprit "Private Use Two" above), I created a CLI app (terminal app) for that: https://www.npmjs.com/package/email-all-chars-within-ascii-cli
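The same kind of check can be sketched in a few lines of Python. This is not how the CLI above is implemented, just an illustration of the idea; the function name is made up.

```python
import unicodedata


def find_non_ascii(text: str) -> list[tuple[int, int, str, str]]:
    """Return (line, column, code point, Unicode name) for every
    character outside the ASCII range."""
    hits = []
    # Split on "\n" only: str.splitlines() would also split on some
    # non-ASCII separators and hide them from the scan.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "<unnamed>")
                hits.append((line_no, col_no, f"U+{ord(ch):04X}", name))
    return hits


if __name__ == "__main__":
    sample = "<html>\n  <p>\u0092you\u0092re</p>\n  <p>\u00a9</p>"
    for hit in find_non_ascii(sample):
        print(hit)
```

Control characters such as U+0092 have no Unicode name, which is why they show up as "<unnamed>" here.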

hthompson82 commented 5 years ago

Not only have I had issues with emails smaller than 102kb clipping, I've also sent several emails WELL over the 102kb (180-220kb) which don't clip. Anyone have any ideas? Getting a brick wall from Google whenever I try to talk to anyone about it.

revelt commented 5 years ago

@hthompson82 hi! Do you remember, what was the encoding of your raw messages that arrived into Gmail server (for example, quoted printable, Base64 etc.); did you measure character count in the raw HTML, decoded or the HTML that was put into ESP (former is more interesting); also was the message multipart and if so, how big was text part (hypothesis being text version's content might have affected Gmail clipping limit)?

hthompson82 commented 5 years ago

@revelt The encoding is "text/html; charset=utf-8". I count the kb size according to how it comes in to my Outlook (since I can't see size in Gmail) so that I can be sure it includes encoding, dynamic data etc. Our test sends are generally multipart but the text version varies between one word ("test") and a couple of paragraphs. Testing without the multipart text version still results in clipping, on the ones which clip.

jkupczak commented 5 years ago

@hthompson82 When you say Outlook, do you mean the Outlook web client or do you mean the desktop application that you have to install?

If you mean the desktop application, does Outlook let you see the original message? I don't have Outlook installed at the moment so I can't check for myself. But I thought that Outlook would only show you the message source which is the version that Outlook parsed using the Word engine.

cossssmin commented 5 years ago

@jkupczak the desktop Outlook app can show you the original HTML source, as it was received. You need to double click the message to open in a new window, then click "Message" in the top left, and then:

View source in Outlook

hthompson82 commented 5 years ago

Hi @jkupczak, I use the Outlook desktop app, which includes the size as a column in the inbox. (I also test emails via mail.yahoo.com and hotmail.com in the browser, plus the various mobile apps on iOS and Android.)

As an aside, I've also tested compressing the HTML (stripping out all white space) but to no avail.

hteumeuleu commented 5 years ago

The special-character clipping bug I described earlier in this thread (the <p>’you’re</p> example shared by @M-J-Robbins) seems fixed. Can anyone else confirm?

revelt commented 5 years ago

Not fixed. Same thing — both original <html><p>’you’re</p> and also same thing wrapped with normal HTML head/body in a table. This is still happening.

screen shot 2019-02-12 at 00 40 26

hteumeuleu commented 5 years ago

Got an even shorter example triggering the clipped message being shown because of a special character:

<html>
  <p>©</p>

Encoding the copyright character into &copy; fixes the problem here.
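If you want to apply that fix across a whole template, here is a minimal Python sketch that replaces every non-ASCII character with an HTML entity. It uses the standard html.entities table; the function name and the use_named switch are made up for the example, and it does not try to skip places where you might want the raw character kept.

```python
from html.entities import codepoint2name


def encode_non_ascii(html: str, use_named: bool = True) -> str:
    """Replace each non-ASCII character with a named entity when one
    exists (and use_named is true), otherwise with a hexadecimal one."""
    out = []
    for ch in html:
        code_point = ord(ch)
        if code_point < 128:
            out.append(ch)
        elif use_named and code_point in codepoint2name:
            out.append(f"&{codepoint2name[code_point]};")
        else:
            out.append(f"&#x{code_point:04X};")
    return "".join(out)


if __name__ == "__main__":
    print(encode_non_ascii("<p>\u00a9</p>"))
    print(encode_non_ascii("<p>\u00a9</p>", use_named=False))
```

The result is plain ASCII, so it reads the same whatever charset the sending platform declares.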

jclusso commented 5 years ago

@hteumeuleu I have this issue and I'm using &copy; No idea why though.

jkupczak commented 5 years ago

@jclusso

Got any information you can share?

revelt commented 5 years ago

yeah, let's create a minimal case to be able to reproduce.. as they say "He who asserts must prove"

jclusso commented 5 years ago

I think the issue is that the provider I'm using (customer.io) to send the emails is setting the Content-Type header to text/html; charset=iso-8859-1. I'm still confused about why &copy; doesn't work, since as far as I'm aware it should. Most other emails I get seem to be text/html; charset=utf-8, which makes me believe that switching to it would solve this.
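To see why the declared charset matters for a raw © but not for &copy;, it helps to look at the bytes. The Python sketch below only illustrates the encoding mismatch; whether that mismatch is what triggers Gmail's notice is a hypothesis, not something confirmed here. The function names are made up for the example.

```python
def encoded_hex(text: str, charset: str) -> str:
    """Hex dump of the bytes produced when text is encoded in charset."""
    return text.encode(charset).hex(" ")


def misread(text: str, sent_as: str, read_as: str) -> str:
    """What a reader sees when bytes written in one charset are decoded
    as another (undecodable bytes become U+FFFD)."""
    return text.encode(sent_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    print("UTF-8 bytes:     ", encoded_hex("\u00a9", "utf-8"))
    print("ISO-8859-1 bytes:", encoded_hex("\u00a9", "iso-8859-1"))
    print("Entity bytes:    ", encoded_hex("&copy;", "ascii"))
    print(repr(misread("\u00a9", "utf-8", "iso-8859-1")))
    print(repr(misread("\u00a9", "iso-8859-1", "utf-8")))
```

The entity is made of ASCII characters only, so its bytes are identical under both charsets, while the raw character is one byte in ISO-8859-1 and two in UTF-8.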

hthompson82 commented 5 years ago

Just sent two test emails from Adobe Campaign, one with &#x00A9; and one with ©. The unencoded version clipped, and the encoded one did not. Looks to me like the theory is sound. I use hexadecimal instead of a named entity, though.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width" />
<title>Untitled Document</title>
</head>

<body>
    &#x00A9;
</body>
</html>

revelt commented 5 years ago

@hthompson82 I checked the Adobe Creative Cloud newsletter which is allegedly sent from their platform. The encoding in the raw source is charset="windows-1252", encoded in quoted printable, with a corresponding charset=Windows-1252 meta tag in HTML. It seems all nice until one tries to decode the following raw piece:

See the tips<span class=3D"we=
b"> =9B</span></a></td>=20

It is not Windows-1252 but Windows-1251/Windows-1257 — a chevron. Test yourselves, https://dencode.com/en/string/quoted-printable

So, to sum up, it seems that the Adobe Campaign platform is using the wrong charset to encode its quoted-printable output, and as a consequence all non-ASCII characters are mangled. HTML-encoded characters are not affected because they stay within ASCII, so they're fine.
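The raw piece quoted above can also be decoded locally instead of through an online tool. Here is a minimal Python sketch using the standard quopri module; the list of charsets to try is an assumption made for the comparison, and the function name is made up.

```python
import quopri

# The raw quoted-printable fragment quoted above ("=" at the end of a
# line is a soft line break, "=3D" is "=", "=20" is a space).
RAW = b'See the tips<span class=3D"we=\nb"> =9B</span></a></td>=20'

CANDIDATE_CHARSETS = ("cp1252", "cp1251", "cp1257", "iso-8859-1")  # assumption


def decode_qp(raw: bytes, charset: str) -> str:
    """Undo the quoted-printable encoding, then decode the bytes."""
    return quopri.decodestring(raw).decode(charset, errors="replace")


if __name__ == "__main__":
    for charset in CANDIDATE_CHARSETS:
        print(f"{charset}: {decode_qp(RAW, charset)!r}")
```

Comparing the output for each charset shows directly which character the =9B byte maps to in each of them.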

cossssmin commented 4 years ago

Here's a weird one:

https://m3.news.ubisoft.com/nl/jsp/m.jsp?c=%405nrrN%2BHHqfIXH1q%2FHaFADEvE%2Bv1jGn1Gp%2FMdstPVNaI%3D

19KB email, shown in full, but still showing the 'clipped' message. I checked the source in Gmail web, and even the tracking pixel is there, so nothing was actually clipped...

This is what it looks like in Gmail web:

image

hteumeuleu commented 4 years ago

@cossssmin There's a © character in the footer, so I guess that's why.

NivenRanchhod commented 4 years ago

Just re-coded a template for a new client due to clipping. Reduced the raw HTML file by 100kb to now sit at 51kb. Tested via Putsmail and it's still clipping.

Read through this thread and simply replaced the © character with &copy; and voila, fixed.

avigoldman commented 4 years ago

On a related note - AMP for Email recently added a fixed 100kb size limit for emails. That is probably a good indication of what Gmail actually wants the 102kb limit to be. https://github.com/ampproject/amphtml/pull/29698#issuecomment-669546909

hteumeuleu commented 4 years ago

I just spent half an hour wondering why I got the "Message clipped" on an email. Turns out, I had an HTML comment in French with an accented character (like <!-- Mentions légales -->). I removed the accent and Gmail's message disappeared. Here’s a simple test code to try:

Hello world ! <!-- é -->

vladh commented 3 years ago

I encountered this problem when sending email to Gmail from Outlook. Sending a message with any German characters, such as "Grüsse", would cause the message to clip in Gmail.

I fixed this problem by making sure my email was encoded as UTF-8, which can be set in the Outlook settings.
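For messages built in code and not in a mail client, the equivalent is to declare UTF-8 explicitly on the HTML part. Here is a minimal sketch with Python's standard email package; the addresses and the function name are placeholders, and this only shows how to set the charset, not a confirmed cure for the clipping notice.

```python
from email.message import EmailMessage


def build_html_message(subject: str, html: str) -> EmailMessage:
    """Build a single-part HTML message whose body is declared as UTF-8."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "sender@example.com"      # placeholder address
    msg["To"] = "recipient@example.com"     # placeholder address
    # subtype="html" gives text/html; charset="utf-8" sets the charset
    # parameter of the Content-Type header and encodes the body to match.
    msg.set_content(html, subtype="html", charset="utf-8")
    return msg


if __name__ == "__main__":
    message = build_html_message("Test", "<p>Gr\u00fcsse</p>")
    print(message["Content-Type"])
```

The point is that the declared charset and the bytes actually sent agree, which is what the Outlook setting achieves as well.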

Achar-Bhagyashree commented 3 years ago

A dash used in the pre-header text was triggering "View entire message" in Gmail. When I changed this dash to an HTML entity, the issue got resolved. E.g. "Take action – please provide your details" became "Take action &ndash; please provide your details", and the issue is resolved.

cseeger commented 2 years ago

Can confirm © was causing this in our emails. Switching to &copy; fixed it.