janboddez / share-on-mastodon

Easily share WordPress posts on Mastodon.
https://jan.boddez.net/wordpress/share-on-mastodon
GNU General Public License v3.0

alt text not transferring #49

Closed andywarburton closed 11 months ago

andywarburton commented 1 year ago

Hi there,

Love the plugin, it's genuinely useful but I've noticed that alt-text on images does not get passed through to Mastodon. This results in posts getting flagged by accessibility bots.

For example this post: https://andywarburton.co.uk/2023/another-test/

Lost its alt text when transferring to Mastodon: https://mastodon.social/@andy_warb/109627526479441029

Mastodon folks are really passionate about accessibility so I think this is an important fix. Let me know if I can help in any way.

andywarburton commented 1 year ago

PS: just a heads up, I tested this with the featured image and the images in the post, and they all lose their alt text (also if possible, it could be good to check if alt text has been set and if not to use the image caption instead as I often set a caption but forget the alt!)

janboddez commented 1 year ago

Looks like this is related to #43, which was never really resolved.

Haven't had this happen to me, yet. But I do see the image's alt text, on that page, ends with a newline. Is that possible? I'm wondering if this somehow trips up Mastodon's API. Or maybe it's an encoding issue or something.

Are you familiar with PHP at all? WP debugging? We could try adding a couple error_log() statements and enable debug logging to see what is happening here.
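
For anyone following along, debug logging is usually switched on in wp-config.php. A minimal sketch, assuming a default install where the log ends up in wp-content/debug.log; the constants are WordPress's own, while the log message is only an example:

```php
<?php
// In wp-config.php, above the "That's all, stop editing!" line.
define( 'WP_DEBUG', true );          // Enable debug mode.
define( 'WP_DEBUG_LOG', true );      // Send notices and error_log() output to wp-content/debug.log.
define( 'WP_DEBUG_DISPLAY', false ); // Keep messages out of the rendered pages.

// Anywhere in plugin or theme code (example message only):
error_log( '[alt text test] reached the image upload step' );
```

Once the test is done, it is probably wise to set these back to false, since the log can grow quickly and may be publicly readable on some hosts.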

chriscoyier commented 1 year ago

Just noting I'm seeing the same. The Mastodon instance I'm on flags images without alt text so it's extra noticeable.

[Image: Screenshot 2023-01-25 at 6.54.11 AM]

(All those images did have alt text in the content.)

janboddez commented 1 year ago

(Not) cool.

I have since added some slight alt text sanitization (which should rule out newline characters) as well as a debugging statement or two. I'll push the changes to WP.org shortly. (Later today, hopefully.)

https://github.com/janboddez/share-on-mastodon/blob/2b0fe324526ce33c7ee7b18a9463fd7f2c821472/includes/class-image-handler.php#L130

This would at least enable us (if someone with this issue were willing to enable debug logging) to get a look at what might be going on. (Like, does it really not pick up the alt string, or is Mastodon's API being picky?)

janboddez commented 1 year ago

v0.10.0 is live now, so if someone was willing to try with WP_DEBUG_LOG on, that would help tremendously.

Looks like these failing alts occasionally contain newlines, quotes, or slashes, too. Makes me wonder if they should be encoded somehow. (In v0.10.0, sanitize_text_field() will get rid of the newline chars, so that's that, at least.)

I was under the impression that multipart/form-data requests would not suffer from this sort of thing, but I'm probably dead wrong here. Also, Ruby might treat these differently, still.

Possibly related: https://github.com/whatwg/html/issues/7575

janboddez commented 1 year ago

Did some quick experimenting (on my Windows system/Linux web server):

$alt = esc_attr( $alt ); // This leads to literal, i.e. double-escaped `"`s. Not very nice.
$alt = str_replace( array( "\r", "\n", '"' ), array( '%0D', '%0A', '%22' ), $alt ); // Here you'll get `%22` instead. Not good.

If I don't escape alt text, i.e., just leave <CR><LF> and " for what they are, I get the expected result. No, that isn't entirely true: the line breaks disappear. But they seem to disappear also when I revisit the "alt" field in WordPress itself. I'd have to check/alter the database to be 100% sure, I guess. Just did, pasted this straight into the DB (using phpMyAdmin, though, and again on Windows):

This is a " bit of text " with

some carriage returns (I hope)

and quotes in

it.

Again, comes through just fine on the condition that no additional escaping whatsoever takes place. I.e., the changes in v0.10.0 would have to be undone. AKA I just cannot reproduce this issue.
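
To illustrate why the extra escaping shows up literally: a multipart field value is sent as plain text, so the receiving end has no reason to decode HTML entities or percent-encoding. A small sketch in plain PHP, using htmlspecialchars() as a stand-in for WordPress's esc_attr() (an assumption; the two are not identical in every detail):

```php
<?php
$alt = "A \"quoted\" word\r\nand a second line";

// Stand-in for esc_attr(): double quotes become HTML entities.
$entity_escaped = htmlspecialchars( $alt, ENT_QUOTES, 'UTF-8' );

// Percent-encoding by hand, as in the second experiment.
$percent_escaped = str_replace( array( "\r", "\n", '"' ), array( '%0D', '%0A', '%22' ), $alt );

echo $entity_escaped, PHP_EOL;  // Quotes now read &quot; and would be shown that way.
echo $percent_escaped, PHP_EOL; // Quotes now read %22, line breaks %0D%0A.
```

Either variant changes the characters that end up in the description, which matches the "not very nice" results described above.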

chriscoyier commented 1 year ago

I upgraded versions and no dice.

I don't envy the debugging work here, especially since it's the integration of two systems. It's possible it's a bug in the Mastodon API. I think I'd err on the side of escaping the crap out of everything in case it's that. Double-escaped quotes be damned.

janboddez commented 1 year ago

Or it could be Jetpack, some weird database caching thing, or the fact that you're on a Mac (although I'd at least expect the simplest of strings to always work). That's why I'd need that debug.log; for all I know the plugin is unable to fetch any alt text at all. That would be weird, because the bit of code I use to grab it from WP's postmeta table is more or less copied verbatim from WP core, but who knows?

janboddez commented 1 year ago

Also coming across suggestions that maybe the (Mastodon) server's max upload size is exceeded; that binary data goes through but POST fields do not. Not sure if these files are exceptionally large? Since some of them are screenshots, I very much doubt it. (I normally grab the "large" WP-generated image; on sites that use Jetpack's CDN, however, the original gets sent.) Mastodon normally accepts files up to, I think, 8 MB.

janboddez commented 1 year ago

Also, (not actually a solution, but) you could always disable all image uploads and just go with Mastodon's default "link preview cards" (assuming you've got the Open Graph or Twitter Card meta, or whatever else Mastodon prefers, in place).

janboddez commented 1 year ago

Gave https://stackoverflow.com/a/7473603 a quick try, and while it is logging the status API call itself, I can't get it to output the multipart/form-data request. That request is what I'd want to inspect next, my guess being that the instance is somehow tripping on something in it. (Still, feel free to send me all your WordPress debug.logs. They may indicate a WordPress error of some kind.)

Still each and every attempt -- that's on three different instances, from two different WordPresses and two different OSes -- of mine just ... works.

For reference, this is what my debug.log looks like:

[27-Jan-2023 09:54:47 UTC] [Share on Mastodon] Found the following alt text for the attachment with ID 14181: This is 'a bit' of "text with" some LF chars (I hope) and "quotes" in it.
[27-Jan-2023 09:54:47 UTC] [Share on Mastodon] Here's the `alt` bit of what we're about to send the Mastodon API: `--4b7ca0d07c0915981c8e77305ed49639
Content-Disposition: form-data; name="description";

This is 'a bit' of "text with" some LF chars (I hope) and "quotes" in it.
--4b7ca0d07c0915981c8e77305ed49639
`

Note that the line feed characters are in fact replaced by spaces because that seems to be something WordPress does, at least for me. But it shouldn't matter.

That description field, preceded by a "boundary", followed by two line breaks (CRLF!), then its value, then another boundary, etc., looks (nearly) exactly like the example over at https://stackoverflow.com/questions/4526273/what-does-enctype-multipart-form-data-mean/28380690#28380690 (Note that the boundary really only requires two leading dashes.)

The double quotes are unescaped, but somehow this seems to work; maybe the server escapes them after all when sending the request. Either way, this too shouldn't matter if even the dumbest strings with none of these weird chars don't get sent over.
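
The layout described above (boundary, header line, blank line, value, next boundary) can be sketched in plain PHP. This is a hypothetical helper for illustration, not the plugin's actual code, and it only handles plain-text fields:

```php
<?php
/**
 * Build a multipart/form-data body by hand.
 * Hypothetical helper for illustration; not the plugin's actual code.
 *
 * @param string $boundary Boundary string (without the leading dashes).
 * @param array  $fields   Field name => plain-text value.
 * @return string          Request body, using CRLF line endings.
 */
function build_multipart_body( string $boundary, array $fields ): string {
    $eol  = "\r\n";
    $body = '';

    foreach ( $fields as $name => $value ) {
        $body .= '--' . $boundary . $eol;
        $body .= 'Content-Disposition: form-data; name="' . $name . '"' . $eol;
        $body .= $eol;          // Blank line separates the part's headers from its value.
        $body .= $value . $eol; // Sent as-is: no HTML or URL escaping.
    }

    // The closing delimiter carries two trailing dashes.
    return $body . '--' . $boundary . '--' . $eol;
}

$boundary = bin2hex( random_bytes( 16 ) );
echo build_multipart_body( $boundary, array( 'description' => 'Book cover.' ) );
```

A real media upload would also need a file part with a filename and its own Content-Type header; that part is left out here to keep the example short.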

edent commented 1 year ago

I'm also seeing the same error. My debug log has:

[27-Jan-2023 12:34:02 UTC] [Share on Mastodon] Found the following alt text for the attachment with ID 44518: Book cover.
[27-Jan-2023 12:34:02 UTC] [Share on Mastodon] Here's the `alt` bit of what we're about to send the Mastodon API: `--3077d9fcc460d6d20228759b82f28b4b
Content-Disposition: form-data; name="description";

Book cover.
--3077d9fcc460d6d20228759b82f28b4b

But there's no alt text on https://mastodon.social/@Edent/109761189870150573

It's worth noting that I have the following custom function:

//  https://jan.boddez.net/wordpress/share-on-mastodon
add_filter( 'share_on_mastodon_status', function( $status, $post ) {
    //  Create a short preview of the post
    $status_preamble = "🆕 blog! ";

    $status_title    = "“" . html_entity_decode( get_the_title($post) ) . "”\n";

    $status_stars    = edent_review_get_stars($post->ID) . "\n";

    //  Remove the &hellip; forced by the excerpt and replace with the Unicode symbol
    $status_excerpt  = "\n" . html_entity_decode( get_the_excerpt( $post )) ;

    //  Add a link
    $status_link     = "\n\n👀 Read more: " . get_permalink( $post );

    //  Add tags
    $tags = get_the_tags( $post->ID );
    $status_tags = "";

    if ( $tags ) {
        //  Add a fake <hr>
        $status_tags .= "\n⸻\n";

        foreach ( $tags as $tag ) {
            $status_tags .= '#' . preg_replace( '/\s/', '', $tag->name ) . ' ';
        }
    }

    $status_start = $status_preamble . $status_title . $status_stars;
    $status_end   = $status_link . $status_tags;

    //  Max length 500
    $max_characters = 500;
    $total_characters = 0;

    $total_characters = mb_strlen( $status_start ) + mb_strlen( $status_end );

    //  Old length
    $status_excerpt_length = mb_strlen( $status_excerpt );
    //  Trim the status to the remaining characters. Lose a few for good luck!
    $status_excerpt = mb_substr( $status_excerpt, 0, ($max_characters - $total_characters - 10) );

    //  If this has been reduced, add the …
    if ( $status_excerpt_length > mb_strlen( $status_excerpt ) ) {
        $status_excerpt .= "…";
    }

    $status = $status_start . $status_excerpt . $status_end;

    return $status;
}, 10, 2 );

Do I need something in there to add the alt text?

janboddez commented 1 year ago

@edent No, the alt text is sent in a separate API call, together with the binary image data. As your debug log shows, the alt text is found just fine and, as far as I can tell, is being sent along as a "description" field. In fact, your debug log looks exactly like mine, with different text and a different (because it's random/time-based) boundary. I see nothing weird here.

Wondering if core WP could somehow mangle the request, or if somehow the server is at fault.

janboddez commented 1 year ago

@edent Was that book cover image—the original—somehow very large?

janboddez commented 1 year ago

Fun fact: when I was on .social, I (eventually) couldn't even upload images at all. I'd just get a 500 error. (I suspect a failed past experiment, oops.) All other (three) instances I tried: no problemo.

So it could be a server thing.

edent commented 1 year ago

It was a 36 KB image, so not very large. Interestingly, some WordPress toots from a few days ago did have alt text, for example https://mastodon.social/@Edent/109744206923423614 and https://mastodon.social/@Edent/109738544614898276

janboddez commented 1 year ago

> But there's no alt text on https://mastodon.social/@Edent/109761189870150573

@edent Wait ... There absolutely is alt text there, at least, there is now.

janboddez commented 1 year ago

> also if possible, it could be good to check if alt text has been set and if not to use the image caption instead as I often set a caption but forget the alt!

@andywarburton Just a heads up: this was delivered in v0.10.0 (i.e., the current version). Although if alt got stripped off for you, captions likely will too (there's zero difference in how the text is sent).
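
The fallback itself is simple to picture. A minimal sketch in plain PHP, with a hypothetical helper name; in WordPress the two inputs would typically come from the `_wp_attachment_image_alt` post meta and the attachment's caption, and the plugin's real implementation may differ:

```php
<?php
/**
 * Pick the text to send as the image description:
 * the alt text if there is any, otherwise the caption.
 * Hypothetical helper for illustration only.
 */
function pick_description( ?string $alt, ?string $caption ): string {
    $alt = trim( (string) $alt );

    if ( '' !== $alt ) {
        return $alt;
    }

    // No usable alt text: fall back to the caption (possibly empty as well).
    return trim( (string) $caption );
}

echo pick_description( '', 'Sunset over the harbour' ), PHP_EOL;
```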

edent commented 1 year ago

> > But there's no alt text on https://mastodon.social/@Edent/109761189870150573
>
> @edent Wait ... There absolutely is alt text there, at least, there is now.

I'm obviously going peculiar. I would have sworn it wasn't there before!

kelvin0mql commented 1 year ago

Just got a bluehost account this past weekend, fired up a new wordpress on a new domain, and added this plugin.

Immediately noticed this alt text issue, because the Mastodon instances I'm on have rules about images lacking alt text. They will ban people for posting alt-less pics. Accessibility and inclusion are very important there.

Hope this can get sorted.

janboddez commented 1 year ago

Yes, I hope so too. It's been working for me (I would almost say unfortunately).

What is super important, though, is that the alt text is added when first uploading the image; it has to be stored in the WordPress media library (and not just inside the post content; the plugin will not pick that up).

If it's really not working, you could always disable images (and let Mastodon generate a "link preview card" instead, which will also display, for most sites, the featured image). But then you can't "cross-post" "image galleries."

janboddez commented 1 year ago

I think the source code still has these debug statements in it that will print a bunch of stuff to (on most hosts) wp-content/debug.log, if WordPress's debug logging is enabled and the corresponding setting is enabled in the plugin's settings (on the Debugging tab). Meaning both should be true.

If that's the case, you might be able to help out by grabbing some of the logging. Could be something meaningful in there.

kelvin0mql commented 1 year ago

Oh, that's interesting. I did not pick up on that. I typically upload into the post. So that's my problem. I'll try it media library style and see what happens.

janboddez commented 1 year ago

> I typically upload into the post. So that's my problem.

That should be okay, as long as alt text is filled out.

A good check would be to double-check it's really there also when visiting the media library afterward (preferably before publishing the post).

janboddez commented 1 year ago

Also, I once suspected special characters, but alas. Any way I tried to escape them, I'd just get garbage in the end result, where leaving them unescaped, again, always "just worked." (Anyhow, just trying to come up with ideas ...)

kelvin0mql commented 1 year ago

I'll try various things, verify what is or isn't there, etc. Will see if I can

  1. find a workaround, and
  2. provide you with some helpful info/insight.

kelvin0mql commented 1 year ago

I am so confused.

If I add an image to the Media Library, give it alt, caption, description, and then add a post, the resultant mastodon toot is just the title and the link. No image.

https://mastodon.social/@kelvin0mql/110429790121373449

But if I start by adding a post, write a title, change the 1st block to an image block, upload a pic into there, make a 2nd block of paragraph type, add some text, then go to the image block, click 3 dots "edit as HTML" and insert some characters into the alt attribute, and publish that, that makes a nice toot. With an image. But without alt text.

https://mastodon.social/@kelvin0mql/110429800792544550

Let me be clear.

A post with a pic needs to be a toot with a pic.

If that pic had alt txt on WordPress, it needs to not get discarded on its way to Mastodon.

Is that, or is that not, the intent of this plugin? I may have misunderstood the aim of this thing.

janboddez commented 1 year ago

> If I add an image to the Media Library, give it alt, caption, description, and then add a post, the resultant mastodon toot is just the title and the link. No image.

That's likely because the image isn't attached to the post (a WP term). Have you tried enabling "in-content" images on the settings page/images tab?

> But if I start by adding a post, write a title, change the 1st block to an image block, upload a pic into there, make a 2nd block of paragraph type, add some text, then go to the image block, click 3 dots "edit as HTML" and insert some characters into the alt attribute, and publish that, that makes a nice toot. With an image. But without alt text.

If you do "edit as HTML," the alt text is not tied to the image's database record.

I think I may know what's going on. In the classic editor, when you insert an image, the upload modal does have an alt text field, and that field's contents are saved to the database.

In Gutenberg, you upload an image, it does not ask for alt text. You can add it through the sidebar panel, sure, but: that does not get saved to the image meta. That's a flaw I had not yet encountered.

kelvin0mql commented 1 year ago

I am COMPOSING the post the exact same way both times.

The only difference is whether I uploaded the image into the first block, or selected from the library.

If I select it from the library, the image IS NOT INCLUDED IN THE TOOT.

Please explain that. Because if the media library image - which has alt txt in the database - is not even included in the toot, then what is the point? We either do it wrong, or we don't do it at all. That's bad, and worse.

I do not understand what you mean by "isn't attached to the post".

Do we need to do a screen share so I can demonstrate to you how very attached the image is?

janboddez commented 1 year ago

So, (temporary) conclusion: Gutenberg does not save alt text to the database in the same way the classic editor does. So there's no way for the plugin to retrieve it.

Except for scraping the post content, which I will look into. That shouldn't be too hard for "in-content" images, but it would be a bit harder for "attached" or "featured" images (which may or may not be part of the post content).

I also thought that, in the classic editor, inserting an image (or maybe this is only true when it is set as the featured image) would also attach it to the post, if it wasn't yet attached to any other post. It seems this, too, is not true with Gutenberg.

kelvin0mql commented 1 year ago

I'll look again. I don't remember seeing either "classic" or "gutenberg" on my screen.

Latest WP installed. I'm using what's there. That's it. As it comes out of the virtual box.

janboddez commented 1 year ago

> I do not understand what you mean by "isn't attached to the post".

There's a link in the docs that explains what attached images are, according to WordPress. https://jan.boddez.net/wordpress/share-on-mastodon#images

> If a media file is uploaded within the edit screen, it will automatically be attached to the current post being edited. If it is uploaded via the Media Add New Screen or Media Library Screen it will be unattached, but may become attached to a post when it is inserted into post.

https://wordpress.org/documentation/article/use-image-and-file-attachments/#attachment-to-a-post

It seems that last bit is no longer true for WordPress' new editor.

janboddez commented 1 year ago

If you want the plugin to look for images in the post content, that can be done:

[image]

kelvin0mql commented 1 year ago

I read that doc you pointed me at. I know what a featured image is. I've done that.

I still do not know the difference between attached or in-post. They're all IN the POST. Are they not?

Click a block's plus sign. This thing pops up... [image] ...and pick Image.

There. It's an image. In the post. And also seems attached. Like, it says where I put it... very attached... very in the post.

kelvin0mql commented 1 year ago

[image]

I just now checked the bottom "experimental" one. Previously, I had only the others checked.

kelvin0mql commented 1 year ago

I changed 2 things between tests. That's the best way, so that you still have a fun mystery as to which fixed it.

So, experimental In-Post checked in settings.

Then, composed a post using the Media Library. THEN Featured Image. THEN Publish.

That worked - at long last.

https://mastodon.social/@kelvin0mql/110429951207295247

(Now I gotta train my wife on how to do all this extra workaround stuff. Ugh.)

janboddez commented 1 year ago

Attached means that there's a special database relation between the image and the post. An image uploaded during post creation is (normally) attached, but an image in the post's content (but, e.g., uploaded previously) isn't necessarily attached to that post.

It's a WordPress thing, nothing to do with this plugin per se.

Now that I finally know what's causing this (the original issue's) behavior, I can finally start looking at parsing image blocks' alt text.

Gutenberg, by the way, is another name for WordPress' block editor. WordPress previously used, and optionally still does, a TinyMCE-based editor. That's the classic editor. They behave rather differently, but it's far from obvious, greatly complicating things for humble plugin developers who've got nothing to gain from any of this.
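
The "attached" relation can be pictured as each media item carrying the ID of at most one parent post (the `post_parent` column in WordPress). A minimal sketch with plain arrays standing in for database rows; the function name and sample data are made up for illustration:

```php
<?php
/**
 * Return the IDs of media items attached to a given post.
 * Plain arrays stand in for rows of the posts table.
 */
function attached_image_ids( array $media, int $post_id ): array {
    $ids = array();

    foreach ( $media as $item ) {
        if ( (int) $item['post_parent'] === $post_id ) {
            $ids[] = (int) $item['ID'];
        }
    }

    return $ids;
}

$media = array(
    array( 'ID' => 10, 'post_parent' => 42 ), // Uploaded while editing post 42: attached.
    array( 'ID' => 11, 'post_parent' => 0 ),  // Uploaded via the Media Library: unattached.
);

// Post 42 may show both images in its content, yet only image 10 is attached to it.
print_r( attached_image_ids( $media, 42 ) );
```

This is why an image can be visibly "in the post" and still not be found when only attached images are looked up.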

janboddez commented 1 year ago

Turns out the Image block gets its alt text from the rendered image element. So I'm going to have to recreate that logic in PHP. And somehow store it. That's a fairly big refactor of all things images. And this would still only apply to images inside the post content (and not to featured or attached-but-not-inserted images). But for most folks it'd probably work (if they also select the in-post image option, which is experimental because it wouldn't work for external images, and might not work for sites that use a CDN to deliver images).
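
Reading alt text back out of the rendered content could look roughly like the sketch below. It assumes the images carry the `wp-image-{ID}` class that WordPress's editors normally add, and the function name is hypothetical; the plugin's eventual implementation may work differently:

```php
<?php
/**
 * Collect alt text from <img> elements in rendered post content.
 * Assumption: images carry WordPress's usual `wp-image-{ID}` class.
 *
 * @return array<int, string> Attachment ID => alt text.
 */
function alts_from_content( string $html ): array {
    if ( '' === trim( $html ) ) {
        return array();
    }

    $previous = libxml_use_internal_errors( true ); // Tolerate HTML5 tags such as <figure>.
    $doc      = new DOMDocument();
    $doc->loadHTML( '<?xml encoding="UTF-8">' . $html ); // Prefix hints at UTF-8 input.
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $alts = array();
    foreach ( $doc->getElementsByTagName( 'img' ) as $img ) {
        // Images without a wp-image-{ID} class (e.g. external ones) are skipped.
        if ( preg_match( '/\bwp-image-(\d+)\b/', $img->getAttribute( 'class' ), $m ) ) {
            $alts[ (int) $m[1] ] = trim( $img->getAttribute( 'alt' ) );
        }
    }

    return $alts;
}
```

Images whose markup lacks that class would need another way of mapping back to an attachment, which is part of why the in-post option is described as experimental.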

kelvin0mql commented 1 year ago

Discussed with my wife (for whose regular use I'm doing all this research).

She's somewhat less likely to be on a Mastodon instance that is "strict" about alt txt. Less likely to get her wrist slapped for posting alt-less images (unless she chooses artisan.chat instance).

The workaround of:

  1. put the image in the Media Library
  2. add alt txt (etc.) in the Library
  3. pick a Featured Image in the post

...appears to work fine.

Even when it doesn't, she'll be using a desktop browser interface for this, so editing an existing Mastodon Toot to add alt text after the fact is also totally viable.

Therefore, for US, this is not a train-stopper issue.

Alt txt is very important for some communities. For mine, not so much. On more than one occasion, my alt txt is something akin to "If I could describe it satisfactorily with words, I wouldn't have used a picture." or "You really have to see it to believe it." And no gang of blind people have come to beat me to death with their white canes... yet.

janboddez commented 1 year ago

v0.15.0 is now out, and it attempts to address this issue as follows:

TL;DR: For most if not all cases (there's always weird cases where we can't convert an in-post image URL back to its database ID, because of custom filters or a CDN's in use or whatever, and I'd like to know about them), this issue should be fixed. Regardless of whether you're using Gutenberg or the classic editor.
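
One reason such a lookup can fail: WordPress names resized copies like photo-1024x768.jpg, so an in-post URL often differs from the original file's URL. A rough sketch of normalizing the URL first, in plain PHP; the helper name is hypothetical and this is not necessarily what the plugin does:

```php
<?php
/**
 * Strip a trailing "-{width}x{height}" size suffix from an image URL,
 * so that it points at the originally uploaded file.
 * Hypothetical helper for illustration only.
 */
function strip_size_suffix( string $url ): string {
    // Matches e.g. "-1024x768" right before the file extension at the end of the URL.
    return (string) preg_replace( '/-\d+x\d+(\.[a-zA-Z0-9]+)$/', '$1', $url );
}

echo strip_size_suffix( 'https://example.org/wp-content/uploads/2023/06/photo-1024x768.jpg' ), PHP_EOL;
```

The normalized URL could then be handed to WordPress's attachment_url_to_postid(). URLs rewritten by a CDN or carrying query strings would not match this pattern, which fits the caveat above.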

kelvin0mql commented 1 year ago

Excited to experiment further.

carol55512 commented 1 year ago

So alt text actually broke for me sometime between May 10 and Jun 8 and I wonder if this change might be related?

I use the classic editor, currently on 6.2.2.

My process is pretty consistent:

  1. Add New Post, type words
  2. Click the Add Media button (in the classic editor).
  3. Upload Files to upload the image
  4. Add alt text to the image
  5. Insert into post
  6. Publish post

I turned on debugging and, for my latest test, I see this message:

[19-Jun-2023 19:09:23 UTC] [Share on Mastodon] Did not find alt text for the attachment with ID 818

But there is alt text? It was included in the image posted to my blog, and when I go back and check the Media Library, I see alt text listed in Media Library for the image under "Alternative Text."

Let me know if there's any additional info I can provide. Thanks!

janboddez commented 1 year ago

Cool, thanks for letting me know. Latest WP.org version was released May 30 so that would match.

I've done some work to revert some of the changes while, hopefully, keeping the benefits. I'll look at it again and try to release soon.

janboddez commented 1 year ago

Test post using the classic editor and an in-post image that isn't the post's featured image (or even "attached" to the post, as it was uploaded elsewhere): https://indieweb.social/@janboddez/110575400216326426

Debug log:

[20-Jun-2023 07:38:25 UTC] [Share on Mastodon] The images selected for crossposting (but not yet limited to 4):
[20-Jun-2023 07:38:25 UTC] Array
(
    [14883] => A crappy smartphone image of the Taipei 101, taken April 2023.
)

[20-Jun-2023 07:38:25 UTC] [Share on Mastodon] The images as found in the post:
[20-Jun-2023 07:38:25 UTC] Array
(
    [14883] => A crappy smartphone image of the Taipei 101, taken April 2023.
)

[20-Jun-2023 07:38:25 UTC] [Share on Mastodon] Found the following alt text for the attachment with ID 14883: A crappy smartphone image of the Taipei 101, taken April 2023.

janboddez commented 1 year ago

And with Gutenberg, and no alt text in the database (the previous one did have the same alt text in both the database and the post content): https://mastodon.vlaanderen/@ochtendgrijs/110575421038959693

(This one's in Dutch but it says the same thing, sort of. And it doesn't really matter; it seems to work is what matters.)

[20-Jun-2023 07:43:44 UTC] [Share on Mastodon] The images selected for crossposting (but not yet limited to 4):
[20-Jun-2023 07:43:44 UTC] Array
(
    [8096] => Smartphonefotootje van de 'Tapei 101' ergens in april '23.
)

[20-Jun-2023 07:43:44 UTC] [Share on Mastodon] The images as found in the post:
[20-Jun-2023 07:43:44 UTC] Array
(
    [8096] => Smartphonefotootje van de 'Tapei 101' ergens in april '23.
)

[20-Jun-2023 07:43:44 UTC] [Share on Mastodon] Found the following alt text for the attachment with ID 8096: Smartphonefotootje van de 'Tapei 101' ergens in april '23.

janboddez commented 1 year ago

That's both with the latest "main." I'll release to WP.org later today.

carol55512 commented 1 year ago

Can verify that this issue seems fixed for me now (currently on 0.16.2). Thanks!

janboddez commented 11 months ago

I think the initial issue (some users unable to transfer alt text) was due to Gutenberg not storing alt text to the Media Library/database when an image is uploaded. Fixed that by scanning the post content in addition.

Then all that got mangled up (but, again, not for everyone), somewhere around v0.16, and we seem to also have fixed that.

Going to try and close this :-)

kelvin0mql commented 11 months ago

It's been working grand for me of late... for what it's worth. Many thanks.