life-itself / community

👋 Life Itself's community home with forum, projects and website.
https://lifeitself.org/

Missing images in assets folder #399

Open khalilcodes opened 1 year ago

khalilcodes commented 1 year ago

After using the https://github.com/flowershow/wordpress-to-markdown script to convert pages/posts to markdown files (with assets) for the lifeitself migration, some pages link to images that are missing from the assets/images folder.

On further investigation it was found that ALL image paths including external ones like <img src="https://i.imgur.com/OPrvrNS.png" /> were converted to relative paths ![](/assets/images/OPrvrNS.png) and downloaded to assets/images folder. This has led to having a larger assets folder with images that were not needed locally in the first place.
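
For illustration, the rewrite described above can be sketched as follows. This is not the code of the wordpress-to-markdown script; `to_local_image_path`, `to_markdown_image` and `ASSETS_PREFIX` are hypothetical names, and the sketch assumes the local file name is simply the last segment of the URL path.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed target folder, taken from the example in this issue.
ASSETS_PREFIX = "/assets/images"


def to_local_image_path(src: str) -> str:
    """Map an image URL to a site-relative path by keeping only its file name."""
    # urlparse keeps any query string (e.g. "?w=776") out of .path.
    filename = PurePosixPath(urlparse(src).path).name
    return f"{ASSETS_PREFIX}/{filename}"


def to_markdown_image(src: str) -> str:
    """Build the markdown embed that replaces an <img src="..."> tag."""
    return f"![]({to_local_image_path(src)})"


print(to_markdown_image("https://i.imgur.com/OPrvrNS.png"))
# prints ![](/assets/images/OPrvrNS.png)
```

Under this assumption, two external images that share a file name would map to the same local path, which may be worth checking for when reviewing what got pulled in.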

Main issue

Although external images are rewritten to relative paths and downloaded, some images (for example those from https://artearthtech.wordpress.com) failed to download and therefore do not render on the page.

This issue was first found on the page https://lifeitself.org/blog/2019/01/22/ken-wilber-integral-spirituality, where the WordPress original had images referencing files on artearthtech.files.wordpress.com, e.g. https://artearthtech.files.wordpress.com/2020/03/ken-wilber-map.jpg?w=776. It was fixed for that page in commit abc765d.

Acceptance

Tasks

Notes

Status update 13 Sep 2023

Here is a list of all the image embeds that do not have a corresponding image file. Some of them just used the wrong extensions.

| Image path | Page | Status |
| --- | --- | --- |
| assets/images/how-much-is-enough_2256291b.jpg | blog/2017/02/05/summary-how-much-is-enough-skidelsky-2012 | Missing |
| assets/images/man_walking.png | blog/2017/06/28/the-middle-way-what-were-about | ✅ Wrong extension used. Fixed. |
| assets/images/nafeeshamid.png | blog/2019/04/17/blind-spots-2-returning-to-mystery | ✅ Wrong extension used. Fixed. |
| assets/images/esteban-post02.jpeg | blog/2019/06/16/is-the-answer-to-our-tech-problems-another-app | ✅ Wrong extension used. Fixed. |
| assets/images/nafeeshamid.png | blog/2019/06/17/explaining-the-cognitive-triggers-for-extremist-violence-through-brain-scanning | ✅ Wrong extension used. Fixed. |
| assets/images/November-news-blindspots.jpg | blog/2019/11/30/2019-november-newsletter | Missing |
| assets/images/NOvember-news-future-hub.jpg | blog/2019/11/30/2019-november-newsletter | Missing |
| assets/images/November-news-dino.jpg | blog/2019/11/30/2019-november-newsletter | Missing |
| assets/images/November-news-Hannah.jpg | blog/2019/11/30/2019-november-newsletter | Missing |
| assets/images/NOvember-news-Liam.jpg | blog/2019/11/30/2019-november-newsletter | Missing |
| assets/images/November-news-wordle-two.png | blog/2019/11/30/2019-november-newsletter | Missing |
| assets/images/CAintegraltheoryblog.png | blog/2019/12/13/contemplative-activism-primer-the-pre-gathering-read | Missing |
| assets/images/2020-2-02-blog-aet-team-faces-window.JPG | blog/2020/01/24/ways-of-being-for-2020-aet-winter-sprint-january-2020 | Missing |
| assets/images/2020-02-02-blog-aet-ways-of-being.JPG | blog/2020/01/24/ways-of-being-for-2020-aet-winter-sprint-january-2020 | Missing |
| assets/images/2020-02-02-aet-focus-2020.JPG | blog/2020/01/24/ways-of-being-for-2020-aet-winter-sprint-january-2020 | Missing |
| assets/images/whatsapp-image-2020-08-19-at-14.46.51-1.jpg | blog/2020/09/10/more-than-just-bricks-and-mortar-bergerac-build-festival-2020 | Missing. Similar image name: whatsapp-image-2020-08-19-at-14.46.51-1-1 |

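
The "wrong extension" cases in the table could be spotted automatically. A minimal sketch, assuming a list of the files that actually exist in the assets folder is available; `same_name_candidates` is a hypothetical helper and the file names in the example call are made up for illustration.

```python
from pathlib import PurePosixPath


def same_name_candidates(missing_ref: str, existing_files: list[str]) -> list[str]:
    """Return existing files in the same folder whose base name matches
    missing_ref when the extension and letter case are ignored."""
    wanted = PurePosixPath(missing_ref)
    return [
        f
        for f in existing_files
        if f != missing_ref
        and PurePosixPath(f).parent == wanted.parent
        and PurePosixPath(f).stem.lower() == wanted.stem.lower()
    ]


# Hypothetical example: the embed says .png but the file on disk is .jpg.
print(same_name_candidates(
    "assets/images/man_walking.png",
    ["assets/images/man_walking.jpg", "assets/images/other.png"],
))
# prints ['assets/images/man_walking.jpg']
```

An empty result means no file with a matching base name exists, which corresponds to the "Missing" rows.
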
rufuspollock commented 1 year ago

I've also noticed missing images elsewhere e.g. on https://lifeitself.org/hubs/berlin

See for comparison the old page: https://web.archive.org/web/20220520195709/https://lifeitself.us/hubs/berlin/ which has a lot more images.

Aside: it also has a nice hero image - i wonder if we can work out how to do pages like that ... (that's another issue)

rufuspollock commented 1 year ago

@khalilcodes

> On further investigation it was found that ALL image paths including external ones like <img src="https://i.imgur.com/OPrvrNS.png" /> were converted to relative paths and downloaded to assets/images folder. This has led to having a larger assets folder with images that were not needed locally in the first place.

Actually, this was probably ok in most cases since these are our pictures and having them locally is good (though it would be nice, if easy to do, to know which external files got pulled locally).

rufuspollock commented 1 year ago

@khalilcodes is there a simple way for us to extract a list of all images linked from markdown (e.g. grep *.png and grep *.jpg), check which ones 404, and then fix those? That would be enough for now.
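
One possible way to do this without grep is sketched below. It is only a sketch: `image_refs` and `missing_local_images` are hypothetical helpers, it assumes image paths like /assets/images/foo.png resolve against the folder that holds the markdown files, and it checks for the file on disk rather than requesting the live site.

```python
import re
from pathlib import Path

# Markdown embeds such as ![alt](/assets/images/foo.png) or ![](foo.png "title")
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)")
# Raw HTML embeds such as <img src="/assets/images/foo.png" />
HTML_IMAGE = re.compile(r"<img\b[^>]*?\bsrc=[\"']([^\"']+)[\"']", re.IGNORECASE)


def image_refs(markdown: str) -> list[str]:
    """Return every image target found in a markdown string."""
    return MD_IMAGE.findall(markdown) + HTML_IMAGE.findall(markdown)


def missing_local_images(content_root: Path) -> list[tuple[str, str]]:
    """List (page, image path) pairs whose local image file does not exist."""
    missing = []
    for page in sorted(content_root.rglob("*.md")):
        for ref in image_refs(page.read_text(encoding="utf-8")):
            if ref.startswith(("http://", "https://")):
                continue  # external image, nothing to check on disk
            # Assumption: refs resolve against content_root, not the page's folder.
            if not (content_root / ref.lstrip("/")).is_file():
                missing.append((page.relative_to(content_root).as_posix(), ref))
    return missing
```

Calling `missing_local_images(Path("."))` from the content folder would give a list of page and image pairs to fix by hand.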

Re fixing this using the new updated script: let's not do that because, as you say, there is an issue with overwriting changes:

> This would mean we'd have to replace all posts/pages (before 2023) with the fixed ones, but what about modified content in these pages if any. Not sure if worth doing.

nathenf commented 1 year ago

@khalilcodes @rufuspollock I have moved this into the next iteration. I am not sure on the status of this so maybe one of you can update accordingly 😄

rufuspollock commented 1 year ago

@khalilcodes can you look at this briefly and see what the status is? It would be great to close this out.

khalilcodes commented 1 year ago

@rufuspollock running grep I can see there are quite a few images that are missing and weren't downloaded. https://web.archive.org/web/20220520195709/https://lifeitself.us doesn't seem to open at my end. Is there an archive of all images there?

rufuspollock commented 1 year ago

> @rufuspollock running grep I can see there are quite a few images that are missing and weren't downloaded. web.archive.org/web/20220520195709/https://lifeitself.us doesn't seem to open at my end. Is there an archive of all images there ?

the archive.org link was just illustrative - though i could imagine you could use it to identify the original link for missing images ...
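
As a rough sketch of that idea, a Wayback Machine address can be built for any old page or image URL using the same pattern as the links in this thread. `wayback_url` is a hypothetical helper; the default timestamp is the one from the link above, and the archive may or may not hold a capture for a given URL.

```python
def wayback_url(original_url: str, timestamp: str = "20220520195709") -> str:
    """Build a Wayback Machine address for original_url near the given
    timestamp (format YYYYMMDDhhmmss)."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


print(wayback_url("https://lifeitself.us/hubs/berlin/"))
# prints https://web.archive.org/web/20220520195709/https://lifeitself.us/hubs/berlin/
```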

khalilcodes commented 1 year ago

@rufuspollock missing images added in PR #521. There are some broken links to images in the following pages and those could not be found.

rufuspollock commented 1 year ago

@khalilcodes 👏

Re the 2 missing: were you able to find them in archive.org/web versions? And can you link the archive.org/web versions of those pages that you found?

olayway commented 8 months ago

@rufuspollock, @nathenf, @laurenwigmore I've created a list of all the image embeds lacking corresponding image files, just to be sure (see the table in the description above). I've fixed some of them but the rest are really missing. Any hints on where I could find them?

rufuspollock commented 7 months ago

@olayway I note Khalil mentioned the November newsletter images were just missing entirely and not even on the really old WordPress site. For those, the only hope would be looking in the Wayback Machine.

Personally, i think this issue is low priority for now.