khalilcodes opened 1 year ago
I've also noticed missing images elsewhere, e.g. on https://lifeitself.org/hubs/berlin
For comparison, see the old page: https://web.archive.org/web/20220520195709/https://lifeitself.us/hubs/berlin/, which has many more images.
Aside: it also has a nice hero image. I wonder if we can work out how to build pages like that (that's another issue).
@khalilcodes
On further investigation it was found that ALL image paths, including external ones like
<img src="https://i.imgur.com/OPrvrNS.png" />
were converted to relative paths and downloaded to the assets/images folder. This has led to a larger assets folder containing images that were not needed locally in the first place.
Actually, this was probably OK in most cases, since these are our pictures and having them locally is good (though, if easy to do, it would be nice to know which external files got pulled in locally).
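If the original post HTML from the WordPress export is still available (an assumption), one minimal way to get that list is to pull every `<img>` src out of it and keep the ones whose host isn't ours. The sketch below uses Python's standard `html.parser`. `OWN_HOSTS` and the function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical: the hosts we count as "ours"; subdomains (e.g. www.) count too.
OWN_HOSTS = ("lifeitself.us", "lifeitself.org")


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> also lands here via the default handle_startendtag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def is_own_host(host, own_hosts=OWN_HOSTS):
    """True if `host` is one of ours or a subdomain of one of ours."""
    return any(host == h or host.endswith("." + h) for h in own_hosts)


def external_images(html, own_hosts=OWN_HOSTS):
    """Return the <img> URLs in `html` that point at a host other than our own."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    external = []
    for src in parser.srcs:
        host = urlparse(src).hostname  # None for relative paths such as /wp-content/...
        if host and not is_own_host(host, own_hosts):
            external.append(src)
    return external
```

Running this over each post's HTML and collecting the results would list the external files the converter tried to pull into `assets/images`.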
@khalilcodes, is there a simple way for us to extract a list of all images linked from the markdown (e.g. `grep *.png` and `grep *.jpg`), then check which ones 404 and fix those? That would be enough for now.
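Here is a minimal sketch of that grep-style check. It assumes the converted pages are `.md`/`.mdx` files and that root-relative links such as `/assets/images/foo.png` resolve against a single site folder (`site_root` is hypothetical). It only reports local files that don't exist. External URLs would still need an HTTP request to find out which ones 404.

```python
import re
from pathlib import Path

# Markdown images: ![alt](path) or ![alt](path "title"); HTML images: <img ... src="path">.
MD_IMAGE = re.compile(r'!\[[^\]]*\]\(\s*<?([^)\s>]+)>?(?:\s+"[^"]*")?\s*\)')
HTML_IMAGE = re.compile(r'<img\b[^>]*\bsrc=["\']([^"\']+)["\']', re.IGNORECASE)


def image_links(text):
    """Return every image path referenced in a markdown string."""
    return MD_IMAGE.findall(text) + HTML_IMAGE.findall(text)


def missing_local_images(content_dir, site_root):
    """Map each .md/.mdx file to the root-relative image links whose file does not exist.

    Only links starting with a single "/" are checked, resolved against `site_root`
    (hypothetical: whichever folder the site serves /assets from). http(s) URLs,
    protocol-relative //host URLs and page-relative links are skipped.
    """
    missing = {}
    files = sorted(p for p in Path(content_dir).rglob("*") if p.suffix in {".md", ".mdx"})
    for md_file in files:
        for link in image_links(md_file.read_text(encoding="utf-8")):
            if not link.startswith("/") or link.startswith("//"):
                continue
            path = link.split("?", 1)[0].split("#", 1)[0]  # ignore query/fragment
            if not (Path(site_root) / path.lstrip("/")).is_file():
                missing.setdefault(str(md_file), []).append(link)
    return missing
```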
Re fixing this by re-running the updated script: let's not do that because, as you say, it would overwrite changes:
This would mean we'd have to replace all posts/pages (before 2023) with the fixed ones, but what about any content that has since been modified in those pages? Not sure it's worth doing.
@khalilcodes @rufuspollock I have moved this into the next iteration. I am not sure of the status of this, so maybe one of you can update it accordingly 😄
@khalilcodes, can you look at this briefly and see what the status is? It would be great to close this out.
@rufuspollock, running grep, I can see there are quite a few images that are missing and weren't downloaded. https://web.archive.org/web/20220520195709/https://lifeitself.us doesn't seem to open on my end. Is there an archive of all the images there?
The archive.org link was just illustrative, though I imagine you could use it to identify the original links for missing images.
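One way to do that lookup without clicking through the archive by hand is the Wayback Machine availability API (`https://archive.org/wayback/available?url=...&timestamp=...`), which returns the closest snapshot as JSON. The sketch below defaults to the timestamp from the Berlin hub link above. That default and the helper names are illustrative. Inserting `id_` after the timestamp in a snapshot URL asks the archive for the original file rather than its rewritten page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(url, timestamp=None):
    """Build a Wayback Machine availability API query for `url`."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp  # YYYYMMDDhhmmss; the API returns the closest snapshot
    return AVAILABILITY_API + "?" + urlencode(params)


def closest_snapshot(response_json):
    """Return the closest snapshot URL from an availability API response, or None."""
    closest = json.loads(response_json).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def raw_snapshot_url(snapshot_url):
    """Add the id_ flag after the timestamp so the archive serves the original bytes."""
    prefix, sep, rest = snapshot_url.partition("/web/")
    timestamp, slash, original = rest.partition("/")
    return f"{prefix}{sep}{timestamp}id_{slash}{original}"


def find_archived(url, timestamp="20220520195709"):
    """Network call (hypothetical helper): look up `url` in the Wayback Machine."""
    with urlopen(availability_url(url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```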
@rufuspollock, the missing images have been added in PR #521. There are some broken image links in the following pages, and those images could not be found.
@khalilcodes 👏
Re the 2 missing ones: were you able to find them in the archive.org/web versions? And can you link the archive.org/web versions of the pages you found?
@rufuspollock, @nathenf, @laurenwigmore I've created a list of all the image embeds that lack a corresponding image file, just to be sure (see the table in the description above). I've fixed some of them, but the rest are genuinely missing. Any hints on where I could find them?
@olayway, I note Khalil mentioned that the November newsletter images were missing entirely, not even on the really old WordPress site. For those, the only hope would be the Wayback Machine.
Personally, I think this issue is low priority for now.
After using the https://github.com/flowershow/wordpress-to-markdown script to convert pages/posts to markdown files (with assets) for the lifeitself migration, some pages have links to images that are missing from the `assets/images` folder.
On further investigation it was found that ALL image paths, including external ones like `<img src="https://i.imgur.com/OPrvrNS.png" />`, were converted to relative paths (`![](/assets/images/OPrvrNS.png)`) and downloaded to the `assets/images` folder. This has led to a larger assets folder containing images that were not needed locally in the first place.
Main issue
Although external images were parsed to relative paths and even downloaded, some images, for example those from https://artearthtech.wordpress.com, failed to download and therefore do not render on the page.
This issue was first found on the page https://lifeitself.org/blog/2019/01/22/ken-wilber-integral-spirituality, which in WordPress had images referencing files on artearthtech.files.wordpress.com, e.g. https://artearthtech.files.wordpress.com/2020/03/ken-wilber-map.jpg?w=776. It was fixed for that page in commit abc765d.
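For re-fetching images like this one, here is a sketch of the two URL manipulations involved. It assumes the converter names local files after the URL's basename (consistent with `https://i.imgur.com/OPrvrNS.png` becoming `/assets/images/OPrvrNS.png`) and that dropping the `?w=776` query returns the full-size upload rather than a resized copy. `site_root` and the function names are hypothetical.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_asset_path(image_url, assets_prefix="/assets/images"):
    """Guess the root-relative path the converter saved an external image under.

    Assumption: the file keeps its URL basename and the query string is dropped.
    """
    name = PurePosixPath(urlparse(image_url).path).name
    return f"{assets_prefix}/{name}"


def download_url(image_url):
    """Drop the query string (e.g. ?w=776) and fragment before requesting the image."""
    return urlparse(image_url)._replace(query="", fragment="").geturl()


def fetch(image_url, site_root):
    """Network call (hypothetical helper): save the image at its guessed local path."""
    target = Path(site_root) / local_asset_path(image_url).lstrip("/")
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(download_url(image_url), target)
    return target
```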
Acceptance
assets/images
Tasks
assets/images folder
Notes
Status update 13 Sep 2023
Here is a list of all the image embeds that do not have a corresponding image file. Some of them just used the wrong extensions.
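For the entries that just used the wrong extension, here is a small sketch that looks in the images folder for a file with the same name stem but a different extension (matched case-insensitively). The `missing_links` input and the folder layout are assumptions, and each suggestion still needs a visual check before the pages are edited.

```python
from pathlib import Path, PurePosixPath


def extension_fixes(missing_links, assets_dir):
    """Suggest an existing file for each missing link that differs only by extension.

    `missing_links` are root-relative paths like /assets/images/foo.jpg; `assets_dir`
    is the local folder those paths point into (hypothetical layout).
    """
    by_stem = {}
    for f in Path(assets_dir).iterdir():
        if f.is_file():
            by_stem.setdefault(f.stem.lower(), []).append(f.name)
    fixes = {}
    for link in missing_links:
        wanted = PurePosixPath(link)
        candidates = [n for n in by_stem.get(wanted.stem.lower(), []) if n != wanted.name]
        if candidates:
            # Pick deterministically if several extensions exist for the same stem.
            fixes[link] = str(wanted.with_name(sorted(candidates)[0]))
    return fixes
```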