Closed benborges closed 7 months ago
In the new `queue` branch, I have it set to take metadata from the RSS feed if OpenGraph data is not available. The RSS spec does have a field for images, but I don't know of many feeds that use it.
I didn't fully understand, but I think this answers what you were asking: it would be fairly complicated to scan the description field for media links and then feed those to Bluesky. I don't think there are many use cases that need it, so it wouldn't be worth adding. You could, however, use or make something that reads the RSS feed and creates a new RSS feed with the `<image>` tag, then feed that to the bot, and I can add support for the image tag.
Hopefully, I understood that right. Correct me if I got anything wrong.
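For anyone trying the feed-rewriting route described above, here is a minimal TypeScript sketch. It is not part of bsky.rss. It assumes you already have the raw XML of a single `<item>`, that the description holds HTML (CDATA-wrapped or entity-escaped), and that the bot reads an item-level `<image>` element once that support exists. All function names are hypothetical, and regex-based parsing is a simplification; a real XML parser would be more robust.

```typescript
// Hypothetical helpers, not part of bsky.rss. They operate on the raw XML of
// one <item> and use regular expressions to stay dependency-free.

/** Decode one level of the common XML/HTML entities. */
function decodeEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" is only decoded once
}

/** Escape text so it is safe inside an XML element. */
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

/** Return the first <img src="..."> URL found in the item's description. */
function firstImageInDescription(itemXml: string): string | undefined {
  const description = /<description>([\s\S]*?)<\/description>/i.exec(itemXml);
  if (!description) return undefined;
  const raw = (description[1] ?? "").replace(/<!\[CDATA\[|\]\]>/g, "");
  const html = decodeEntities(raw);
  const img = /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i.exec(html);
  // The attribute value is HTML, so decode entities such as &amp; once more.
  return img ? decodeEntities(img[1] ?? "") : undefined;
}

/** Add an <image> element to the item unless it already has one. */
function addImageTag(itemXml: string): string {
  if (/<image>/i.test(itemXml)) return itemXml;
  const url = firstImageInDescription(itemXml);
  if (!url) return itemXml;
  // A replacer function avoids "$" in the URL being treated as a pattern.
  return itemXml.replace(/<\/item>/i, () => `  <image>${escapeXml(url)}</image>\n</item>`);
}
```

Running `addImageTag` over every item and serving the result as a new feed gives the bot a single, predictable place to look for the image.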
Yes you got me right, and your proposal should work, I could reformat the rss feed to get the media URL to be an
I tried to replicate this with this bot
The RSS feed has an `enclosure` tag that even includes the size of the images. I'm wondering if this could be integrated.
view-source:rappel.conso.gouv.fr/rss <enclosure url>
I'm not sure it's very standard, but I'm seeing this used for images in a bunch of different feeds.
I also tried to recondition this RSS feed by wrapping the enclosure image URL in an `<image>` tag, but my reconstructed feed had issues with the date field not being found (although I was using pubDate).
Can you send me the example for one of your reconstructed items in the RSS feed?
Here is the link
I have read some more documentation, and apparently the best way to add an image file to an item is in fact to use the image tag, not enclosure. To respect the spec, an enclosure is also supposed to declare the length (size) of the file, which is not necessary for images. In short, the image tag is simpler for an item you want to associate an image with, and enclosure is the proper way to handle podcasts, video, and audio.
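To illustrate the point about size: in RSS 2.0, `<enclosure>` has three required attributes (`url`, `length` in bytes, and `type` as a MIME type), so the file size has to be known when the feed is generated. A minimal sketch of building one, with hypothetical helper names:

```typescript
// Hypothetical helper for a feed generator; not taken from bsky.rss.
interface EnclosureInfo {
  url: string;
  length: number; // file size in bytes
  type: string; // MIME type, e.g. "image/jpeg"
}

/** Escape a value for use inside a double-quoted XML attribute. */
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

/** Build an <enclosure> element carrying all three required attributes. */
function enclosureElement(info: EnclosureInfo): string {
  if (!Number.isInteger(info.length) || info.length < 0) {
    throw new RangeError("enclosure length must be a non-negative integer byte count");
  }
  return `<enclosure url="${escapeAttr(info.url)}" length="${info.length}" type="${escapeAttr(info.type)}" />`;
}
```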
Check this out: I was talking to a contact about this issue for this particular RSS feed, and he went ahead and added a new config option for images. Is this how you would handle this request?
https://github.com/garaytc/bsky.rss/commit/2a4e0cb4168a4dee9cf5077b8f274a82ffb45a99
Edit: confirmed to be working.
I had to set the config variable to media:content, matching my image element.
Feature added to the `queue` branch with the use of `imageField: ""`.
So this is working, and it works with anything (`media:content` on my reconstructed RSS feed, or `enclosure` from a vanilla RSS feed).
But for some reason, I'm not seeing the pooling/queuing system at work: it fetched new items and posted them right away, and so exceeded the limit with this error:
bsky-rss-sciencesFR | (node:30) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 22 terminated listeners added to [Fetch]. Use emitter.setMaxListeners() to increase limit
bsky-rss-sciencesFR | (Use `node --trace-warnings ...` to show where the warning was created)
bsky-rss-sciencesFR | /build/node_modules/@atproto/xrpc/src/client.ts:126
bsky-rss-sciencesFR | throw new XRPCError(resCode, res.body.error, res.body.message)
bsky-rss-sciencesFR | ^
bsky-rss-sciencesFR | XRPCError: too many concurrent writes
bsky-rss-sciencesFR | at ServiceClient.call (/build/node_modules/@atproto/xrpc/src/client.ts:126:15)
bsky-rss-sciencesFR | at processTicksAndRejections (node:internal/process/task_queues:95:5)
bsky-rss-sciencesFR | at async PostRecord.create (/build/node_modules/@atproto/api/src/client/index.ts:1519:17) {
bsky-rss-sciencesFR | status: 400,
bsky-rss-sciencesFR | error: 'ConcurrentWrites',
bsky-rss-sciencesFR | success: false
bsky-rss-sciencesFR | }
bsky-rss-sciencesFR | error Command failed with exit code 1.
bsky-rss-sciencesFR | info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
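The `ConcurrentWrites` error above suggests several records were being written at the same time. Purely as an illustration of the queuing idea (this is not the code in the `queue` branch, and every name here is hypothetical), a queue can post items strictly one at a time with a pause between them:

```typescript
// Hypothetical sketch of a sequential posting queue; not the bsky.rss code.
type PostFn<T> = (item: T) => Promise<void>;

class SequentialQueue<T> {
  private readonly pending: T[] = [];
  private readonly post: PostFn<T>;
  private readonly delayMs: number;
  private draining = false;

  constructor(post: PostFn<T>, delayMs: number) {
    this.post = post;
    this.delayMs = delayMs;
  }

  /** Add an item; nothing is posted until drain() runs. */
  enqueue(item: T): void {
    this.pending.push(item);
  }

  /** Post queued items one by one and return how many were posted. */
  async drain(): Promise<number> {
    if (this.draining) return 0; // another drain is already in progress
    this.draining = true;
    let posted = 0;
    try {
      while (this.pending.length > 0) {
        const item = this.pending.shift() as T;
        await this.post(item); // wait for this write before starting the next
        posted += 1;
        if (this.pending.length > 0) {
          await new Promise<void>((resolve) => setTimeout(resolve, this.delayMs));
        }
      }
    } finally {
      this.draining = false;
    }
    return posted;
  }
}
```

Calling `drain()` from a timer keeps newly fetched items from being posted in a burst, because each write finishes before the next one starts.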
I just found a case with this RSS feed where, when using the imageField config, the image is present in the feed but does not get posted, and no error is generated either.
I used `enclosure` as the image field, the same way I used it with this feed, which is working just fine. Both use the `enclosure` tag for images.
Oh, and I double-checked: when I use the current state of the queue branch to build my local Docker image, the queue system is not working the way it did just before the imageField feature was merged.
I cannot replicate the rate-limiting issue using https://feeds.simplecast.com/54nAGcIl (and posting all of that to Bluesky). I'll look into the imageField issue.
EDIT: Pushed a possible fix for the rate-limiting issue.
EDIT 2: Images are posting fine from the first link you posted in your comment above. My config is `"imageField": "enclosure"`.
Found my issue, I accidentally did a docker build -t my-docker image . inside a git repo on the main branch, and called it bsky-queue and then used this image on different bots, basically creating my own problem while testing it!!!
So, the queue branch is fine, along with the ImageField, it's all working perfectly and sites/RSS that previously did not have images are now able to post with images just fine.
Just to clarify: this error only happened because the bot was running on the main branch. My bad.
Ah, I see.
One thing: after testing some other use cases, I think it would be best if the fetching of the image happened outside the OpenGraph loop.
This can be reproduced with this feed: https://reporterre.net/spip.php?page=backend-simple
Yes, we definitely need to change how the first image option is taken, it should be OpenGraph first and if not, then enclosure
Edit: I did some more digging, and the feeds where this happens do not have a native image or enclosure tag as an element. Instead, they have the image as an img src inside the description of each item. That's the case for a LOT of feeds out there. I'll repackage some of my feeds to control this part and keep my bots operational, but I'm wondering if this use case could be integrated?
Moved the image fetching for RSS-provided images outside of the Open Graph fetching on the latest commit to the `queue` branch.
As for parsing for img src in descriptions of feeds, like I've stated before, that's not in-spec at all, so I don't see it making sense to add it. Personally, I've come across few, if any, feeds that do that. If you would like to get images from the description, I'd suggest fetching the feed and rewriting the data in a standard format, or creating a PR with your suggested implementation of this.
Alright, I will probably go the route of rewriting the RSS feeds concerned (French and Belgian media CMSs with odd implementations of the RSS spec).
Thanks very much !
I have tried this on a few bots, but with or without `"imageField": "enclosure"` I can't get it to pick up the OpenGraph image like it used to, so I end up with no images at all, even though the queue is working properly.
Odd, mind giving me the RSS feed you're using?
[Sat, 26 Aug 2023 19:14:02 GMT] - [bsky.rss QUEUE] Queuing item (In case anyone wanted a good laugh this morning)
/build/app/utils/rssHandler.ts:63
let imageUrl: string = openGraphData.ogImage[0].url;
^
TypeError: Cannot read properties of undefined (reading '0')
at FeedSub.<anonymous> (/build/app/utils/rssHandler.ts:63:46)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Node.js v18.17.0
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
yarn run v1.22.19
$ tsx ./app/index.ts
[Sat, 26 Aug 2023 19:14:22 GMT] - [bsky.rss APP] Started RSS reader. Fetching from https://www.inoreader.com/stream/user/1004571328/tag/Trump
RSS feed
Should be fixed now
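The trace above points at `openGraphData.ogImage[0].url` being read while `ogImage` is undefined. The actual fix in the repository is not shown in this thread; one common way to guard this kind of access is optional chaining. The interfaces below are an assumed shape inferred from the stack trace only, and `firstOgImageUrl` is a hypothetical name:

```typescript
// Assumed shape of the Open Graph result, inferred from the stack trace only.
interface OpenGraphImage {
  url?: string;
}

interface OpenGraphData {
  ogImage?: OpenGraphImage[];
}

/** Return the first Open Graph image URL, or undefined when there is none. */
function firstOgImageUrl(data: OpenGraphData | undefined): string | undefined {
  // Each "?." short-circuits to undefined instead of throwing a TypeError.
  return data?.ogImage?.[0]?.url;
}
```

With a guard like this, a page without `og:image` yields `undefined`, and the caller can post without an image instead of crashing.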
No more errors, but no image embeds, on the latest git pull and rebuilt local Docker image.
Can you send me your config.json? Posts were working fine for me.
config.json for this bot
{
"string": "$title",
"publishEmbed": true,
"languages": ["en"],
"truncate": true,
"runInterval": 60,
"imageField": "enclosure",
"dateField": ""
}
The RSS feed you provided doesn't have an `enclosure` field for items, and when you provide a value for `imageField` in the config, the application looks for that and ignores Open Graph images.
Oh, it ignores it? I thought it checked one or the other, with OpenGraph first. OK, so if my feed isn't missing OpenGraph tags, I should not be using the new imageField, correct?
imageField should only be used if you want to strictly take images from the RSS feed posts and never from Open Graph. If the feed doesn't consistently post images, then using Open Graph to fetch images will probably be a better option than using imageField
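The rule described above can be sketched as follows. This illustrates the stated behavior and is not the project's code: `resolveImageUrl` and `urlFromField` are hypothetical names, and the shape of a parsed item (a plain object whose image field is either a URL string or an object with a `url` property) is an assumption.

```typescript
// Hypothetical sketch of the imageField rule; item shape is an assumption.
type FeedItem = Record<string, unknown>;

/** Read a URL from a field that is either a string or an object with "url". */
function urlFromField(value: unknown): string | undefined {
  if (typeof value === "string" && value.length > 0) return value;
  if (typeof value === "object" && value !== null) {
    const url = (value as { url?: unknown }).url;
    if (typeof url === "string" && url.length > 0) return url;
  }
  return undefined;
}

/**
 * When imageField is set, take the image strictly from the feed item and
 * never from Open Graph. When it is empty or unset, use Open Graph.
 */
function resolveImageUrl(
  item: FeedItem,
  imageField: string | undefined,
  ogImageUrl: string | undefined,
): string | undefined {
  if (imageField) {
    return urlFromField(item[imageField]);
  }
  return ogImageUrl;
}
```

This is why a feed without `enclosure` elements posts no images when `"imageField": "enclosure"` is set, even if the linked pages have Open Graph images.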
Understood !
I'm testing now without imageField, using :
`image: ghcr.io/milanmdev/bsky.rss:queue-2d5e0b6`
RSS feed
{
"string": "$title",
"publishEmbed": true,
"languages": ["en"],
"truncate": true,
"runInterval": 60,
"dateField": ""
}
Basically the same image issue as with the Trump bot here; both bots were still getting images just fine earlier today.
So I moved all my bots to your image and removed imageField from config.json, unless the feed is unique and has its own enclosure for images. Then I ran docker-compose up, and everything is getting posted, but with no images where there were images previously (before the imageField was merged, I guess?).
For each of these, the config.json is the same as https://github.com/milanmdev/bsky.rss/issues/27#issuecomment-1694494723
Pushed a fix. Available in ghcr.io/milanmdev/bsky.rss:queue-002738b
Thanks a lot for the fixes! I redeployed with this image and it's now running properly, with images!
I confirm that everything is running well on my side too, except for two multi-feeds whose sources lack OpenGraph support, producing this error:
[Sun, 27 Aug 2023 11:57:18 GMT] - [bsky.rss QUEUE] Starting queue handler. Running every 60 seconds
/build/app/utils/rssHandler.ts:64
let imageUrl: string = openGraphData.ogImage[0].url;
^
TypeError: Cannot read properties of undefined (reading '0')
at FeedSub.<anonymous> (/build/app/utils/rssHandler.ts:64:48)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Node.js v18.17.1
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 1.
According to @garaytc, this can be fixed with this PR: https://github.com/milanmdev/bsky.rss/pull/37
I tested @garaytc's queue branch directly and it does fix this issue (he has yet to push it here, though).
I have an odd RSS feed: it's constructed from Twitter/X users, only their media posts, using RSSHub (no API usage).
The original RSS feeds look like this: https://rsshub.app/twitter/media/Defmon3/
I aggregate many OSINT users into one Inoreader folder, and the result looks like this:
RSS: https://www.inoreader.com/stream/user/1005072895/tag/OSINTbridge/
JSON: https://www.inoreader.com/stream/user/1005072895/tag/OSINTbridge/view/json
HTML view: https://www.inoreader.com/stream/user/1005072895/tag/OSINTbridge/view/html?t=OSINTbridge&cs=m&sb=y
In an Inoreader use case such as this one, the media file is in the description field of the RSS feed.
I tried to fetch this link and store it in an item inside the RSS channel, but I didn't manage to post images/videos from this field.
This feed is taken from the JSON link above, and I manipulate it with N8N to construct the RSS feed the way I want.
I removed the media elements from my feed so that the Bluesky bot can post using the description field safely.
https://webhook.ukrainewararchive.eu/webhook/osint.rss
So I was wondering: would it be complicated, or doable, to default to the media link (video or image) in the description field if there is nothing to fetch from the OpenGraph meta of the source link?
Secondary question: would the Bluesky API allow uploading media files from the enclosure item of an RSS feed? (I'm going to explore this question by reading their docs, but perhaps you already have some idea about this.)
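To make the first question concrete, here is a rough TypeScript sketch of such a fallback. It is not how bsky.rss works: every name is hypothetical, the description is assumed to be already-decoded HTML, and media are classified by file extension only, which would miss URLs that carry the format in a query string.

```typescript
// Hypothetical fallback sketch; not part of bsky.rss.
type MediaKind = "image" | "video";

interface MediaLink {
  url: string;
  kind: MediaKind;
}

const IMAGE_EXT = /\.(?:jpe?g|png|gif|webp)$/i;
const VIDEO_EXT = /\.(?:mp4|webm|mov)$/i;

/** Classify a URL by the extension of its path (query and fragment ignored). */
function classifyMediaUrl(url: string): MediaKind | undefined {
  const path = url.split(/[?#]/)[0] ?? "";
  if (IMAGE_EXT.test(path)) return "image";
  if (VIDEO_EXT.test(path)) return "video";
  return undefined;
}

/** Find the first recognizable media link in an HTML description. */
function mediaFromDescription(descriptionHtml: string): MediaLink | undefined {
  const srcPattern = /<(?:img|video|source)\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = srcPattern.exec(descriptionHtml)) !== null) {
    const url = (match[1] ?? "").replace(/&amp;/g, "&");
    const kind = classifyMediaUrl(url);
    if (kind) return { url, kind };
  }
  return undefined;
}

/** Prefer the Open Graph image; otherwise fall back to the description. */
function pickMedia(ogImageUrl: string | undefined, descriptionHtml: string): MediaLink | undefined {
  if (ogImageUrl) return { url: ogImageUrl, kind: "image" };
  return mediaFromDescription(descriptionHtml);
}
```

Returning the kind alongside the URL lets the caller decide separately what to do with videos, since whether they can be uploaded is a question for the Bluesky API docs.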