scripting / a8c-FeedLand-Support

A public repo for discussing FeedLand at A8C.

rssCloud debugging #38

Open cagrimmett opened 11 months ago

cagrimmett commented 11 months ago

rssCloud might still not be working. @fmfernandes, please set up some A/B tests to debug.

cagrimmett commented 11 months ago

Related: https://daveblogproduction.wpcomstaging.com/2023/09/11/rsscloud-issues/

fmfernandes commented 11 months ago

rssCloud is indeed not working. Here's what I did:

  1. Using my test site on WordPress.com: https://fernandomfernandes.wordpress.com/
  2. Subscribed to it on a8c.feedland.org
  3. Went to the new post screen.
  4. Checked the river for my feed
  5. Hit publish on the WordPress.com site.
  6. Saw the log messages below.
  7. Nothing showed up on the river page.
  8. Refreshed the river page and the new post appeared.

These are the application logs from right after I hit publish on my blog:

Timestamp console.log
09:08:36 PM 9:08:36 PM POST a8c.feedland.org:80 /feedupdated https://a8c.feedland.org/feedupdated ::ffff:127.0.0.1
09:08:36 PM handleRssCloudPing: feedUrl == https://fernandomfernandes.wordpress.com/feed/
09:08:36 PM 9:08:36 PM checkOneFeed: feedUrl == https://fernandomfernandes.wordpress.com/feed/
09:08:36 PM 09:08PM: aaaa post
09:08:37 PM 9:08:37 PM checkOneFeed: feedUrl == http://pattydebenham.com/feed
09:08:38 PM 9:08:38 PM checkOneFeed: feedUrl == https://www.davidlebovitz.com/feed/
09:08:38 PM notifySocketSubscribersFromSql: idLastNewItem == 440525
09:08:40 PM notifySocketSubscribersFromSql: idLastNewItem == 440525

fmfernandes commented 11 months ago

I'm not sure what I need to look for in FeedLand to check why it's not showing up in the River. I can see a call for handleRssCloudPing which looks promising and even the name of the new post (aaaa post) but it didn't show up in the river view right after publishing.

scripting commented 11 months ago

Did you try subscribing on FeedLand.org to see if it’s getting updates instantly?

scripting commented 11 months ago

Did you account for the river cache? The better test is whether the feed shows up at the top of the Feed List page when you refresh it. We may be able to turn off the river cache altogether because of the increased performance. But don't let that interfere with your test.

By A/B test -- I meant comparing the result on feedland.org and A8C.feedland.org. Does the change show up immediately on one but not the other? When all is working they should appear at roughly the same time.

Are you sure your WordPress is sending rssCloud pings?

fmfernandes commented 11 months ago

> Did you try subscribing on FeedLand.org to see if it’s getting updates instantly?

It didn't show up on FeedLand.org either.

> Are you sure your WordPress is sending rssCloud pings?

Isn't this what the handleRssCloudPing message indicates? That we got the ping from WordPress.

I guess my next test will be trying that when the river cache is disabled.

scripting commented 11 months ago

Why deal with the river cache -- you don't need to do that. Use the Feed List page. Or even better, open the JS console in the client, when a new item comes in you'll see it scroll through the console there.

fmfernandes commented 11 months ago

> Or even better, open the JS console in the client, when a new item comes in you'll see it scroll through the console there.

And the new item shows up in the console. The table in this comment is the console output on a8c.feedland.org right after I published a new post.

So, we're getting the ping:

handleRssCloudPing: feedUrl == https://fernandomfernandes.wordpress.com/feed/

We're checking the feed again:

9:08:36 PM checkOneFeed: feedUrl == https://fernandomfernandes.wordpress.com/feed/

And the recently published post shows up in the console:

09:08PM: aaaa post

And after a page reload, the new item shows up in the River and in the feed list.

fmfernandes commented 11 months ago

I think I was expecting the new item to show up instantly without me doing anything. But I believe we disabled that functionality. Given that, I think rssCloud is working as expected 🙂

scripting commented 11 months ago

good! ;-)

i wish i had one of those funky icons from slack to put in here.

scripting commented 10 months ago

Steps to reproduce problem with rssCloud and feedland.com.

  1. Open a new window, and in one tab open the feed list page on feedland.org, and in the other the feed list page from feedland.com.
  2. Make sure you're subscribed to your personal feed on feedland.org and feedland.com.
  3. If these are new subscriptions, give them a few moments to renew your subscription with the cloud server. You can see that's happened by looking at the Feed Info page in each of the servers. Screen shot.
  4. Post a new message to your personal feed on feedland.org, which will cause the feed to update, and to ping the rssCloud server it's hooked up to, which is what theoretically both feedland.org and feedland.com have subscribed to (that's what "renew" does).
  5. Now open each of the two tabs created in step one and reload the page. You will see that your personal feed shows up at the top of the feedlist on feedland.org but not at the top of the list on feedland.com.

I did this with my personal feed, which is also my linkblog feed. If you can't get this to reproduce I can demo it for you.

fmfernandes commented 10 months ago

I wonder if I'm looking for the right thing here, but this log page shows a log like this:

Notify message time duration
Notify Subscriber feedland.org:80 was notified that resource has changed via http: protocol. 12:51PM 0.194

But for feedland.com it doesn't show it even as a subscriber... So I'm thinking there may be a problem when communicating with rpc.rsscloud.io for some reason.

scripting commented 10 months ago

> But for feedland.com it doesn't show it even as a subscriber

What does this mean??

scripting commented 10 months ago

What is your personal feed? What did you subscribe to in step 2?

fmfernandes commented 10 months ago

> What does this mean??

Not sure 😅 Just that something is off since that log page doesn't show pings/notifies to feedland.com. It instantly notifies feedland.org about updates to my feed which is not the case for feedland.com. I'm thinking we're not even subscribing with the cloud server.

> What is your personal feed? What did you subscribe to in step 2?

data.feedland.org/feeds/fmfernandes.xml

scripting commented 10 months ago

Thanks for explaining.

I added a way for you to manually cause a feed to renew. See screen shot below.

You have to update to get it, new versions of feedland and feedlanddatabase, 0.6.23 and 0.7.14.

image

cagrimmett commented 10 months ago

I just replicated what @fmfernandes said. When I publish a new post to http://data.feedland.org/feeds/cagrimmett.xml and check https://rpc.rsscloud.io/viewLog, I see the ping to rsscloud.io, and then a notify event going out to feedland.org, but nothing to feedland.com.

scripting commented 10 months ago

<?xml version="1.0"?>
<notifyResult success="false" msg="The subscription was cancelled because the call failed when we tested the handler."/>

scripting commented 10 months ago

It isn't working because when the cloud server tried to call us we failed to respond correctly.

scripting commented 10 months ago

I need to check for this error and do a console.log with it, so you will be alerted to the problem.

In the meantime, we should try to find out whether a firewall or something else is blocking the port or the message. The same code in FeedLand is running fine here with that server, so the connection seems the most likely place to look.
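A minimal sketch of what checking for this error and logging it could look like. This is not the FeedLand code: `parseNotifyResult` and `logRenewalResult` are hypothetical names, and the regex only assumes the flat `<notifyResult .../>` shape shown in the response above.

```javascript
// Hypothetical helper: pull the attributes out of a <notifyResult .../> reply.
// A regex is enough for this flat, single-element document; a real XML parser
// would be the safer choice for anything more complex.
function parseNotifyResult(xmlText) {
    const tag = /<notifyResult\b([^>]*)>/.exec(xmlText);
    if (tag === null) {
        return { success: false, msg: "No notifyResult element in response." };
    }
    const attributes = {};
    for (const [, name, value] of tag[1].matchAll(/(\w+)="([^"]*)"/g)) {
        attributes[name] = value;
    }
    return {
        success: attributes.success === "true",
        msg: attributes.msg ?? ""
    };
}

// Hypothetical wrapper: log only when the cloud server reports a failure.
function logRenewalResult(feedUrl, xmlText) {
    const result = parseNotifyResult(xmlText);
    if (!result.success) {
        console.log("rssCloudRenew failed: feedUrl == " + feedUrl + ", msg == " + result.msg);
    }
    return result;
}
```

With the response above, `logRenewalResult` would print the "subscription was cancelled" message instead of failing silently.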

scripting commented 10 months ago

This is what a good renewal looks like in the rsscloud.io log.

image

fmfernandes commented 10 months ago

@scripting Can you add logs around what happens when clicking on the Renew now button? I'm only seeing this:

timestamp log
05:09:43 PM 5:09:43 PM GET feedland.com:80 /renewfeednow https://feedland.com/?feedurl=http%3A%2F%2Fdata.feedland.org%2Ffeeds%2Ffmfernandes.xml ::ffff:127.0.0.1
05:09:43 PM 5:09:43 PM rssCloudRenew: feedUrl == "http://data.feedland.org/feeds/fmfernandes.xml"
05:09:43 PM renewFeedNow: feedRec.feedUrl == http://data.feedland.org/feeds/fmfernandes.xml

scripting commented 10 months ago

right -- it doesn't tell you whether it worked. i'm adding that now.

scripting commented 10 months ago

new feedland and feedlanddatabase with very nice console messages on both server and client saying how the renewal went.

fmfernandes commented 10 months ago

I got this now:

renewFeedNow: feedRec == {
    "notifyResult": {
        "$": {
            "success": "false",
            "msg": "The subscription was cancelled because the call failed when we tested the handler."
        }
    }
}

Still not very helpful, but looking around on https://rsscloud.org/walkthrough/ there's this:

My server does a test call of the handler before adding the subscription, to verify that the handler is functional and can be reached through firewalls and other obstacles. If the call fails, the registration fails.

Which makes sense to test first.

Investigating further:

$ curl -X POST feedland.org/feedupdated
Thanks for the update! ;-)% 
$ curl -X POST feedland.com/feedupdated
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>

I wonder if that redirect has anything to do with the subscription process.

scripting commented 10 months ago

It looks like the problem is specific to rsscloud.io.

For example, here's a bluesky feed.

https://feedland.com/?feedurl=https%3A%2F%2Frss.firesky.tv%2F%3Ffilter%3Dfrom%3Ascripting.com

When I click on the Renew button on that page, it works.

Update

But when I posted a new item to the bluesky account the result showed up in neither place.

scripting commented 10 months ago

BTW -- don't trust those docs. It's a spammer's site.

this is where the real docs are.

http://walkthrough.rsscloud.co/

fmfernandes commented 10 months ago

The manual renew button is failing for my blog feed. I get the message:

renewFeedNow: feedRec == {
    "notifyResult": {
        "$": {
            "success": "false",
            "msg": "Error testing notification URL.  The URL returned HTTP status code: 400 - Bad Request."
        }
    }
}

I know this is the response from the server request (fernandomfernandes.wordpress.com:80/?rsscloud=notify) but can we log what we're sending as well?

scripting commented 10 months ago

@fmfernandes -- good idea. I just added code to log what we send when renewing a feed.

feedland and feedlanddatabase versions 0.6.31 and 0.7.19.

It'll be interesting to see what we learn.

scripting commented 10 months ago

This is what the log looks like.

5:24:17 PM rssCloudRenew: theRequest ==

{
    "url": "http://rpc.rsscloud.io:5337/pleaseNotify",
    "method": "POST",
    "followAllRedirects": true,
    "maxRedirects": 5,
    "headers": {
        "Content-Type": "application/x-www-form-urlencoded"
    },
    "body": "domain=feedland.org&port=80&path=%2Ffeedupdated&url1=http%3A%2F%2Fscripting.com%2Frss.xml&protocol=http-post"
}
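For reference, a rough sketch of how a form-encoded body like the one in the log above can be assembled. The parameter names (`domain`, `port`, `path`, `url1`, `protocol`) come from the logged request; the function name and its argument object are made up for illustration and are not FeedLand's.

```javascript
// Hypothetical helper: build the application/x-www-form-urlencoded body
// for a pleaseNotify request, in the same parameter order as the log.
function buildPleaseNotifyBody({ domain, port, path, feedUrl, protocol = "http-post" }) {
    const params = new URLSearchParams();
    params.append("domain", domain);
    params.append("port", String(port));
    params.append("path", path);
    params.append("url1", feedUrl);
    params.append("protocol", protocol);
    return params.toString();
}

const body = buildPleaseNotifyBody({
    domain: "feedland.org",
    port: 80,
    path: "/feedupdated",
    feedUrl: "http://scripting.com/rss.xml"
});
console.log(body);
```

The `domain` and `port` values are exactly what the cloud server will call back on, so a wrong port here is enough to make the registration fail.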
cagrimmett commented 10 months ago

I updated feedland.com to feedland and feedlanddatabase versions 0.6.31 and 0.7.19.

fmfernandes commented 10 months ago

This is the error I'm getting when trying to click the Renew now link:

rssCloudRenew: theRequest == {
  "url": "http://fernandomfernandes.wordpress.com:80/?rsscloud=notify",
  "method": "POST",
  "followAllRedirects": true,
  "maxRedirects": 5,
  "headers": {
    "Content-Type": "application/x-www-form-urlencoded"
  },
  "body": "domain=feedland.com&port=443&path=%2Ffeedupdated&url1=https%3A%2F%2Ffernandomfernandes.wordpress.com%2Ffeed%2F&protocol=http-post"
}

Which results in the following response:

rssCloudRenew: response from server == {
  "notifyResult": {
    "$": {
      "success": "false",
      "msg": "Error testing notification URL. The URL returned HTTP status code: 400 - Bad Request."
    }
  }
}

Checking the docs (here): because we're sending the domain parameter, I believe the rssCloud server's next step is to make a GET request to the specified path (/feedupdated). But as you can see, we're also sending port 443, so the request becomes GET feedland.com:443/feedupdated, which throws the expected 400 - Bad Request.
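To make that concrete, here is a sketch of the test URL a cloud server might build from the registration parameters. It assumes the server calls back over plain http (the protocol is http-post) and passes `url` and `challenge` query parameters, which is my reading of the rssCloud walkthrough rather than anything confirmed in this thread. `buildChallengeUrl` is a hypothetical name.

```javascript
// Hypothetical: the URL a cloud server would GET to test the handler.
// Assumes a plain-http callback with url and challenge query parameters.
function buildChallengeUrl({ domain, port, path, feedUrl, challenge }) {
    const testUrl = new URL("http://" + domain + ":" + port + path);
    testUrl.searchParams.set("url", feedUrl);
    testUrl.searchParams.set("challenge", challenge);
    return testUrl.toString();
}

const feedUrl = "https://fernandomfernandes.wordpress.com/feed/";

// With port=443 the server ends up speaking plain http to the TLS port.
console.log(buildChallengeUrl({ domain: "feedland.com", port: 443, path: "/feedupdated", feedUrl, challenge: "abc123" }));

// With port=80 the request goes to the ordinary http listener.
console.log(buildChallengeUrl({ domain: "feedland.com", port: 80, path: "/feedupdated", feedUrl, challenge: "abc123" }));
```

A plain http request arriving on a port that expects TLS is typically rejected by the web server with a 400, which would match the error in the response.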

scripting commented 10 months ago

@fmfernandes -- it looks like we're getting somewhere now. :-)

The key routine is getRssCloudOptions in feedland.js.

It determines the domain and port to request callbacks to from config.myDomain.

Looking at the code for config.js:

const myDomain = process.env.APP_DOMAIN || localDev.appDomain;

So -- it seems that myDomain is saying that the app is running at feedland.com:443.

Before we change it, let's find out why it's configured that way. Let's not fix this by breaking something else. ;-)
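A sketch of how a `domain:port` value like this might be split into the two fields sent to the cloud server. This is not the actual getRssCloudOptions code; `splitDomainAndPort` is a hypothetical helper, and defaulting to port 80 is an assumption based on the earlier hard-coded value.

```javascript
// Hypothetical: split "host:port" into its parts, defaulting the port
// when none is given. Not the real getRssCloudOptions implementation.
function splitDomainAndPort(myDomain, defaultPort = 80) {
    const [domain, portText] = myDomain.split(":");
    const port = portText === undefined ? defaultPort : Number(portText);
    return { domain, port };
}

console.log(splitDomainAndPort("feedland.com:443")); // domain feedland.com, port 443
console.log(splitDomainAndPort("feedland.com"));     // domain feedland.com, port 80
```

Under this reading, a myDomain of feedland.com:443 would produce exactly the domain=feedland.com&port=443 pair seen in the request body.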

cagrimmett commented 10 months ago

This is curious... we don't have many logs pre-domain switch, but those we do like https://github.com/scripting/a8c-FeedLand-Support/issues/38#issuecomment-1800145492 show port 80. Did switching to feedland.com change something? 🤔

scripting commented 10 months ago

You can see what's in config.json from your browser.

  1. Go to feedland.com.
  2. Open the JavaScript console.
  3. console.log (jsonStringify (appConsts.serverConfig))

This is what I get.

{
    "flWebsocketEnabled": true,
    "websocketPort": 1462,
    "myDomain": "feedland.com:443",
    "mailSender": "feedland@automattic.com",
    "confirmEmailSubject": "FeedLand confirmation",
    "confirmationExpiresAfter": 3600,
    "flUseTwitterIdentity": false,
    "flEnableNewUsers": true,
    "flBackupOnStartup": false,
    "flNewsProducts": true,
    "flUserFeeds": true,
    "flLikesFeeds": true,
    "urlForFeeds": "https://feedland.com/feeds/",
    "s3PathForFeeds": "/feedland-static/feeds/",
    "s3LikesPath": "/feedland-static/likes/",
    "urlNewsProducts": "/newsproduct?username=",
    "maxRiverItems": 175,
    "maxNewFeedSubscriptions": 250,
    "flUpdateFeedsInBackground": true,
    "minSecsBetwFeedChecks": 15,
    "productName": "FeedLand",
    "productNameForDisplay": "FeedLand",
    "urlServerHomePageSource": "http://scripting.com/code/feedland/home/index.html",
    "urlStarterFeeds": "http://s3.amazonaws.com/scripting.com/publicfolder/feedland/subscriptionLists/starterfeeds.opml"
}

scripting commented 10 months ago

Saw the comments on the Slack channel.

I also searched the code for config.myDomain, and it apparently is only used in one other place.

I'm wary of making a change, considering what happened when I took flDeleted off the river-building query. 😄

And I found some notes when the change was made, on 5/17/23. Previously the domain and port for rssCloud subs had been hard-coded as feedland.org and port 80.

I don't have any notes on the configuration we're seeing. Let's try making the change, just drop the port from myDomain, like this:

"myDomain": "feedland.com",

And restart and see how rssCloud does.

As it stands now, unless you respond on port 443, then NONE of our requests are going through.

If for some reason that doesn't work, I'll add new config values for these and put it behind us.

scripting commented 10 months ago

BTW, from reviewing appConsts.serverConfig, it's clear that we're going to break published static stuff when we get approval to use the new S3 domain, assuming we do.

    "urlForFeeds": "https://feedland.com/feeds/",
    "s3PathForFeeds": "/feedland-static/feeds/",
    "s3LikesPath": "/feedland-static/likes/",

fmfernandes commented 10 months ago

I'm pretty sure removing 443 from myDomain will break something. I believe we made that change in a call between me, Dave and Chris, but I don't remember exactly why, and I couldn't find commits around it because that variable is an env var. We can try that and see what happens.

Is it possible to create new config variables that allow us to specify the port and domain for the rssCloud request?

scripting commented 10 months ago

OK, I'll add this --

rssCloudNotifyDomain

It will default to myDomain, but if you set it separately it'll be only used for this purpose, for rssCloud notifications.

Make sense?

fmfernandes commented 10 months ago

Can we also have one variable for port?

scripting commented 10 months ago

The port is included in the domain, as with config.myDomain.

If it's meant to be 80, you can leave it out.

There's a new version of feedland, 0.7.19.

just add

rssCloudNotifyDomain: "feedland.com"

and it should work.

fingers crossed

praise murphy! ;-)
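The fallback described above can be pictured like this. It is a sketch only, not the code in feedland.js: the function name is made up, and using `||` (so an empty string also falls back) is an assumption.

```javascript
// Hypothetical: pick the domain to use for rssCloud callbacks.
// Falls back to myDomain when rssCloudNotifyDomain is unset or empty.
function getRssCloudNotifyDomain(config) {
    return config.rssCloudNotifyDomain || config.myDomain;
}

const withOverride = { myDomain: "feedland.com:443", rssCloudNotifyDomain: "feedland.com" };
const withoutOverride = { myDomain: "feedland.org" };

console.log(getRssCloudNotifyDomain(withOverride));    // feedland.com
console.log(getRssCloudNotifyDomain(withoutOverride)); // feedland.org
```

This keeps myDomain untouched for whatever else depends on it, while the rssCloud registration gets a value with no port, which means port 80.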

scripting commented 10 months ago

To nail it down, the new code is here.

https://github.com/scripting/feedland/blob/main/feedland.js#L861

fmfernandes commented 10 months ago

I tried it again:

renewFeedNow: feedRec == {
    "notifyResult": {
        "$": {
            "success": "true",
            "msg": "Registration successful."
        }
    }
}

Body of the request sent:

"body": "domain=feedland.com&port=80&path=%2Ffeedupdated&url1=https%3A%2F%2Ffernandomfernandes.wordpress.com%2Ffeed%2F&protocol=http-post"

Got the challenge back and server registered successfully 😎


Tried publishing a post, and almost instantly got the ping back. My feed also got to the top of the feed page. I'll try now with different rssCloud servers.
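For anyone following along, here is a rough sketch of the two kinds of request a /feedupdated handler sees: the challenge test at registration time and the ping when a feed changes. The `challenge` and `url` parameter names are my reading of the rssCloud walkthrough, and `handleFeedUpdated` is a hypothetical function, not FeedLand's handler. The reply text is the one feedland.org returned to curl earlier in the thread.

```javascript
// Hypothetical handler logic, written as a pure function so it can be
// exercised without a server. Returns what the HTTP response should be.
function handleFeedUpdated(method, queryString, bodyText) {
    if (method === "GET") {
        // Registration test: echo the challenge back to prove we are reachable.
        const query = new URLSearchParams(queryString);
        const challenge = query.get("challenge");
        if (challenge === null) {
            return { status: 400, body: "Missing challenge parameter." };
        }
        return { status: 200, body: challenge, feedUrl: query.get("url") };
    }
    if (method === "POST") {
        // Ping: the form-encoded body names the feed that changed.
        const form = new URLSearchParams(bodyText);
        const feedUrl = form.get("url");
        if (feedUrl === null) {
            return { status: 400, body: "Missing url parameter." };
        }
        return { status: 200, body: "Thanks for the update! ;-)", feedUrl };
    }
    return { status: 405, body: "Method not allowed." };
}

console.log(handleFeedUpdated("GET", "url=https%3A%2F%2Fexample.com%2Ffeed%2F&challenge=abc123", ""));
console.log(handleFeedUpdated("POST", "", "url=https%3A%2F%2Fexample.com%2Ffeed%2F"));
```

After a ping like the second call, the server would go on to re-read the feed, which is the checkOneFeed step in the logs.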

scripting commented 10 months ago

I see it worked for your blog but..

I tried the same with scripting.com, renewed the subscription and then posted an item.

the new item showed up on feedland.org but not feedland.com.

fmfernandes commented 10 months ago

It also worked for my own feed (feedland.com/feeds/fernando.xml)

renewFeedNow: feedRec == {
    "notifyResult": {
        "$": {
            "success": "true",
            "msg": "Thanks for the registration. It worked. When the resource updates we'll notify you. Don't forget to re-register after 24 hours, your subscription will expire in 25. Keep on truckin!"
        }
    }
}

fmfernandes commented 10 months ago

I can't think of a reason for scripting.com not working. Both use the same rssCloud server and I can see in the logs that the registration was successful:

subscribe message timestamp duration
Subscribe Subscriber feedland.org:80 requests notification when the resource changes via http: protocol. 2:23PM 0.083
Subscribe Subscriber feedland.com:80 requests notification when the resource changes via http: protocol. 2:22PM 0.098

But I can't see any Notify in the logs for scripting.com/rss.xml going out to either .org or .com.

fmfernandes commented 10 months ago

I can now see NotifyFailed results in the cloud server logs. Checking our application logs, we're not even getting a request to /feedupdated, which means that the rssCloud server is not following the http -> https redirect. Not sure how we can handle that 🙃

Example requests:

$ curl -X POST feedland.com/feedupdated will return a 301 response.
$ curl -X POST -L feedland.com/feedupdated will return a successful response.

scripting commented 10 months ago

Is that the issue? They notify us on http, and we redirect back to https, and we never hear from them again? If that’s the case it’s a bug in the cloud server. You can contact him, post an issue in his repo. Andrew is friendly and capable. I’m sure he’d help.
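A sketch of the step a notifying client would need in order to survive that redirect: look at the status and Location header, and work out where to re-send the ping. The function name is hypothetical, and the Location value in the example is an assumption (the thread only shows the 301, not where it points).

```javascript
// Statuses that tell the client to retry at the URL in the Location header.
const REDIRECT_STATUSES = new Set([301, 302, 303, 307, 308]);

// Hypothetical: return the URL to retry, or null if there is nothing to follow.
function nextUrlAfterRedirect(requestUrl, status, locationHeader) {
    if (!REDIRECT_STATUSES.has(status) || !locationHeader) {
        return null;
    }
    // Location may be relative, so resolve it against the original URL.
    return new URL(locationHeader, requestUrl).toString();
}

// Assumed Location value for illustration only.
console.log(nextUrlAfterRedirect("http://feedland.com/feedupdated", 301, "https://feedland.com/feedupdated"));
console.log(nextUrlAfterRedirect("http://feedland.com/feedupdated", 200, undefined));
```

One wrinkle: some HTTP clients switch a POST to a GET when following a 301 or 302, so a client that follows the redirect also has to make sure the retry is still a POST.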

fmfernandes commented 10 months ago

> BTW, from reviewing appConsts.serverConfig, it's clear that we're going to break published static stuff when we get approval to use the new S3 domain, assuming we do.
>
>     "urlForFeeds": "https://feedland.com/feeds/",
>     "s3PathForFeeds": "/feedland-static/feeds/",
>     "s3LikesPath": "/feedland-static/likes/",

Hey @scripting, since we now got approval, how should we set up those variables? I guess the only one that needs changing is the urlForFeeds?

scripting commented 10 months ago

@fmfernandes -- you would be the expert on this, but yes, let's get this correct so we can start testing and take this off the todo list! ;-)