internetarchive / brozzler

brozzler - distributed browser-based web crawler
Apache License 2.0

Random SAML Authentication #228

Open mmarchi21 opened 3 years ago

mmarchi21 commented 3 years ago

I've been using Brozzler to archive a SAML-protected site. I can get in by setting user_agent so that the site accepts regular form authentication. However, after the form authentication succeeds, Brozzler frequently archives the authentication page itself (the one asking for a username and password) instead of the page "under" the authentication request. In other words, once the crawl has entered the site and SAML authentication of the home page has completed, some of the subsequent pages are still captured as the login page.

This happens on roughly one in six pages, and I'm not sure whether it is caused by the site's settings or by Brozzler itself. Any tips or advice would be very much appreciated.
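
One way to put a number on the problem is to scan the captured HTML for a password field and count how many pages were archived as the login form. The sketch below is only an illustration of that idea: it uses Python's standard html.parser, it is not part of Brozzler, and the names PasswordFieldFinder, looks_like_login_page and login_page_ratio are hypothetical. It assumes the page bodies have already been extracted from the archive as strings, and that the login page is the only page on the site containing a password input.

```python
from html.parser import HTMLParser


class PasswordFieldFinder(HTMLParser):
    """Records whether an <input type="password"> element was seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "input":
            attr_map = dict(attrs)
            # Attribute values can be None (e.g. <input type>), so guard first.
            if (attr_map.get("type") or "").lower() == "password":
                self.found = True


def looks_like_login_page(html_text):
    """Heuristic (an assumption): a page with a password field is a login page."""
    finder = PasswordFieldFinder()
    finder.feed(html_text)
    return finder.found


def login_page_ratio(pages):
    """pages maps URL -> captured HTML. Returns (flagged URLs, flagged fraction)."""
    flagged = [url for url, html in pages.items() if looks_like_login_page(html)]
    fraction = len(flagged) / len(pages) if pages else 0.0
    return flagged, fraction
```

Running login_page_ratio over a sample of captures would show which URLs were affected, which may help tell apart a site-side session expiry (failures clustered late in the crawl) from something on the crawler side (failures spread evenly).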