j0k3r / graby

Graby helps you extract article content from web pages
MIT License

Fix cookies on multiple page #290

Open: j0k3r opened this pull request 2 years ago

j0k3r commented 2 years ago

This was the case for golem.de: the cookie wasn't properly sent to the next page (possibly a bug in the cookie jar, which doesn't properly retrieve previously defined cookies).

Fix https://github.com/j0k3r/graby-site-config/issues/48

coveralls commented 2 years ago


Coverage increased (+0.5%) to 95.611% when pulling e3504a57b05c2aad750cdb13f60578cbe4d54800 on fix/cookie-multiple-pages into a7aecceded6aa8f2ced101ec2e04b14d928fb51d on master.

Kdecherf commented 2 years ago

I didn't check in depth, but we may need to add a test to ensure that we don't leak cookies when the next page is on a different domain. What do you think?

j0k3r commented 2 years ago

That's a good question, though this shouldn't happen often. Should we add the cookie to the cookie jar instead, so that it gets checked later in the foreach?
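To illustrate why putting the cookie in the jar helps, here is a minimal sketch in Python using the standard library's `http.cookiejar` (graby itself is PHP, and this does not reproduce its HTTP client code). Once a cookie is stored in a jar together with its domain, the jar's policy decides per request whether to send it. The cookie name `consent`, its value, and the URLs are hypothetical stand-ins for a cookie defined in the golem.de site config.

```python
import urllib.request
from http.cookiejar import Cookie, CookieJar
from typing import Optional


def make_cookie(name: str, value: str, domain: str) -> Cookie:
    """Build a Netscape-style (version 0) session cookie scoped to `domain`."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookie_header_for(jar: CookieJar, url: str) -> Optional[str]:
    """Return the Cookie header the jar would attach to a request for `url`."""
    request = urllib.request.Request(url)
    # The jar's DefaultCookiePolicy checks domain, path, expiry, etc.
    jar.add_cookie_header(request)
    return request.get_header("Cookie")


jar = CookieJar()
# Hypothetical cookie, standing in for one defined in the golem.de site config.
jar.set_cookie(make_cookie("consent", "accepted", ".golem.de"))

print(cookie_header_for(jar, "https://www.golem.de/news/article-2.html"))  # consent=accepted
print(cookie_header_for(jar, "https://example.com/article-2.html"))        # None
```

Compared with re-injecting raw headers, the jar keeps the domain attached to each cookie, which is what makes the cross-domain check possible at all.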

j0k3r commented 2 years ago

I've checked: the only cookies we re-inject are those defined in the site config, so they shouldn't really contain sensitive data, should they? Otherwise I don't know how to fix the leak, because at the point where we would need to check it, we don't know which host the cookies in the headers are associated with.
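If the cookies stay in the headers, one possible workaround is to keep the site config's host next to its cookies and apply a domain match before building the `Cookie` header for the next page. The Python sketch below assumes that host (e.g. `golem.de`) is still available when the next page is fetched; `domain_match` and `cookie_header_for_page` are hypothetical helpers, not graby functions, and the matching follows the spirit of RFC 6265, section 5.1.3.

```python
import ipaddress
from typing import Dict, Optional
from urllib.parse import urlsplit


def domain_match(host: str, domain: str) -> bool:
    """Domain matching in the spirit of RFC 6265, section 5.1.3."""
    host = host.lower()
    domain = domain.lower().lstrip(".")
    if host == domain:
        return True
    if not host.endswith("." + domain):
        return False
    # A suffix match only counts for host names, never for IP addresses.
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return True
    return False


def cookie_header_for_page(next_url: str, config_host: str,
                           cookies: Dict[str, str]) -> Optional[str]:
    """Build a Cookie header for `next_url` only if it belongs to `config_host`.

    `config_host` is assumed to be the host the site config (and therefore
    its cookies) was loaded for, e.g. "golem.de".
    """
    host = urlsplit(next_url).hostname or ""
    if not cookies or not domain_match(host, config_host):
        return None  # different domain: do not forward the cookies
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


site_cookies = {"consent": "accepted"}  # hypothetical site-config cookie
print(cookie_header_for_page("https://www.golem.de/news/article-2.html", "golem.de", site_cookies))
print(cookie_header_for_page("https://cdn.example.com/article-2.html", "golem.de", site_cookies))
```

The key assumption is that the host travels with the cookies; without it, as noted above, there is nothing to compare the next page's host against.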