sauladam / shipment-tracker

Parses tracking information for several carriers, like UPS, USPS, DHL and GLS by simply scraping the data. No need for any kind of API access.

Crackdowns: USPS, DHL #28

Open · andlabs opened this issue 3 years ago

andlabs commented 3 years ago

I might as well be the one to open this issue here. I've mostly been using this tool's source as a reference for my own similar but slightly different project, which just prints the raw status information for all my packages in a single table. Even so, it has come in handy, especially since this is one of the few projects left, open-source or not, that does desktop package tracking of any form. I guess everyone is used to fancy HTML email spam and text-message spam now? :/

USPS recently (either June or July) added Akamai anti-scraping protection to their website. This means some complex JavaScript has to run to generate a cookie, which then has to be sent with every subsequent request. Some people deal with similar protections when scraping catalogs off online marketplaces, but they haven't been willing to write up an easy-to-follow set of instructions for getting around it, insisting instead that you spend months working it out yourself. Not out of fear of Akamai's wrath, but solely out of elitism. I was already scraping HTML for this, so I just switched to driving a browser through WebDriver instead.

DHL, likely within the past few weeks, now just responds to every request with

<!DOCTYPE html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="utf-8"/>
    <title>Your tracking attempt has been blocked</title>
    <meta http-equiv="X-UA-Compatible" content="IE=edge"/>
    <meta name="language" content="en"/>
    <meta name="region" content="global"/>
    <meta name="robots" content="noindex,nofollow"/>
  <body>
    <header>
      <h1>
        Your tracking attempt has been blocked
      </h1>
    </header>
    <section>
      <p>
        Please note that the tracking status information on this website is intended for human consumption via the website only. It is not intended to be used for integration with your systems. Automated extraction of information by bots, website scraping etc. is prohibited.
      </p>
      <p>
        If you require direct system integration, kindly visit <a href="https://developer.dhl/">https://developer.dhl</a> for API access options.
      </p>
    </section>
  </body>
</html>

unless the client reproduces the exact chain of requests their own website makes, which involves more of that Akamai nonsense and then some.
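For anyone still scraping DHL, it can help to recognize this block page explicitly instead of failing later with a confusing parse error. Below is a minimal Python sketch of that check; it is not code from this project. It assumes the `<title>` text stays exactly as in the response quoted above, and the names `BLOCK_TITLE`, `_TitleParser` and `is_dhl_block_page` are hypothetical. It only uses the standard library's `html.parser`, and it only *detects* the block; it does not get around it.

```python
from html.parser import HTMLParser

# Assumption: the title text DHL used for its block page at the time of this
# issue. DHL can change this at any time.
BLOCK_TITLE = "Your tracking attempt has been blocked"


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a document."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; only the first <title> is considered.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Title text may be delivered in several chunks, so concatenate.
        if self._in_title:
            self.title += data


def is_dhl_block_page(html: str) -> bool:
    """Hypothetical helper: True if the HTML looks like DHL's bot-block page."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() == BLOCK_TITLE


if __name__ == "__main__":
    blocked = "<html><head><title>Your tracking attempt has been blocked</title></head></html>"
    normal = "<html><head><title>Track</title></head></html>"
    print(is_dhl_block_page(blocked))  # True
    print(is_dhl_block_page(normal))   # False
```

Matching on the `<title>` instead of the HTTP status code keeps the check independent of whatever status DHL sends alongside the page, which the response above doesn't show.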

While it appears they only require a company name to get an API key, and not proof that the company actually makes regular shipments (*cough* UPS *cough*), it still sucks, especially since this appears to be the tracking service that inspired this entire project in the first place. I haven't found a way around it yet, and I don't expect to (I'll probably just get an API key unless you figure it out first; I rarely have anything shipped with DHL anyway).

I'm surprised API keys haven't completely killed open-source package tracking yet (though proprietary package trackers are hard to come by nowadays too, even if they don't give me the simple CLI output I prefer). I'm also amazed that no one has questioned whether tracking data can be owned at all, or whether its extraction can be restricted even for personal use by a human, though I'm sure they'd do all this regardless.

andlabs commented 2 years ago

Sorry, when I said DHL I should have said DHL Express. I have not tried using regular DHL.