Fixing HTTP 406 error while running the test suite (npm test)

Jacobojijo commented 1 month ago

This fixes issue #95.

Solution

I modified the `scraping.js` file to send more browser-like User-Agent and Accept headers. Here are the key changes:

* Added constants for the user agent and accept header:

```javascript
// Header values that mimic a desktop Chrome browser
const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36';
const acceptHeader = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';
```

* Created a `getWithHeaders` function to make requests with these headers:

```javascript
// Wraps preq.get() so every request carries the browser-like headers
function getWithHeaders(url) {
    return preq.get({
        uri: url,
        headers: {
            'User-Agent': userAgent,
            'Accept': acceptHeader
        }
    });
}
```

* Updated all `preq.get()` calls to use `getWithHeaders()` instead.
* Modified the `meta()` function calls to include the headers:

```javascript
// Pass the same headers when calling meta() directly
return meta({
    uri: url,
    headers: {
        'User-Agent': userAgent,
        'Accept': acceptHeader
    }
})
```
This should make the test suite more reliable, especially against websites that reject requests without browser-like headers, which is where the HTTP 406 responses came from.
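
The same headers object appears in both `getWithHeaders` and the `meta()` call above. As a minimal sketch of one way to keep the two in sync, the options could be built in a single place. The helper name `requestOptions` is hypothetical; it is not part of this PR, `preq`, or html-metadata.

```javascript
// Sketch only: `requestOptions` is a hypothetical helper, not part of this PR.
const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36';
const acceptHeader = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';

// Build one options object so every request sends the same browser-like headers.
function requestOptions(url) {
    return {
        uri: url,
        headers: {
            'User-Agent': userAgent,
            'Accept': acceptHeader
        }
    };
}

// A fresh object is returned on each call, so callers can modify it safely.
const options = requestOptions('http://example.com');
console.log(options.headers.Accept);
```

Assuming both functions accept the options shape shown in the snippets above, the call sites would become `preq.get(requestOptions(url))` and `meta(requestOptions(url))`.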

Jacobojijo commented 1 month ago

Hello @mvolz, I have made the requested changes to the pull request. Please take a look, and let me know if there is anything else I should do.

Jacobojijo commented 1 month ago

Hello @mvolz, would you please review the changes made in this PR?

Jacobojijo commented 1 month ago

@mvolz, can you review my PR? I also think a style linter should be added to help with future changes. I'm thinking of introducing one to this project. What do you think?

Jacobojijo commented 1 month ago

@mvolz, I guess this PR is done

wikimedia / html-metadata

Fixing HTTP 406 error while running the test suite (npm test) #96
