Open NoamGaash opened 1 year ago
Hi All,
I think this is the most appropriate place to post my query.
I recently set up HAR recording in my tests: all network requests are captured in a HAR file when the update flag is true, and that HAR file then serves the mock API responses when the update flag is false. A few things I observed are blocking me from actually leveraging the HAR functionality. Please see below -
Workaround: to fix this I manually added a _file key and assigned the value test.json to it. I created test.json in my har directory and added the response of this API. How did I get the response? I called this API in Postman to get the full response.
www.example.com/api/v2/collect/?start=1695901908803 -- saved in the HAR file when recorded (update flag true)
www.example.com/api/v2/collect/?start=1695901912345 -- called when the test is run from the HAR (update flag false)
When the test runs from the HAR, the application calls the same API but with a different value of the start time. Because of this, Playwright doesn't find an exact match in the HAR file and the mocking fails by default.
For point 2, I wanted to know if there is anything I can do to ensure that Playwright mocks the API even though the start parameter has changed. For testing purposes I don't care about the difference in results between the two timestamps; I just want Playwright to use the same mock again.
One lengthy method is to mock such APIs separately with a glob pattern and fulfill them with whatever result I want. But doing so I won't be able to leverage the power of HAR, and this might become cumbersome for other pages where such APIs are called.
Any thoughts/guidance highly appreciated.. Thanks!!
@fs-projects in my projects I use things like:
await page.routeFromHAR( /* ........... */)
await page.route(/v2\/collect/, (route) => {
const url = route.request().url().replace(/\d+/, "1695901908803")
route.fallback({url})
})
But I feel terrible doing it. It's not maintainable at all.
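One pitfall with a bare /\d+/ pattern is that a non-global replace only touches the first run of digits in the URL, which for a URL like https://www.example.com/api/v2/collect/?start=... is the 2 in v2 rather than the timestamp. A minimal sketch that pins only the start query parameter, assuming absolute URLs (the helper name pinStartParam is hypothetical and not part of Playwright):

```javascript
// Hypothetical helper: pin the `start` query parameter to the recorded value
// so the rewritten URL matches the entry stored in the HAR file.
function pinStartParam(rawUrl, recordedStart) {
  const url = new URL(rawUrl); // throws on relative URLs
  if (url.searchParams.has("start")) {
    url.searchParams.set("start", recordedStart);
  }
  return url.toString();
}

// Intended use inside a handler registered after page.routeFromHAR:
//   await page.route(/v2\/collect/, (route) =>
//     route.fallback({
//       url: pinStartParam(route.request().url(), "1695901908803"),
//     }));
```

Parsing the URL keeps the rewrite independent of digits elsewhere in the path, at the cost of requiring an absolute URL.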
I'm trying to implement the smart matching algorithm in Playwright, but I'm facing some technical difficulties regarding the protocol.yml file.
If anyone wants to join my efforts, I would love to have an online meeting and get some feedback.
I'm available on noam.gaash@applitools.com
@NoamGaash I was on leave for some days. Thanks for sharing this workaround. I tweaked it as per my app and it worked. I know it's not maintainable but great help. Could you please let me know if my understanding is correct -
await page.routeFromHAR( /* ........... */)
-- Will serve the request from HAR file instead of actual network calls.
const url = route.request().url().replace(/\d+/, "1695901908803")
route.fallback({url})
})
--Ensures all matching urls are replaced as specified and then searched and served from the HAR file? Let me know in detail if possible.
Also I am happy to connect with you to join your efforts. I will shoot out an email to you and you can let me know if we can proceed.
sure! thanks.
Playwright interceptors (page.route, page.routeFromHAR) are implemented with a stack mechanism - the first route function you register will be the last to handle the request.
for example, consider the following URL:
www.example.com/api/v2/collect/?start=1695901912345
and the following code:
await page.routeFromHAR( /* ........... */)
await page.route(/v2\/collect/, (route) => {
const url = route.request().url().replace(/\d+/, "1234")
route.fallback({url})
})
The page.route handler runs first, because it was registered last. route.fallback then hands the request, with its rewritten URL (intended to be www.example.com/api/v2/collect/?start=1234), to the next interceptor, page.routeFromHAR.
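The ordering can be illustrated with a toy model of a handler stack. This is a sketch of the last-registered-runs-first behavior only, not Playwright's implementation; createRouter and the har:/network: result strings are made up for the illustration:

```javascript
// Toy model of the handler stack; NOT Playwright's implementation.
function createRouter() {
  const handlers = []; // kept in registration order
  return {
    route(handler) {
      handlers.push(handler);
    },
    dispatch(url) {
      // Start at the most recently registered handler and walk backwards.
      const run = (index, currentUrl) => {
        if (index < 0) return `network:${currentUrl}`;
        const fallback = (overrides = {}) =>
          run(index - 1, overrides.url ?? currentUrl);
        return handlers[index](currentUrl, fallback);
      };
      return run(handlers.length - 1, url);
    },
  };
}

const router = createRouter();
// Registered first, so it runs last (stands in for page.routeFromHAR).
router.route((url, fallback) =>
  url.endsWith("start=1234") ? `har:${url}` : fallback());
// Registered last, so it runs first (stands in for the page.route rewrite).
router.route((url, fallback) =>
  fallback({ url: url.replace(/start=\d+/, "start=1234") }));
```

In this model, dispatching a URL with any start value reaches the HAR stand-in with start=1234, while a URL that neither handler serves falls through to the network.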
Hi everyone! I've made a small npm package that solves this issue. With the playwright-advanced-har package, you'll be able to:
- Ignore port numbers
- Ignore search params
- Shuffle response order
and much more.
Please don't hesitate to open an issue with any question, and I'll do my best to help with any use case.
@fs-projects your use case can be solved using:
import { test, defaultMatcher } from "playwright-advanced-har";
test("ignore search params", async ({ page, advancedRouteFromHAR }) => {
await advancedRouteFromHAR("tests/har/different-search-params.har", {
matcher: (request, entry) => {
const reqUrl = new URL(request.url());
const entryUrl = new URL(entry.request.url);
reqUrl.search = "";
entryUrl.search = "";
if (
reqUrl.toString() === entryUrl.toString() &&
request.method() === entry.request.method &&
request.postData() == entry.request.postData?.text
) {
return 1;
}
return -1;
},
});
await page.goto("https://www.example.com/api/v2/collect/?start=" + Date.now());
});
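The matcher above returns a number rather than a boolean. A toy sketch of how score-based selection could work over plain objects, under the assumption (not confirmed anywhere in this thread) that a negative score rejects an entry and the highest score wins; pickBestEntry and ignoreSearchMatcher are hypothetical names:

```javascript
// Toy selection loop. Assumption: a negative score means "no match" and the
// highest score wins; the thread does not confirm the package's exact rules.
function pickBestEntry(entries, request, matcher) {
  let best = null;
  let bestScore = -Infinity;
  for (const entry of entries) {
    const score = matcher(request, entry);
    if (score >= 0 && score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best; // null when every entry was rejected
}

// Matcher that compares method and URL while ignoring the query string.
function ignoreSearchMatcher(request, entry) {
  const reqUrl = new URL(request.url());
  const entryUrl = new URL(entry.request.url);
  reqUrl.search = "";
  entryUrl.search = "";
  const sameUrl = reqUrl.toString() === entryUrl.toString();
  return sameUrl && request.method() === entry.request.method ? 1 : -1;
}
```

Returning a score instead of true/false leaves room for ranking several candidate entries, for example by preferring an entry whose query string also matches.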
I've just had a difficulty with routeFromHAR where we have a cache busting parameter in our API url so the har file does not match.
/api?a=3648574675856 for example where a changes on every api call.
I have basically made my own version of routeFromHAR that matches on URL and method, but accepts a sanitiseUrl function which is called on both route.request().url() and the har.log.entries[].request.url values before they are compared.
const sanitiseUrl = (url) => url.replace(/([?&]a=)\d+/, "$1NNNNNNNNN")
You might consider adding such a parameter here to advanced options if you are planning improvements.
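A minimal sketch of the lookup described above, operating on an already parsed HAR object. The function name findEntry and the sample data are hypothetical; only the har.log.entries[].request shape comes from the HAR format:

```javascript
// Replace the digits of the cache-busting `a` parameter with a placeholder.
const sanitiseUrl = (url) => url.replace(/([?&]a=)\d+/, "$1NNNNNNNNN");

// Return the first HAR entry whose method and sanitised URL match the request,
// or undefined when nothing matches.
function findEntry(har, method, url, sanitise = sanitiseUrl) {
  const wanted = sanitise(url);
  return har.log.entries.find(
    (entry) =>
      entry.request.method === method &&
      sanitise(entry.request.url) === wanted
  );
}

// Hypothetical recorded data for illustration.
const sampleHar = {
  log: {
    entries: [
      { request: { method: "GET", url: "https://www.example.com/api?a=3648574675856" } },
    ],
  },
};
```

Because both sides go through the same sanitiser, a request for /api?a=1111 and a recorded /api?a=3648574675856 compare equal, while other parameters still have to match exactly.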
@bcowgill I believe this snippet will solve your use case:
import { test, customMatcher } from "playwright-advanced-har";
const fixUrl = url => url.replace(/([?&]a=)\d+/, "$1NNNNNNNNN")
test("ignore `a` get argument", async ({ page, advancedRouteFromHAR }) => {
await advancedRouteFromHAR("tests/har/my-file.har", {
matcher: customMatcher({
urlComparator(a, b) {
return fixUrl(a) === fixUrl(b);
},
}),
});
await page.goto("/api?a=3648574675856");
});
Hey, thanks. How about this other issue, will it solve that?
"routeFromHAR header Access-Control-Allow-Origin should be configurable": I need to replace the test env domain with the localhost domain to play back captures locally. https://github.com/microsoft/playwright/issues/28447
@bcowgill actually, I'm considering adding that exact feature https://github.com/NoamGaash/playwright-advanced-har/pull/6/files It will let you alter the entry found in the HAR. WDYT? Please open an issue for that, I have several ideas but I want to be backward compatible as much as possible, therefore I release new features very cautiously
Thank you very much @NoamGaash
Is there any update about this? When will it be part of Playwright?
Hi, version 1.3.1 now supports intercepting the responses from the HAR file, so it solves your use case as well :)
I'm also struggling to integrate the routeFromHAR functionality nicely into our project. Basically, everything works, but update: true starts to clutter our git repository. Let me explain why:
What we do is the following:
await page.routeFromHAR(harFile, {
update: options.update,
updateContent: 'attach',
notFound: 'fallback',
url: NETWORK_URL_REGEX_FOR_MOCK,
updateMode: 'minimal',
});
Now this creates all the HAR files and creates files for the responses. For example you will find the following in a HAR file:
"content": {
"size": -1,
"mimeType": "application/json",
"_file": "0eeb62f9e778c07885a5323f4938ccb30969bdb0.json"
},
The filename is based on the SHA-1 of the file's content (and the content is the response of our backend). Now my problem is that every response from our backend contains a meta field in the JSON like the following:
"meta": { "total": 1, "serverTime": "2024-04-26T11:11:46.789Z" }
Now the hash is always different because of the serverTime entry. Therefore every run with update: true creates dozens of new files that differ only in serverTime.
I tried to work around this problem with the following:
await page.routeFromHAR(harFile, {
update: options.update,
updateContent: 'attach',
notFound: 'fallback',
url: NETWORK_URL_REGEX_FOR_MOCK,
updateMode: 'minimal',
});
await this._page.route(API_URL_REGEX, async (route) => {
const response = await route.fetch();
const json = await response.json();
if (json.meta?.serverTime) {
json.meta.serverTime = '2024-01-01T00:00:00.000Z';
}
await route.fulfill({ response, json });
});
Now it seems like the changed response does not end up in the HAR file. Am I doing something wrong? I'm a little bit lost because I'm not sure if I'm trying to do something completely crazy or if it's a valid case that I'm trying to solve. Maybe someone has an idea. Maybe I can solve my problem with playwright-advanced-har but I'm not sure how. Basically, I want the normal routeFromHAR functionality but it should not create unnecessary files.
Maybe we could have some possibility to change the response before it's handed over to the HAR generation?
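To make the idea concrete: the churn comes from a single volatile field, so pinning that field before the content is hashed or written would give identical content for otherwise identical responses. The sketch below shows only that normalisation step; it does not hook into routeFromHAR, and normaliseBody is a hypothetical helper:

```javascript
// Hypothetical helper: pin the volatile meta.serverTime field so that two
// responses differing only in that field serialise to the same string.
function normaliseBody(body, fixedTime = "2024-01-01T00:00:00.000Z") {
  const json = JSON.parse(body);
  if (json.meta?.serverTime) {
    json.meta.serverTime = fixedTime;
  }
  return JSON.stringify(json);
}

// Two sample responses that differ only in serverTime.
const first = '{"data":[1],"meta":{"total":1,"serverTime":"2024-04-26T11:11:46.789Z"}}';
const second = '{"data":[1],"meta":{"total":1,"serverTime":"2024-04-27T08:00:00.000Z"}}';
```

Since normaliseBody(first) and normaliseBody(second) are the same string, any content hash computed over them would also be the same.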
@tschoartschi I'm trying to understand the use case - why would you update your har file often?
Regarding intercepting the request using page.route: it won't change the content of the HAR file (see #29190).
Maybe it's by design - it can be convenient to rely on the fact that the HAR file reflects the real network traffic that occurred.
Have you considered updateContent: "embed"?
@noamGaash thanks for the fast response 🙂
Our app is pretty data-intensive and makes lots of requests. One of our most used examples makes 89 requests to our backend. Mocking each of these 89 requests manually is tedious, which is why I thought about using HAR files.
We want to commit the HAR files to our git repo so that every dev has the same mocking data. Also, CI and QA-checks should use those HAR files.
If I have two tests for example:
const update = true;
test('override a network call', async ({ page, context }) => {
await page.routeFromHAR('har1.har', {
update: update,
updateContent: 'attach',
notFound: 'fallback',
url: NETWORK_URL_REGEX_FOR_MOCK,
updateMode: 'minimal',
});
// run the test
});
test('do some other test', async ({ page, context }) => {
await page.routeFromHAR('har2.har', {
update: update,
updateContent: 'attach',
notFound: 'fallback',
url: NETWORK_URL_REGEX_FOR_MOCK,
updateMode: 'minimal',
});
// run the other test
});
I end up with 89 * 2 = 178 files, although 89 would be enough. This adds up the more tests I have 🤔 and quickly I have thousands of files...
I think using updateContent: "embed" only hides the problem because then every HAR file becomes unnecessarily big.
I also thought about cleaning up in teardown, but essentially every file is referenced in some HAR file. Sure, I could create a complicated clean-up logic that tries to find files with the same content, changes the references in the HAR files and then deletes unused files. But that sounds like a lot of hassle given that the default solution almost does what we need 🙂
Meanwhile, I think it might be better if I just wrote my own capture and mock logic for our backend, based on page.route.
Let me know if I explained my problem properly now 🙂 If anything is unclear I can try to explain it in even more detail.
@NoamGaash we have now implemented our own logic. It's based on page.route, similar to what they show in the docs here: https://playwright.dev/docs/network#modify-requests
The idea is to create JSON files if we want to update (similar to the HAR file generation), and when we do not want to update we read those JSONs.
This gives us much more flexibility and eases writing tests a lot for us 🙂
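A rough sketch of that record/replay idea, using an in-memory Map in place of JSON files so that it stays self-contained. The thread does not show the actual implementation; createFixtureStore and the choice to key fixtures by method and path (ignoring the query string) are assumptions made for illustration:

```javascript
// Hypothetical in-memory stand-in for the JSON fixture files described above.
function createFixtureStore() {
  const fixtures = new Map();
  // Assumption: one fixture per HTTP method and path; the query is ignored.
  const keyOf = (method, url) => `${method} ${new URL(url).pathname}`;
  return {
    record(method, url, json) {
      fixtures.set(keyOf(method, url), JSON.stringify(json));
    },
    replay(method, url) {
      const saved = fixtures.get(keyOf(method, url));
      return saved === undefined ? undefined : JSON.parse(saved);
    },
  };
}

// Intended use inside a page.route handler:
//   if (update) {
//     const response = await route.fetch();
//     store.record(request.method(), request.url(), await response.json());
//     await route.fulfill({ response });
//   } else {
//     await route.fulfill({ json: store.replay(request.method(), request.url()) });
//   }
```

Swapping the Map for files on disk would turn this into the committed-fixtures workflow, with the key doubling as the file name.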
@tschoartschi interesting!
If you change your mind, I'm always open to contributions to the advancedRouteFromHAR fixture.
I think that making the postProcess function change the actual saved file could be a nice feature.
After struggling with HAR in a similar way to @tschoartschi, we've also ended up with our own solution based on pure page.route.
I think HAR is not the best format when you need fine-grained control of the network in e2e tests.
We've shared the solution as open source: playwright-network-cache.
Background / Use cases
The routeFromHAR functionality encapsulates two behaviors. When update is true, it behaves as a custom network recorder, while when update is false (default) it serves as a network communication mock. I would like to suggest several improvements:
saving minimal data
today, each HAR entry contains a lot of unnecessary data: timing, HTTP version, request headers, and more. While this data might be useful for analysis, most of it is not used for network traffic replay. Although omitting the unnecessary fields would deviate from the formal HAR specification, I believe it is worth it in terms of clarity, bundle size, concise git diffs, output predictability, and maintainability.
example
before:
after:
excludeUrl
today, the page.routeFromHAR method receives a url option that defines which URLs should be included in the resulting HAR file. I suggest we should have an excludeUrl property. The property would allow users to exclude specific path(s) from recording.
example
before:
after:
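This is not the proposed excludeUrl option itself. As a possible stopgap, the existing url option accepts a regular expression, so an exclusion can be folded into the include pattern with a negative lookahead. A minimal sketch, assuming hypothetical /api/ and /api/analytics/ paths:

```javascript
// Hypothetical paths: include everything under /api/ except /api/analytics/.
// The negative lookahead rejects any URL containing the excluded path.
const RECORD_URL = /^(?!.*\/api\/analytics\/).*\/api\//;

// Intended use:
//   await page.routeFromHAR("example.har", { url: RECORD_URL, update: true });
```

The pattern becomes hard to read as the number of excluded paths grows, which is part of the motivation for a dedicated option.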
smart matching algorithm
Playwright docs state that:
Would it be possible to grant users more control over the matching algorithm?
example
summary
Thank you for considering my suggestions! Please note that my first suggestion is a breaking change - it would omit data from the HAR file, and there is a chance that some users rely on the timing/headers data in there. I would love to hear your opinion and get some feedback before implementing any changes or submitting PRs.
Thank you for maintaining Playwright! I love this tool. Noam