uBlockOrigin / uAssets

Resources for uBlock Origin, uMatrix: static filter lists, ready-to-use rulesets, etc.
GNU General Public License v3.0

Address ads on `tokopedia.com` #23587

Closed: partingscientist closed this 1 week ago.

partingscientist commented 1 week ago

URL(s) where the issue occurs

tokopedia.com

Describe the issue

To fully block promoted products and stores on tokopedia.com, especially on the mobile site, trusted filters are required. This pull request attempts to address them as comprehensively as possible.

This pull request covers

  1. promoted products on home page,
  2. promoted products and stores on search page (/search),
  3. promoted products on product page (/p),
  4. promoted products and stores on find page (/find),
  5. promoted products on cart page (/cart), and
  6. promoted products and store cache on search page (/search)

Screenshot(s)

The following filters were used to highlight the targeted elements in the screenshots.

tokopedia.com##span:has-text(/^Ad$/):style(width: 24px !important; height: 24px !important; font-size: 16px !important; background-color: red !important;)
tokopedia.com##p:has-text(/^Ad$/):style(color: white !important; font-size: 16px !important; background-color: red !important;)
Screenshots:

  1. Home page (/)
     ![home-desktop](https://github.com/uBlockOrigin/uAssets/assets/115052854/7b686b24-2116-44a0-bc5c-fbcef0894e99) ![home-mobile](https://github.com/uBlockOrigin/uAssets/assets/115052854/af4b724b-a2ff-48d6-a9ca-b3b4c54e099b)
  2. Search page (/search)
     ![search-desktop](https://github.com/uBlockOrigin/uAssets/assets/115052854/f7911d82-b7db-4d54-a24d-992af16d8ce4) ![search-mobile](https://github.com/uBlockOrigin/uAssets/assets/115052854/13aace66-9da7-4de8-bf34-f373896a0e95)
  3. Product page (/p)
     ![product-desktop](https://github.com/uBlockOrigin/uAssets/assets/115052854/2843dd05-a415-421c-b9e9-81dd465ecf90) ![product-mobile](https://github.com/uBlockOrigin/uAssets/assets/115052854/063e5c85-4ae4-4deb-ac15-21fcb0290f48)
  4. Find page (/find)
     ![find-desktop](https://github.com/uBlockOrigin/uAssets/assets/115052854/a1e4e149-679e-4186-a845-db4792ac73f6) ![find-mobile](https://github.com/uBlockOrigin/uAssets/assets/115052854/df6fd33d-746a-4658-b66f-404fcf279a93)
  5. Cart page (/cart)
     ![cart-desktop](https://github.com/uBlockOrigin/uAssets/assets/115052854/e9333023-da7d-4752-adcb-1e5340785460) ![cart-mobile](https://github.com/uBlockOrigin/uAssets/assets/115052854/55e1a8a1-db59-4c4e-8cad-353858362912)

Versions

Settings

Notes

Rough paper:

The first part of the proposed solution consists of three pruning filters that empty the contents of an array inside a JSON response. The first filter is for the home page (/) and the search page (/search), the second is for the product page (/p), and the third is for the find page (/find).

The second part consists of seven regex filters that remove response objects based on the value of one of their properties. In order, they handle the:

  - desktop search page (/search) carousel,
  - mobile search page (/search) carousel,
  - mobile product page (/p) carousel,
  - mobile product page (/p) recommendation,
  - desktop cart page (/cart) recommendation,
  - mobile cart page (/cart) carousel, and
  - mobile cart page (/cart) recommendation.

The third part consists of an HTML filter that removes the promoted products data served inline when some pages are opened directly. This forces the site to fetch that data via a POST request instead, which is covered by the first three filters.
Steps to reproduce:

For ease of investigation, you may want to switch the site language to English using the language switcher in the site footer. Unfortunately, not every string is translated, but it should help.

  1. Promoted products on home page (/)
     Note: requires being logged in with an account.
     - Open `tokopedia.com`.
     - Scroll down far enough to see a product section titled `For You`.
     - There should be promoted products marked with `Ad` on the lower right of the item.
  2. Promoted products on search page (/search)
     - Open `tokopedia.com`.
     - In the search bar at the top of the page, search for `ringke`.
     - There should be a promoted store at the top of the page and promoted products marked with `Ad` on the lower right of the item.
  3. Promoted products on product page (/p)
     - Open `tokopedia.com`.
     - Find the keyword `Kategori Pilihan` on the page and click one of the product choices presented below it.
     - There should be a promoted store at the top of the page and promoted products below it, both marked with a megaphone symbol on the lower right of the listings.
  4. Promoted products on find page (/find)
     - Open `tokopedia.com`.
     - Find the keyword `Lagi Trending` on the page and click one of the product choices presented below it.
     - There should be promoted products marked with `Ad` on the lower right of the item.
  5. Promoted products on cart page (/cart)
     Note: requires being logged in with an account.
     - Open `tokopedia.com`.
     - Click the cart button in the top right corner of the page.
     - There should be promoted products marked with `Ad` on the lower right of the item.
  6. Promoted products cache on the site
     - Open `https://www.tokopedia.com/search?q=ringke` directly.
     - There should be a promoted store at the top of the page and promoted products marked with `Ad` on the lower right of the item.
     - If everything is done correctly, you should be able to find `displayAdsV3` in the page using your browser's inspector. This indicates that the promoted stores and products are served directly in the HTML rather than via a POST request, which is how they are served in all the previous cases.
D4niloMR commented 1 week ago

Can it use json-prune-fetch-response instead of trusted-replace-fetch-response and replace? Also, please add links to where these ads are found.


I can reproduce ads on search and this is working on my end:

tokopedia.com##+js(json-prune-fetch-response, 0.data.displayAdsV3, , propsToMatch, url:/graphql/Topads)
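A minimal Python sketch of what the filter above is meant to do, as I understand it: parse the JSON body of a fetch response and delete the property at `0.data.displayAdsV3`, but only when the request URL matches `/graphql/Topads`. This is a model, not uBO's implementation; the URL match is simplified to a substring test, and the endpoint name and body in the usage part are hypothetical.

```python
import json

def _child(node, key):
    """Return node[key] for a dict key or a numeric list index, else None."""
    if isinstance(node, list) and key.isdigit() and int(key) < len(node):
        return node[int(key)]
    if isinstance(node, dict):
        return node.get(key)
    return None

def prune_path(root, path):
    """Delete the property at a dotted path such as '0.data.displayAdsV3', if present."""
    *parents, last = path.split(".")
    node = root
    for key in parents:
        node = _child(node, key)
        if node is None:
            return root  # path does not exist: leave the response alone
    if isinstance(node, dict):
        node.pop(last, None)
    return root

def prune_fetch_response(url, body, path, url_needle):
    """Hypothetical model: prune `path` only if the request URL contains `url_needle`."""
    if url_needle not in url:
        return body  # every other request passes through untouched
    return json.dumps(prune_path(json.loads(body), path))

# Hypothetical, simplified batched GraphQL body.
body = '[{"data":{"displayAdsV3":{"data":[{"id":"1"}]},"other":1}}]'
pruned = prune_fetch_response(
    "https://gql.tokopedia.com/graphql/TopadsExample",  # hypothetical endpoint name
    body, "0.data.displayAdsV3", "/graphql/Topads",
)
print(pruned)  # [{"data": {"other": 1}}]
```

Scoping by URL is what keeps this from breaking other pages: responses from unrelated endpoints that also carry `displayAdsV3` are never touched.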
partingscientist commented 1 week ago

Also please add links to where this ads are found.

It will be added later, for the scope of this PR is quite extensive. It will take a while for me to double-check my notes and make sure I don't miss anything, hence the PR being a draft.

Can it use json-prune-fetch-response instead?

Regarding the first regex, your suggestion will break some pages (which will be specified later upon the completion of the PR draft) because some properties need to exist on some pages. The point of replace and trusted-replace-fetch-response is to empty the contents of an array inside the JSON response instead, thus preserving those properties.

The second and third regexes are used because we're checking the value of a property, so trusted filters are needed, as far as I know.

stephenhawk8054 commented 1 week ago

For the first regex, trusted filter is required because deleting displayAdsV3 using json-prune will break the product page (/p).

Can you screenshot what is broken? I tried the above filter and followed the Promoted products on product page (/product) steps but the page looks the same as without the filter for me.

partingscientist commented 1 week ago

Wait, did I write (/product) for the path instead of (/p)? I'll correct that.

Make sure you have the HTML filter in My filters so that the promoted products are served via a POST request instead of directly in the page script.

If you add tokopedia.com##+js(json-prune, 0.data.displayAdsV3) and follow the STR for the product page (/p), you should see the following, which indicates that no products are found.

Screenshot: ![broken-product-page](https://github.com/uBlockOrigin/uAssets/assets/115052854/758e7809-3739-443a-a025-0dffd059ff28)
stephenhawk8054 commented 1 week ago

The filter above is json-prune-fetch-response, so it prunes only the chosen fetch response.

##+js(json-prune, 0.data.displayAdsV3) does indeed cause breakage but ##+js(json-prune-fetch-response, 0.data.displayAdsV3, , propsToMatch, url:/graphql/Topads) seems fine for me. Can you check again?

partingscientist commented 1 week ago

Can you screenshot what it looks like on your end? It does not work on my end because that propsToMatch should not match anything (A/B testing?). A promoted product on that specific page should have a megaphone symbol on its lower right.

stephenhawk8054 commented 1 week ago

I mean the above filter is for search ads but won't cause breakages on product pages. It's not meant to filter the ads on product pages, but I think the same concept can be used for them as well.

stephenhawk8054 commented 1 week ago

For product ads, does this work on your side?

tokopedia.com##+js(json-prune-fetch-response, 0.data.displayAdsV3.data.[-].clickTrackUrl, , propsToMatch, /graphql/SearchProductQuery)
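Why `[-]` avoids the breakage seen with plain `json-prune, 0.data.displayAdsV3`: as I understand uBO's json-prune path syntax, `data.[-].clickTrackUrl` removes each element of the `data` array that has a `clickTrackUrl` property, so the array and its parent `displayAdsV3` survive, only emptied of ads. A rough Python model of that behaviour (not uBO's code); the response shape, the third non-ad item and the `header` key are hypothetical.

```python
import json

def _child(node, key):
    """Return node[key] for a dict key or a numeric list index, else None."""
    if isinstance(node, list) and key.isdigit() and int(key) < len(node):
        return node[int(key)]
    if isinstance(node, dict):
        return node.get(key)
    return None

def prune_array_items(root, array_path, key):
    """Model of '<array_path>.[-].<key>': remove every item of the array at
    array_path that has `key`; the array itself is kept, possibly empty."""
    node = root
    for step in array_path.split("."):
        node = _child(node, step)
        if node is None:
            return root
    if isinstance(node, list):
        node[:] = [item for item in node if not (isinstance(item, dict) and key in item)]
    return root

# Hypothetical response shape; "header" stands in for sibling properties.
response = [{"data": {"displayAdsV3": {"data": [
    {"id": "1", "clickTrackUrl": "https://example.invalid/c/1"},
    {"id": "2", "clickTrackUrl": "https://example.invalid/c/2"},
    {"id": "3"},
], "header": {}}}}]

prune_array_items(response, "0.data.displayAdsV3.data", "clickTrackUrl")
print(json.dumps(response))
# [{"data": {"displayAdsV3": {"data": [{"id": "3"}], "header": {}}}}]
```

Page code that reads `displayAdsV3.data` still finds an array, which is the same effect the original trusted regex was after.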
partingscientist commented 1 week ago

I mean the above filter is for search ads but won't cause breakages on product pages.

Oh yeah, of course. I originally preferred a general approach for all of them, because each page has different props, desktop and mobile each have different props, and I might have missed some pages. Is it preferable to just enumerate each possible request and create a narrowly scoped filter for each of them separately (combining where possible)? Obviously there would be less chance of breakage than with my original approach.

Regardless, I'll convert this to a draft again, since I have a feeling that approach is preferred over the first regex filter.

For product ads, does this work on your side?

Yeah that should work.

stephenhawk8054 commented 1 week ago

I think it's better to separate the cases to reduce maintenance complexity and the risk of mismatches/breakage. If combining them is simple enough, then I think it's OK. Also, it's preferable to use non-trusted filters first if we can find a way to do it with them.

With the recent improvements to json-prune, I think you can use [-] / {-} in this case. For the search, product and find pages, do these work on your side?

! search + product
tokopedia.com##+js(json-prune, 0.data.displayAdsV3.data.[-].__typename)

! https://www.tokopedia.com/find
tokopedia.com##+js(json-prune, 0.data.topads.data.[-].productClickURL)

The only one I can't check is the cart one. I guess this filter targets it?

||tokopedia.com/graphql/$xhr,replace=/\{"category_id"(?:(?!"ads":\{"id":"").)+?"ads":\{"id":"\d+".+?"__typename":"ProductCarouselV2"\},?//g
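The core of this filter is a tempered token, `(?:(?!"ads":\{"id":"").)+?`, which refuses to step onto an empty `"ads":{"id":""`, so a match can only complete inside an object whose `ads.id` is a non-empty run of digits. A quick Python check of that behaviour (lookaheads and lazy quantifiers mean the same in Python's `re` as in JavaScript). The payload is made up and heavily simplified; real responses carry many more fields.

```python
import json
import re

# The pattern from the filter above, split into commented pieces.
CAROUSEL_AD = re.compile(
    r'\{"category_id"(?:(?!"ads":\{"id":"").)+?'  # tempered token: stops at an empty ads.id
    r'"ads":\{"id":"\d+".+?'                       # a numeric, non-empty ads.id
    r'"__typename":"ProductCarouselV2"\},?'         # up to the end of that item (+ comma)
)

# Hypothetical payload: only the middle item has a non-empty ads.id.
sample = (
    '[{"category_id":1,"ads":{"id":"","clickUrl":""},"__typename":"ProductCarouselV2"},'
    '{"category_id":2,"ads":{"id":"123","clickUrl":"x"},"__typename":"ProductCarouselV2"},'
    '{"category_id":3,"ads":{"id":"","clickUrl":""},"__typename":"ProductCarouselV2"}]'
)

items = json.loads(CAROUSEL_AD.sub("", sample))
print([item["category_id"] for item in items])  # [1, 3]
```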

At first glance, I think it might be similar to the above cases. Can you check again or share its JSON?


I think this filter is for the For You section? I still see the ads with it:

||tokopedia.com/graphql/$xhr,replace=/\{"(?:productS|s)lashedPrice"(?:(?!"isTopads":false).)+?"isTopads":true.+?"__typename":"recommendationItem"\},?//g

Is the response from this URL https://gql.tokopedia.com/graphql/RecommendationFeedQuery? If it is, the regex doesn't match the response on my side: https://file.lekture.top/json/tokopedia-J8B19zLg.json

Can you share which URL you were focusing on, or the JSON on your side?

partingscientist commented 1 week ago

I think you can use [-] / {-} in this case.

The first regex is meant to empty the contents of an array value of a property, so these should work. I've replaced the first regex with its equivalents using json-prune-fetch-response.

The second regex is for product carousels visible on the search page (/search). You can follow the STR for that page and look for the keyword Beli Lokal (I refer to it as a carousel because the segment scrolls sideways on the mobile version of the site). Here's an example JSON response from /graphql/InspirationCarousel.

https://pastes.dev/0ToA4lK1L5

The only difference between ad and non-ad products is that all subproperties of ads are non-empty strings instead of empty strings. As far as I know, I need a regex for this. My current solution is ugly (a negative lookahead); feel free to improve it.

The third regex is for the cart page (/cart). Here's an example JSON response from /graphql/RecomWidget.

https://pastes.dev/dYVHMvwDjo.

The only difference between ad and non-ad products is that the subproperty isTopads is true instead of false. Again, as far as I know, I need a regex for it.

stephenhawk8054 commented 1 week ago

@partingscientist Does json-prune work on your side? If you use json-prune-fetch-response, it's better to narrow down the exact fetch URL you want to target with propsToMatch. The more specific the filter and the fewer URLs it targets, the better the performance.

partingscientist commented 1 week ago

That should work.

I ended up needing a negative lookahead for the third regex to cover mobile carousels on the product page (/p). Here's a JSON test case (/graphql/ProductRecommendationQuery) if you want to try to improve it.

https://pastes.dev/fNPR3NGAB7

stephenhawk8054 commented 1 week ago

Yeah, the others need regexes. Since these are large regexes, can you specify the exact path for the filters? Using something like /graphql/InspirationCarousel instead of tokopedia.com/graphql/ would be better.

stephenhawk8054 commented 1 week ago

I think these 3

||tokopedia.com/graphql/productRecommendationWidget$xhr,replace=/\{"id":\d{9,11}(?:(?!"isTopads":false).)+?"isTopads":true.+?"__typename":"recommendationItem"\},?//g
||tokopedia.com/graphql/ProductRecommendationQuery$xhr,replace=/\{"id":\d{9,11}(?:(?!"isTopads":false).)+?"isTopads":true.+?"__typename":"recommendationItem"\},?//g
! Promoted products on cart page (/cart)
||tokopedia.com/graphql/productRecommendation|$xhr,replace=/\{"id":\d{9,11}(?:(?!"isTopads":false).)+?"isTopads":true.+?"__typename":"recommendationItem"\},?//g

can be combined into one:

||tokopedia.com/graphql/productRecommendation$xhr,replace=/\{"id":\d{9,11}(?:(?!"isTopads":false).)+?"isTopads":true.+?"__typename":"recommendationItem"\},?//g

?

partingscientist commented 1 week ago

I originally split them for clarity, but it should be possible to combine them.

For trusted-replace-fetch-response, using /\/graphql/(?:P|p)roductRecommendation/ for propsToMatch should work, no? Or is it exact match only?

stephenhawk8054 commented 1 week ago

trusted-replace does not need propsToMatch; you just need to put the link as the last argument.

You can use /\/graphql\/productRecommendation/i
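A quick check of which request paths from this thread the suggested regex covers, written in Python (`re.IGNORECASE` plays the role of the `i` flag, and `/` needs no escaping). The `gql.tokopedia.com` host is taken from the RecommendationFeedQuery URL quoted earlier; that all three recommendation endpoints live there is an assumption. As far as I know, uBO network filter patterns are also case-insensitive unless `match-case` is used, which is why the combined `||tokopedia.com/graphql/productRecommendation` filter can catch `ProductRecommendationQuery` too.

```python
import re

# JS literal /\/graphql\/productRecommendation/i expressed in Python.
PRODUCT_RECOMMENDATION = re.compile(r"/graphql/productRecommendation", re.IGNORECASE)

urls = [
    "https://gql.tokopedia.com/graphql/productRecommendationWidget",
    "https://gql.tokopedia.com/graphql/ProductRecommendationQuery",
    "https://gql.tokopedia.com/graphql/productRecommendation",
    "https://gql.tokopedia.com/graphql/RecommendationFeedQuery",  # should stay untouched
]
matches = [bool(PRODUCT_RECOMMENDATION.search(url)) for url in urls]
print(matches)  # [True, True, True, False]
```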

stephenhawk8054 commented 1 week ago

I think it's good now

stephenhawk8054 commented 1 week ago

Can I merge it?

partingscientist commented 1 week ago

I need some time, I want to make sure I don't miss any corner cases. I'll let you know.

partingscientist commented 1 week ago

Well, I did find one.

https://regex101.com/r/eCo6jn/1

! Promoted products on mobile carousels of product page (/p) and cart page (/cart)
||tokopedia.com/graphql/productRecommendation$xhr,replace=/\{"id":\d{9,11}(?:(?!"isTopads":false).)+?"isTopads":true.+?"__typename":"recommendationItem"\},?//g

I can either

  1. Force the last comma to be required (,? => ,), which means the last item in a carousel will be missed if it is an ad, or
  2. Split the above filter into
    ||tokopedia.com/graphql/productRecommendation$xhr,replace=/\{"id":\d{9,11}(?:(?!"isTopads":false).)+?"isTopads":true.+?"__typename":"recommendationItem"\},//g
    ||tokopedia.com/graphql/productRecommendation$xhr,replace=/,\{"id":\d{9,11}(?:(?!"isTopads":false).)+?"isTopads":true(?:(?!"__typename":"recommendationItem").)+?"__typename":"recommendationItem"\}(?=\])//g

    which does not look pretty.

Any preference?

Sidenote: Why would an e-commerce site place a promoted product at the very back instead of the very front? 🤷
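The trailing-comma problem can be reproduced in Python (the regex constructs used behave the same in Python's `re` as in JavaScript). The carousel below is a made-up, minimal payload in which the ad is the last item. The original filter with `,?` leaves `},]` behind, which is invalid JSON, while the two-filter split from option 2 leaves a parseable array.

```python
import json
import re

# Original filter: trailing comma optional.
ORIGINAL = re.compile(
    r'\{"id":\d{9,11}(?:(?!"isTopads":false).)+?"isTopads":true'
    r'.+?"__typename":"recommendationItem"\},?'
)
# Option 2, first filter: an ad followed by a comma (i.e. not the last item).
NOT_LAST = re.compile(
    r'\{"id":\d{9,11}(?:(?!"isTopads":false).)+?"isTopads":true'
    r'.+?"__typename":"recommendationItem"\},'
)
# Option 2, second filter: an ad that is the last item, removed with its leading comma.
LAST = re.compile(
    r',\{"id":\d{9,11}(?:(?!"isTopads":false).)+?"isTopads":true'
    r'(?:(?!"__typename":"recommendationItem").)+?'
    r'"__typename":"recommendationItem"\}(?=\])'
)

def is_valid_json(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Hypothetical, simplified carousel in which the ad is the LAST item.
carousel = (
    '[{"id":123456789,"isTopads":false,"__typename":"recommendationItem"},'
    '{"id":987654321,"isTopads":true,"__typename":"recommendationItem"}]'
)

print(is_valid_json(ORIGINAL.sub("", carousel)))               # False: leaves "},]"
print(is_valid_json(LAST.sub("", NOT_LAST.sub("", carousel))))  # True
```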

stephenhawk8054 commented 1 week ago

How about this?

||tokopedia.com/graphql/productRecommendation$xhr,replace=/\{"id":\d{9,11}(?:(?!"isTopads":false).)+?"isTopads":true.+?"__typename":"recommendationItem"\}(,?)/{}\$1/g
partingscientist commented 1 week ago

That breaks the page. Unfortunately, we cannot empty the object; it has to be deleted.

stephenhawk8054 commented 1 week ago

Yeah then I think it's unavoidable.

partingscientist commented 1 week ago

I don't think I have anything else to add. Fingers crossed that should be everything.

stephenhawk8054 commented 1 week ago

Ok. I'll merge it.

partingscientist commented 6 days ago

Somehow I forgot about this.

@stephenhawk8054 Can you replace this

https://github.com/uBlockOrigin/uAssets/blob/8fa22c99cd30f3c19cf0772d76006f7663370871/filters/filters-2024.txt#L1582-L1584

with the following?

tokopedia.com##+js(json-prune, [].data.displayAdsV3.data.[-].__typename)
tokopedia.com##+js(json-prune, [].data.TopAdsProducts.data.[-].__typename)
tokopedia.com##+js(json-prune, [].data.topads.data.[-].__typename)

In some rare instances, the promoted products may be served across multiple elements of the response array, not just the first one.
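The difference between the `0.` and `[]` prefixes can be modelled in Python. This is a rough sketch of the semantics as I understand them from uBO's json-prune path syntax (`0` picks the first element, `[]` visits every element), not its implementation; the batch below and the `ExampleAd` type name are hypothetical.

```python
import copy

def _get(node, key):
    """Dict lookup that tolerates non-dict nodes by returning None."""
    return node.get(key) if isinstance(node, dict) else None

def prune_ads(elements, field, key="__typename"):
    """Model of '<prefix>.data.<field>.data.[-].<key>' applied to the given
    top-level elements: drop items of data.<field>.data that carry `key`."""
    for element in elements:
        items = _get(_get(_get(element, "data"), field), "data")
        if isinstance(items, list):
            items[:] = [i for i in items if not (isinstance(i, dict) and key in i)]

def count(resp):
    """Number of ad items left in each top-level element."""
    return [len(e["data"]["displayAdsV3"]["data"]) for e in resp]

# Hypothetical batched response where ads arrive in two array elements.
batch = [
    {"data": {"displayAdsV3": {"data": [{"__typename": "ExampleAd"}]}}},
    {"data": {"displayAdsV3": {"data": [{"__typename": "ExampleAd"}]}}},
]

first_only = copy.deepcopy(batch)
prune_ads(first_only[:1], "displayAdsV3")  # like the `0.` prefix: first element only
every = copy.deepcopy(batch)
prune_ads(every, "displayAdsV3")           # like the `[]` prefix: all elements

print(count(first_only))  # [0, 1]
print(count(every))       # [0, 0]
```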