ispras / web-scraper-chrome-extension

Web data extraction tool implemented as chrome extension
GNU Lesser General Public License v3.0

Selector is not working and selector panel not showing in the web page #110

Open Tony20221 opened 2 years ago

Tony20221 commented 2 years ago

I am using Chrome 102.0.5005.63. When I add a new selector and go to the page in the browser, I don't get the selector panel in the bottom left. I can't select anything on the page. I am comparing the behavior with the extension from the web store. Does the extension here work differently? I tried different websites, restarted Chrome and I don't see errors in the console (both in top and the extension's console context). I removed the browser_specific_settings setting from the manifest file.

I am not sure what the issue is.

Yatskov commented 2 years ago

If you want to add a new selector, you should follow this guide. Don't forget to add the new SelectorType in Controller so that you can choose it when creating a new Selector. If you run into trouble, we can probably help if you give us more information, for example screenshots of what you expect.

I am comparing the behavior with the extension from the web store. Does the extension here work differently?

This extension is a fork of https://github.com/martinsbalodis/web-scraper-chrome-extension, which at some point moved its further development to closed source. Because of that, another fork appeared, https://github.com/jwillmer/web-scraper-chrome-extension, which continued the development for some time. We forked that second fork and continued development for our own purposes. So by now I think it differs a lot from the extension published in the Chrome Web Store.

Tony20221 commented 2 years ago

I don't want to add a new selector now. Doesn't the code come with some default selectors that work out of the box? There's a folder called 'selector' with 15 selectors. Shouldn't the extension be using them? Or are you saying I need to manually add/extend them for them to work? Without this step, the current extension won't select anything?

Yatskov commented 2 years ago

Doesn't the code come with some default selectors that work out of the box?

These selectors should work out of the box, as described here.

Or are you saying I need to manually add/extend them for them to work?

Only if you find that the 15 base selectors can't extract the data you need from a site.

I updated my Chrome to version 102.0.5005.61, and it seems to work with the latest code. Maybe there is something specific about your site, and content_script could not be injected into the page to handle selections.

image

Tony20221 commented 2 years ago

I think you jumped over a bunch of steps. First of all, the code in the repo does not work as is directly in Chrome, unlike several Chrome extensions on Github. It needs to be built first. The requirement to build the code is not mentioned on the home page and it's not mentioned in the development instructions. This threw me off. I think the build instructions need to be on the home page. I came across them by coincidence in one of the issues.

I did a 'yarn install', then 'yarn build', and did a 'load unpacked' from the 'dist' folder. It gives an error: "Unrecognized manifest key 'browser_specific_settings'." This seems to be a Firefox setting, so I removed it from the manifest file. Don't you get this error? I don't see a mention of this in the pages here.

I create a new sitemap. I select 'Add new selector'. I hover on the page. I expect elements to get highlighted as I hover around. None of that is happening. I tried Chrome and Canary and they both have the same issue. I used two computers.

However, it does work in Firefox (leaving the manifest setting in place and using the zip file).

Update: I noticed there's an error when I click on the Select button. It's not clear what causes the error. When I tried a non-minified version (build:dev), there was no error!

Any ideas on what might be the cause of this error?

image

And this is the error after formatting the code.

image

Yatskov commented 2 years ago

I think you jumped over a bunch of steps. First of all, the code in the repo does not work as is directly in Chrome, unlike several Chrome extensions on Github. It needs to be built first. The requirement to build the code is not mentioned on the home page and it's not mentioned in the development instructions. This threw me off. I think the build instructions need to be on the home page. I came across them by coincidence in one of the issues.

Thank you for your feedback, we will take into account your thoughts about how to improve the documentation.

I did a 'yarn install', then 'yarn build', and did a 'load unpacked' from the 'dist' folder. It gives an error: "Unrecognized manifest key 'browser_specific_settings'." This seems to be a Firefox setting, so I removed it from the manifest file. Don't you get this error? I don't see a mention of this in the pages here.

This is just a warning in Chrome. In Firefox these settings are required, so keeping them is simply a way to have one build for both browsers. Unfortunately, I haven't had time to rework the build process and produce browser-specific dists.
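Since Chrome only warns about the key, removing it is optional. If a Chrome-specific build were wanted, a post-build step could strip the Firefox-only key from the built manifest. The following is a minimal sketch of that idea, not something that exists in this repo: `FIREFOX_ONLY_KEYS`, `stripFirefoxKeys`, and the sample manifest values are all made up for illustration.

```javascript
// Sketch only: drop Firefox-only keys from a parsed manifest object.
// FIREFOX_ONLY_KEYS and stripFirefoxKeys are hypothetical names, not part of this repo.
const FIREFOX_ONLY_KEYS = ['browser_specific_settings'];

function stripFirefoxKeys(manifest) {
  // Shallow-copy the manifest so the caller's object is left untouched.
  const copy = { ...manifest };
  for (const key of FIREFOX_ONLY_KEYS) {
    delete copy[key];
  }
  return copy;
}

// Example with an inline manifest; the values are invented for illustration.
const sample = {
  manifest_version: 2,
  name: 'Web Scraper',
  browser_specific_settings: { gecko: { id: 'example@example.org' } },
};
console.log(Object.keys(stripFirefoxKeys(sample)));
```

In an actual build step, the object would presumably come from `JSON.parse` of the manifest file in the 'dist' folder and be written back with `JSON.stringify`.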

I create a new sitemap. I select 'Add new selector'. I hover on the page. I expect elements to get highlighted as I hover around. None of that is happening. I tried Chrome and Canary and they both have the same issue. I used two computers.

I can't reproduce the issue on my local machine. On what site do you get this error? Have you reloaded the page after installing the plugin? Does the extension have permission to add a content script to this page? It seems that the problem is somewhere in the process of sending a message in the browser from the plugin page to content_script.
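One way to narrow down whether the message from the plugin page reaches content_script is to send a test message and look at `chrome.runtime.lastError` in the callback. The sketch below assumes the standard `chrome.tabs.sendMessage(tabId, message, callback)` API; `pingContentScript` and the `{ type: 'ping' }` message are hypothetical, and the extension's real message format may well differ. The API object is passed in as a parameter so the function can also be exercised outside the browser.

```javascript
// Sketch: check whether anything in a tab answers a message from the extension.
// pingContentScript and the { type: 'ping' } message are hypothetical names;
// the real extension may use a different message format or a browser.* wrapper.
function pingContentScript(chromeApi, tabId, onResult) {
  chromeApi.tabs.sendMessage(tabId, { type: 'ping' }, (response) => {
    // runtime.lastError is set when the message could not be delivered
    // or nobody replied, e.g. the content script was never injected.
    const err = chromeApi.runtime.lastError;
    if (err) {
      onResult({ ok: false, reason: err.message });
    } else {
      onResult({ ok: true, response });
    }
  });
}
```

Inside the extension this could be called as `pingContentScript(chrome, tabId, console.log)`, where `tabId` is the id of the inspected tab. An `ok: false` result on an ordinary web page would point at injection or permissions rather than at the selector code.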

I noticed there's an error when I click on the Select button. It's not clear what the reason of the error is. When I tried a non minified version (build:dev), there's no error!

If the problem occurs only in the minified version, maybe something goes wrong in the minification process.

Tony20221 commented 2 years ago

I have tried every different case I can think of. I have tried different sites. I tried Chrome and Canary. Canary had no other extensions running. I reloaded the plugin. I restarted Chrome. I used more than one computer. I used minified and unminified versions. Both have the same problem, except that in the unminified version no error is shown. If the error says that it was an unexpected error, something weird is happening.

I don't understand why it works for you and it doesn't work for me. Are you using the latest Chrome? I am using Firefox as a workaround.

Yatskov commented 2 years ago

One of our analysts encountered a similar bug. It seems that the only difference between us could be the operating system. Do you use Windows? We'll try to investigate this bug.

Tony20221 commented 2 years ago

I am using Windows 10 in both my work and home computers. They have the same issue.

mxsnq commented 2 years ago

Finally managed to test on my Windows machine; could not reproduce the issue. I have Windows 11, though. Built with node 14.19.3 and yarn 1.22.18.

Chrome 102.0.5005.63

[images: test_chrome, test_chrome_version]

Canary 104.0.5106.0

[images: test_canary, test_canary_version]

Tony20221 commented 2 years ago

I have updated Node and Python. I am now using Node v16.15.1 and Python 3.10. Now I am seeing errors during yarn install that I haven't seen before. What is Python2 and why is it trying to use it? I don't know what node gyp is. It seems there's a syntax error in some config file? With the error I am getting, I am unable to continue.

Is the repo dependent on Linux or Mac or something?

image

Yatskov commented 2 years ago

One of our analysts encountered a similar bug. We discovered that they had tried to use Select on Chrome internal pages, where nothing can be selected because content_script cannot be injected there.
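A rough way to tell such pages apart is to look at the URL scheme of the tab before trying to select anything. The sketch below is only an illustration: `isInjectableUrl` and the `BLOCKED_SCHEMES` list are assumptions, the list is not exhaustive, and the extension itself may not perform a check like this.

```javascript
// Sketch: guess from the URL whether a content script can be injected into a tab.
// isInjectableUrl and BLOCKED_SCHEMES are hypothetical; the list is not exhaustive.
const BLOCKED_SCHEMES = ['chrome:', 'chrome-extension:', 'devtools:', 'about:', 'view-source:'];

function isInjectableUrl(url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch (e) {
    // Not a parseable URL, so treat it as not injectable.
    return false;
  }
  // URL.protocol includes the trailing colon, e.g. "chrome:" or "https:".
  return !BLOCKED_SCHEMES.includes(parsed.protocol);
}

console.log(isInjectableUrl('chrome://extensions'));
console.log(isInjectableUrl('https://example.com/'));
```

A check like this only covers the URL scheme. An ordinary web page can still be unreachable if the extension lacks host permissions for it or the page was not reloaded after the extension was installed.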

This repo should not have any Python dependencies, though some of the build tools it pulls in may use Python. As far as I can see from this screenshot, it asks you to install Python 2.

Tony20221 commented 2 years ago

I don't know why it's trying to use Python. I downgraded to Node v14.19.3 and Python 3.9. Now yarn install runs without errors, but I am back to the initial error. The configuration seems to be picky about which versions of the tools are installed. Did you test on the latest version of Node?
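Given how sensitive the install step appears to be, a small guard run before the build could flag an untested Node version early. This is a minimal sketch under stated assumptions: `TESTED_NODE_MAJOR` and `checkNodeVersion` are hypothetical names, and the value 14 comes only from the versions reported in this thread, not from any project documentation.

```javascript
// Sketch: warn when the running Node major version differs from the one
// reported to install cleanly in this thread (14.19.3).
// TESTED_NODE_MAJOR and checkNodeVersion are hypothetical names.
const TESTED_NODE_MAJOR = 14;

function checkNodeVersion(version) {
  // process.versions.node looks like "14.19.3"; take the part before the first dot.
  const major = Number.parseInt(version.split('.')[0], 10);
  return { major, matches: major === TESTED_NODE_MAJOR };
}

const current = checkNodeVersion(process.versions.node);
if (!current.matches) {
  console.warn(
    `Node ${current.major} detected; this thread only reports a clean install on Node ${TESTED_NODE_MAJOR}.`
  );
}
```

The guard only warns and does not stop the build, since other Node versions may work fine and simply have not been reported here.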

mxsnq commented 2 years ago

@Tony20221, do you have the same issue with prebuilt releases?

Tony20221 commented 2 years ago

Yes. Same error with web-scraper-chrome-extension-v0.3.718.zip.