jshemas / openGraphScraper

Node.js scraper service for Open Graph Info and More!
MIT License
668 stars 105 forks source link

Add character encoding detection and decoding logic #206

Closed cm-dyoshikawa closed 7 months ago

cm-dyoshikawa commented 8 months ago

Resolve https://github.com/jshemas/openGraphScraper/issues/199

jshemas commented 8 months ago

Looks like this causes a bunch of unit tests to fail. Can you look into that?

dyoshikawa commented 8 months ago

@jshemas Thank you for your comment. Sure, I'm looking into those.

cm-dyoshikawa commented 8 months ago

@jshemas The result of npm run mocha:unit has now become successful.

jshemas commented 7 months ago

Hello, can also look into why the integration tests are failing? npm run mocha:int (You can just focus on the failing tests in encoding.spec.ts )

cm-dyoshikawa commented 7 months ago

Hello, can also look into why the integration tests are failing? npm run mocha:int (You can just focus on the failing tests in encoding.spec.ts )

Sorry, I worked on unit tests only. I'll also check on that.

cm-dyoshikawa commented 7 months ago

@jshemas

I have confirmed that the following test execution command passes all tests:

npx ts-mocha --recursive \"./tests/integration/encoding.spec.ts\" --timeout 10000

The errors that were occurring were due to files intended to be in euc-jp or Shift_JIS being encoded in utf-8 on openGraphScraperPages.

I have converted these files to the expected character encoding and temporarily hosted them on my github.io for testing purposes, so please check it out.

I have created a separate Pull Request regarding this issue.

Once it is merged into openGraphScraperPages, I will make further adjustments to this test.

cm-dyoshikawa commented 7 months ago

@jshemas

https://github.com/jshemas/openGraphScraper/actions/runs/7810736250/job/21348174777

  109 passing (19s)
  7 pending
  1 failing

  1) fetch
       setting a timeout:

      AssertionError: expected 'Page not found' to deeply equal 'The operation was aborted due to time…'
      + expected - actual

      -Page not found
      +The operation was aborted due to timeout

      at /home/runner/work/openGraphScraper/openGraphScraper/tests/integration/fetch.spec.ts:48:33

Using response.clone() seemed to cause this test to fail.

Therefore, I made changes to avoid using clone() without altering the behavior. This resolved the error.

cm-dyoshikawa commented 7 months ago

@jshemas I would be happy if you could rerun the test. Sorry to bother you while you're busy.

jshemas commented 7 months ago

Thanks for fixing this issue! PR looks good to me.

jshemas commented 7 months ago

Changes are live in open-graph-scraper@6.4.0

cm-dyoshikawa commented 7 months ago

@jshemas Thank you for checking and merging!