ulixee / secret-agent

The web scraper that's nearly impossible to block - now called @ulixee/hero
https://secretagent.dev
MIT License

Element.dataset is empty #213

Heljas closed this issue 3 years ago

Heljas commented 3 years ago

Element.dataset doesn't return expected values.

import { Agent, LocationStatus } from "secret-agent";
interface Dataset {
  example: string;
}
(async () => {
  process.env.SHOW_BROWSER = "true";
  const agent = new Agent({
    showReplay: false,
    humanEmulatorId: "basic",
  });
  await agent.goto("https://dataset.tiiny.site/");
  await agent.activeTab.waitForLoad(LocationStatus.DomContentLoaded);
  const element = await agent.document.querySelector("#main");
  const dataset = element.dataset as Dataset;
  console.log(dataset); //DOMStringMap {}
  console.log(dataset.example); //undefined
  await agent.close();
})();

I guess that's because Element.attributes is also empty.

Here is a quick workaround if anybody needs it:


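// Assumed import path for the ISuperElement type; adjust it to your installed awaited-dom version.
import { ISuperElement } from 'awaited-dom/base/interfaces/super';

export const getDataset = async (element: ISuperElement) => {
  const html = await element.outerHTML;
  const dataset: { [key: string]: string } = {};
  // Scan only the opening tag so data-* attributes of child elements are ignored.
  const openingTag = html.slice(0, html.indexOf('>') + 1);
  const matches = openingTag.match(/data-[^\s=>"'\/]+/g);
  if (!matches) return dataset;
  await Promise.all(
    matches.map(async attribute => {
      const value = await element.getAttribute(attribute);
      // No value means the match was not a real attribute (e.g. text inside another attribute's value).
      if (value == null) return;
      // data-foo-bar -> fooBar: only a hyphen followed by a lowercase letter is folded.
      const key = attribute.slice(5).replace(/-([a-z])/g, (_, c: string) => c.toUpperCase());
      dataset[key] = value;
    }),
  );
  return dataset;
};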
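
For reference, the name conversion the workaround performs can be checked in isolation, without a browser. The following is a minimal sketch of the `data-*` to `dataset` key rule (strip the `data-` prefix, then fold each hyphen followed by a lowercase ASCII letter into that letter uppercased). The helper names `toDatasetKey`, `toAttributeName` and `datasetFromAttributes` are hypothetical and are not part of SecretAgent or AwaitedDom.

```typescript
// Hypothetical helper (not a SecretAgent API): map a data-* attribute name to its dataset key.
// "data-foo-bar" -> "fooBar"; a hyphen not followed by a-z is left as-is ("data-foo-1" -> "foo-1").
function toDatasetKey(attributeName: string): string {
  return attributeName
    .slice('data-'.length)
    .replace(/-([a-z])/g, (_match: string, letter: string) => letter.toUpperCase());
}

// Hypothetical helper: the inverse mapping, from a dataset key back to the attribute name.
// "fooBar" -> "data-foo-bar"
function toAttributeName(key: string): string {
  return 'data-' + key.replace(/[A-Z]/g, (letter: string) => '-' + letter.toLowerCase());
}

// Hypothetical helper: build a dataset-like object from a plain map of attribute names to values.
// Attributes without the "data-" prefix are skipped.
function datasetFromAttributes(attributes: Record<string, string>): Record<string, string> {
  const dataset: Record<string, string> = {};
  for (const name of Object.keys(attributes)) {
    if (name.startsWith('data-')) {
      dataset[toDatasetKey(name)] = attributes[name];
    }
  }
  return dataset;
}

// Usage: mirrors the #main element from the report above, assuming it carries data-example="value".
const example = datasetFromAttributes({ id: 'main', 'data-example': 'value' });
console.log(example); // { example: 'value' }
```

Keeping the conversion in a pure function makes it easy to unit-test separately from the async attribute lookups.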

blakebyrnes commented 3 years ago

Thanks. Looks like we're missing the definition in AwaitedDom (https://secretagent.dev/docs/awaited-dom/dom-string-map). Shouldn't be too hard to add. Will get to it soon.