Open beingabstrac opened 4 years ago
@beingabstrac it sounds like your scraping needs are a little more complex. You can reference the generateMarkdown()
method in this repo to help with the markdown generation. You'll just need to construct a JSON object with your data.
If you need to emulate clicks, try something like Puppeteer. There's also a library called xray.js that might help with the separate-page-per-item issue. It handles pagination really well.
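To make the "construct a JSON object" step concrete, here is a minimal sketch. It assumes the block-array shape that json2md accepts (`h1`, `h2`, `p`, `ul`). The `renderBlocks` function is only a tiny stand-in so the example runs without dependencies, and `buildBlocks`, the `description` text, and the `tags` are hypothetical placeholders, not anything from this repo.

```javascript
// Hypothetical plugin record, shaped like what a scraper might return.
// The description and tags are placeholders, not real scraped data.
const plugin = {
  name: 'Unsplash',
  link: 'https://www.figma.com/community/plugin/738454987945972471',
  description: 'Placeholder description.',
  tags: ['images', 'photos'],
};

// Build the JSON structure: an array of single-key objects, one per block.
function buildBlocks(plugin) {
  return [
    { h1: plugin.name },
    { p: `Link: ${plugin.link}` },
    { p: plugin.description },
    { h2: 'Tags' },
    { ul: plugin.tags },
  ];
}

// Stand-in renderer (an assumption for illustration, not the json2md source).
function renderBlocks(blocks) {
  const renderers = {
    h1: (text) => `# ${text}`,
    h2: (text) => `## ${text}`,
    p: (text) => String(text),
    ul: (items) => items.map((item) => `- ${item}`).join('\n'),
  };
  const rendered = blocks.map((block) => {
    const [type, value] = Object.entries(block)[0];
    const render = renderers[type];
    if (!render) throw new Error(`Unsupported block type: ${type}`);
    return render(value);
  });
  return rendered.join('\n\n') + '\n';
}

console.log(renderBlocks(buildBlocks(plugin)));
```

With the real library installed, the same array could probably be passed straight to `json2md(...)` instead of `renderBlocks`, but check its README for the block types it supports.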
@parkeragee I did try Puppeteer. I was able to generate one page, but I wasn't able to go back to the root and do the same for n items. Here's the gist.
I'll give x-ray a try.
So, this is the site I'm trying to scrape. It has a list of items: click an item, go in, scrape the data, and save it as a .md file with the item name as the file name. Then do the same for n items.
I haven't checked to see if this works, so it might require some tweaking. But, here's how I would try approaching it.
/**
 * The initial scrape of the directory page to get all
 * the plugins in one list
 * @return {Promise<Array>} An array of objects that contain data about each plugin
 */
async function scrapePluginDirectory() {
  const pluginList = [];
  // Do scraping here and push each plugin into an array of objects with the data you need
  // Example of what we would return:
  // [{ name: 'Unsplash', link: 'https://www.figma.com/community/plugin/738454987945972471' }]
  return pluginList;
}
/**
 * Takes our individual plugin link and scrapes it
 * @param {String} link The plugin page link
 * @return {Object} Your plugin data needed for the markdown page
 */
async function scrapePluginHtml(link) {
  // Scrape your individual plugin page here
  // and return your data needed for the markdown file.
}
/**
 * Takes our individual plugin data that we scraped and generates a markdown file
 * @param {Object} pluginData The plugin data
 * @return {void}
 */
async function createMarkDownFile(pluginData) {
  // Take our data, generate a markdown file with `json2md`
}
async function getPluginData(plugin) {
  const pluginData = await scrapePluginHtml(plugin.link);
  await createMarkDownFile(pluginData);
}
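/**
 * Takes the plugin list and loops over it
 * to scrape each item and generate the markdown file
 * @param {Array} pluginList The plugin list we scraped one step before
 * @return {Promise<void[]>} Resolves once every plugin has been processed
 */
async function scrapeAndMakeMarkdown(pluginList) {
  // map() with an async callback returns an array of promises;
  // Promise.all waits until all of them resolve (or rejects on the first failure).
  return Promise.all(pluginList.map(getPluginData));
}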
async function go() {
  const pluginList = await scrapePluginDirectory();
  const result = await scrapeAndMakeMarkdown(pluginList);
  return result;
}
// Log any failure instead of leaving an unhandled promise rejection
go().catch(console.error);
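One thing to watch in a loop like this: `map` with an async callback hands back an array of promises, so the caller only really waits if they go through `Promise.all` or a plain `for...of` loop. If you are reusing a single browser page, visiting the items one at a time is probably the easier route. A rough sketch of that, where `toFileName`, `processInOrder`, `scrapeOne`, and `writeOne` are hypothetical names and the scraper and writer are passed in so you can plug in your own:

```javascript
// Turn an item name into a safe markdown file name, e.g. "Unsplash" -> "unsplash.md".
function toFileName(name) {
  const slug = name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything unsafe into a dash
    .replace(/^-+|-+$/g, ''); // trim leading and trailing dashes
  return `${slug || 'untitled'}.md`;
}

// Visit the plugins one at a time, in list order.
// scrapeOne(link) and writeOne(fileName, data) are supplied by the caller.
async function processInOrder(pluginList, scrapeOne, writeOne) {
  const written = [];
  for (const plugin of pluginList) {
    try {
      const data = await scrapeOne(plugin.link);
      const fileName = toFileName(plugin.name);
      await writeOne(fileName, data);
      written.push(fileName);
    } catch (err) {
      // One bad page should not stop the rest of the run.
      console.error(`Skipping ${plugin.name}: ${err.message}`);
    }
  }
  return written;
}
```

Here `scrapeOne` would wrap your page scraping and `writeOne` something like `fs.promises.writeFile`. The function returns the list of file names it managed to write.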
Hey @parkeragee
Got stuck, how do I do this -
P.S. The class names are complex (with weird names and numbers). How do I select those elements with XPath?
Thanks in advance!
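On the XPath question, one common approach when class names carry generated suffixes is to match only the stable part, or to skip classes and match on something like the link's `href`. A minimal sketch of building such expressions follows. The helper names and the `plugin_row--name` class prefix are hypothetical, and the `/community/plugin/` fragment is only inferred from the example link earlier in the thread.

```javascript
// Quote a string as an XPath 1.0 literal (XPath has no escape characters).
function xpathLiteral(value) {
  if (!value.includes("'")) return `'${value}'`;
  if (!value.includes('"')) return `"${value}"`;
  // Contains both quote types: stitch the pieces together with concat().
  const parts = value.split("'").map((part) => `'${part}'`);
  return `concat(${parts.join(`, "'", `)})`;
}

// Match elements whose href contains a fragment, ignoring classes entirely.
function byHrefContains(tag, fragment) {
  return `//${tag}[contains(@href, ${xpathLiteral(fragment)})]`;
}

// Match elements that have a class token starting with a stable prefix,
// e.g. "plugin_row--name" also matches "plugin_row--name--3xZ9".
function byClassPrefix(tag, prefix) {
  const needle = xpathLiteral(` ${prefix}`);
  return `//${tag}[contains(concat(' ', normalize-space(@class)), ${needle})]`;
}

console.log(byHrefContains('a', '/community/plugin/'));
console.log(byClassPrefix('div', 'plugin_row--name'));
```

In a browser context the resulting string can be passed to the standard `document.evaluate(...)`, for example inside Puppeteer's `page.evaluate`. Puppeteer's own XPath helpers have changed between versions, so check the docs for the one you have installed.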