midudev / kings-league-project

API and website for the Kings League Infojobs, built for educational purposes
https://kingsleague.dev
MIT License
1.52k stars, 224 forks

Scrapper improvements #65

Open loadko opened 1 year ago

loadko commented 1 year ago

Overview

List of refactors or improvements that can be made to the scraping folder.

List

Other improvements are welcome.

Fasping commented 1 year ago

Overview

List of refactors or improvements that can be made to the scraping folder.

List

  • [ ] Create a function for this piece of code:
const rawValue = $el.find(selector).text()
const cleanedValue = cleanText(rawValue)
const value = typeOf === 'number' ? Number(cleanedValue) : cleanedValue

Something like getValueFromElement($, selector, typeOf)

  • [ ] Export getImageFromTeam from mvp.js to a util file and use it in the scrapers
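
As a rough illustration of the first item, the helper could look like the sketch below. It takes the element (`$el`) rather than `$` as its first argument, which is one possible reading of the suggested signature. The `cleanText` body and the `fakeRow` object are assumptions made only so the example runs without Cheerio; the project's real `cleanText` may behave differently.

```javascript
// Hypothetical stand-in for the project's cleanText helper.
// Assumed behavior: collapse runs of whitespace and trim the ends.
function cleanText(text) {
  return text.replace(/\s+/g, ' ').trim()
}

// Reads the text found at `selector` inside `$el` and converts it to `typeOf`.
// `$el` is assumed to be a Cheerio-like object exposing find(selector).text().
function getValueFromElement($el, selector, typeOf) {
  const rawValue = $el.find(selector).text()
  const cleanedValue = cleanText(rawValue)
  return typeOf === 'number' ? Number(cleanedValue) : cleanedValue
}

// Minimal fake element (illustrative only) so the sketch runs without Cheerio.
const fakeRow = {
  find: (selector) => ({
    text: () => (selector === '.fs-table-text_6' ? ' 12\n' : '  Player   One ')
  })
}

console.log(getValueFromElement(fakeRow, '.fs-table-text_6', 'number')) // 12
console.log(getValueFromElement(fakeRow, '.fs-table-text_4', 'string')) // Player One
```

Each scraper could then call the helper once per selector entry instead of repeating the three lines.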

Other improvements are welcome.

More suggestions:

It's only an idea

Add multi-language support to the scraper so that it can extract information from web pages in different languages.


Idea development

To add multi-language support to the scraper, first modify the getTopScoresList function to accept a language parameter indicating the language of the web page to extract data from.

Then add an Accept-Language header to the options object passed to the fetchAndParse function, to tell the web server which language the information should be returned in.

To match the selectors to the specified language, use a control-flow structure such as a switch, or a mapping object, to pick the correct selectors.


Code idea

First, modify the getTopScoresList function so that it accepts a language parameter indicating the language of the web page to extract information from:

export async function getTopScoresList($, language) {
  // code here
}

Then add an Accept-Language header to the options object passed to the fetchAndParse function, to tell the web server which language the information should be returned in:

const options = {
  headers: {
    'Accept-Language': language
  }
}

const $ = await fetchAndParse(URL, options)
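
A small sketch of how the options object could be built in one place, so every scraper sends the same header. The `buildRequestOptions` name and the `'es'` default are assumptions for illustration and are not part of the project.

```javascript
// Hypothetical helper: builds the request options for a given language tag.
// Falls back to `defaultLanguage` (assumed to be 'es') when none is given.
function buildRequestOptions(language, defaultLanguage = 'es') {
  return {
    headers: {
      'Accept-Language': language || defaultLanguage
    }
  }
}

console.log(buildRequestOptions('en').headers['Accept-Language']) // en
console.log(buildRequestOptions().headers['Accept-Language']) // es
```

Keep in mind that Accept-Language is only a hint: a server is free to ignore it and return its default language, so the scraper should not assume the response is actually translated.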

Then create a mapping object that maps each language to its selectors:

const languageSelectorsMap = {
  en: {
    ranking: { selector: '.fs-table-text_1', typeOf: 'string' },
    team: { selector: '.fs-table-text_3', typeOf: 'string' },
    playerName: { selector: '.fs-table-text_4', typeOf: 'string' },
    gamesPlayed: { selector: '.fs-table-text_5', typeOf: 'number' },
    goals: { selector: '.fs-table-text_6', typeOf: 'number' }
  },
  es: {
    ranking: { selector: '.fs-table-text_1', typeOf: 'string' },
    team: { selector: '.fs-table-text_3', typeOf: 'string' },
    playerName: { selector: '.fs-table-text_4', typeOf: 'string' },
    gamesPlayed: { selector: '.fs-table-text_5', typeOf: 'number' },
    goals: { selector: '.fs-table-text_6', typeOf: 'number' }
  }
  // Add more languages here
}

Then look up the selectors for the requested language, falling back to SCORES_SELECTORS when the language is not in the map:

const modifiedSelectors = languageSelectorsMap[language] || SCORES_SELECTORS
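
The lookup can also be wrapped in a small function. The sketch below is one way to do it; the `getSelectorsForLanguage` name is hypothetical, and the contents of `SCORES_SELECTORS` are assumed to match the entries shown in the map above. It checks for an own property instead of relying on `||`, because a plain object lookup with a key such as `'constructor'` would return an inherited value rather than falling back.

```javascript
// Assumed default selectors, copied from the entries in the map above.
const SCORES_SELECTORS = {
  ranking: { selector: '.fs-table-text_1', typeOf: 'string' },
  team: { selector: '.fs-table-text_3', typeOf: 'string' },
  playerName: { selector: '.fs-table-text_4', typeOf: 'string' },
  gamesPlayed: { selector: '.fs-table-text_5', typeOf: 'number' },
  goals: { selector: '.fs-table-text_6', typeOf: 'number' }
}

// Each language gets its own copy so entries can diverge later.
const languageSelectorsMap = {
  en: { ...SCORES_SELECTORS },
  es: { ...SCORES_SELECTORS }
}

// Hypothetical helper: returns the selectors for `language`, or the
// defaults when the language is missing or not an own key of the map.
function getSelectorsForLanguage(language) {
  const hasLanguage = Object.prototype.hasOwnProperty.call(languageSelectorsMap, language)
  return hasLanguage ? languageSelectorsMap[language] : SCORES_SELECTORS
}

console.log(getSelectorsForLanguage('en') === languageSelectorsMap.en) // true
console.log(getSelectorsForLanguage('fr') === SCORES_SELECTORS) // true
console.log(getSelectorsForLanguage('constructor') === SCORES_SELECTORS) // true
```

Since the selectors in the map are CSS class names, they are identical for both languages in this example; the map only pays off if the translated pages use different markup.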

Finally, use the modified selectors to extract the information from the web page as usual:

const scoresSelectorEntries = Object.entries(modifiedSelectors)
const topScorerList = []

$rows.each((index, el) => {
  const topScorerEntries = scoresSelectorEntries.map(([key, { selector, typeOf }]) => {
    const rawValue = $(el).find(selector).