You can start exploring the page with VisiData:
vd https://www.qgis.org/en/site/forusers/visualchangelog326/index.html#notable-fixes
As soon as I can, I'll come back with a more considered, tailored solution.
I know VisiData, and that command too: spectacular!
Only the author and the sponsor are missing.
Hi @pigreco, for this kind of task you need to learn how to write XPath queries or CSS selectors.
Then you need to look at the structure of the page and work out whether some element lets you tell the part you care about apart from everything else.
The part you care about is inside a `section` tag with `id="notable-fixes"`.
The XPath query that selects it is `//section[@id="notable-fixes"]`, which means: find a `section` tag anywhere in the page whose `id` attribute has the value `notable-fixes`.
You can also test these queries in the browser.
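As a minimal sketch of what that query selects, the snippet below runs it against a tiny hand-written document (not the real QGIS page) from the command line. It uses `xmllint`, which is an assumption here since the thread itself relies on `scrape`; the helper name `count_subsections` is hypothetical.

```bash
#!/bin/bash
# Tiny well-formed sample mimicking the page structure (made up for illustration).
sample='<html><body>
<section id="intro"><section id="other"/></section>
<section id="notable-fixes">
  <section id="bug-fixes-by-even-rouault"/>
  <section id="bug-fixes-by-nyall-dawson"/>
</section>
</body></html>'

# Count the <section> children of the section whose id is "notable-fixes".
# Assumes xmllint (libxml2) is installed; prints nothing otherwise.
count_subsections() {
  command -v xmllint >/dev/null 2>&1 || return 0
  printf '%s' "$1" | xmllint --xpath 'count(//section[@id="notable-fixes"]/section)' -
}

count_subsections "$sample"
```

With `xmllint` available this should print 2: the nested section under `intro` is ignored because its parent does not have the requested id.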
Another useful feature of this HTML structure is that each developer has a subsection whose id contains the developer's name (for example bug-fixes-by-even-rouault).
I wrote a bash script that roughly does the following.
First, it extracts the ids of the per-developer subsections:
bug-fixes-by-even-rouault
bug-fixes-by-alessandro-pasotti
bug-fixes-by-alex-bruy
bug-fixes-by-sandro-santilli
bug-fixes-by-nyall-dawson
Then, for each id, it writes one JSON line with the name, the number of table rows, and the funding note:
{"nome":"Even Rouault ","numeroRighe":"16","funded":"These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships)"}
Finally, it converts the JSON Lines file to CSV:
nome,numeroRighe,funded
Even Rouault,15,These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships)
Alessandro Pasotti,18,These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships)
Alex Bruy,11,These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships)
Sandro Santilli,11,These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships)
Nyall Dawson,38,These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships)
The tools I use are scrape (for the XPath queries), Miller, and xq.
#!/bin/bash
set -x
set -e
set -u
set -o pipefail
folder="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
URL="https://www.qgis.org/en/site/forusers/visualchangelog326/index.html#notable-fixes"
# download the page
curl -kL "$URL" >"$folder"/tmp.html
# extract the ids of the per-developer subsections
scrape <"$folder"/tmp.html -be '//section[@id="notable-fixes"]/section' | xq -r '.html.body.section[]."@id"' >"$folder"/toto-id.txt
# start from an empty output file
rm -f "$folder"/toto.jsonl
# for each developer, extract name, number of table rows, and funding note
while IFS= read -r id; do
  nome=$(scrape <"$folder"/tmp.html -e '//section[@id="'"$id"'"]/h3/a[1]/text()' | sed -r 's/^.+by *//')
  numeroRighe=$(scrape <"$folder"/tmp.html -be '//section[@id="'"$id"'"]/table/tbody/tr' | xq '.html.body.tr|length')
  funded=$(scrape <"$folder"/tmp.html -be '//section[@id="'"$id"'"]//p[contains(.,"funded")]' | xq -r '(.html.body.p."#text")+""+(.html.body.p.a."#text")')
  echo '{"nome":"'"$nome"'","numeroRighe":"'"$numeroRighe"'","funded":"'"$funded"'"}' >>"$folder"/toto.jsonl
done <"$folder"/toto-id.txt
# convert JSON Lines to CSV, trimming stray whitespace
mlr --j2c clean-whitespace "$folder"/toto.jsonl >"$folder"/toto.csv
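The `sed -r 's/^.+by *//'` step in the script strips everything up to the last "by" in the heading. A small sketch of how it behaves follows; the heading texts are assumed examples (inferred from ids such as bug-fixes-by-even-rouault), the name "Abby Smith" is made up, and both function names are hypothetical.

```bash
#!/bin/bash
# Strip the leading "... by " from a heading, as the script does.
# The heading texts below are assumed examples, not copied from the page.
strip_prefix_greedy() { sed -E 's/^.+by *//'; }

# Variant anchored on "fixes by", so a "by" inside the name is left alone.
strip_prefix_anchored() { sed -E 's/^.*fixes by *//'; }

echo "Bug fixes by Even Rouault" | strip_prefix_greedy   # Even Rouault
echo "Bug fixes by Abby Smith" | strip_prefix_greedy     # Smith (greedy match eats "Abby")
echo "Bug fixes by Abby Smith" | strip_prefix_anchored   # Abby Smith
```

For the developers listed in this thread the greedy pattern is enough; the anchored variant only matters if a name happens to contain "by".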
List of the web pages to download the data from:
https://www.qgis.org/en/site/forusers/visualchangelog36/index.html
https://www.qgis.org/en/site/forusers/visualchangelog38/index.html
https://www.qgis.org/en/site/forusers/visualchangelog310/index.html
https://www.qgis.org/en/site/forusers/visualchangelog312/index.html
https://www.qgis.org/en/site/forusers/visualchangelog314/index.html
https://www.qgis.org/en/site/forusers/visualchangelog316/index.html
https://www.qgis.org/en/site/forusers/visualchangelog318/index.html
https://www.qgis.org/en/site/forusers/visualchangelog320/index.html
https://www.qgis.org/en/site/forusers/visualchangelog322/index.html
https://www.qgis.org/en/site/forusers/visualchangelog324/index.html
https://www.qgis.org/en/site/forusers/visualchangelog326/index.html
https://www.qgis.org/en/site/forusers/visualchangelog328/index.html
There aren't many pages to extract data from, so I change the URL by hand and rerun the script each time; afterwards I append all the CSV files with cat (`cat *.csv > unico.csv`) and then manually remove the extra header rows.
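The manual header clean-up can be avoided. The sketch below concatenates CSV files and keeps only the first header line, assuming every file has the same header and ends with a newline; `merge_csv` and the demo file names are hypothetical.

```bash
#!/bin/bash
# Concatenate CSV files, keeping the header line of the first file only.
# FNR is the line number within the current file, NR the overall one.
merge_csv() {
  awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}

# Demo with two throwaway files (contents are illustrative).
tmpdir="$(mktemp -d)"
printf 'nome,numeroRighe\nEven Rouault,15\n' >"$tmpdir/a.csv"
printf 'nome,numeroRighe\nNyall Dawson,38\n' >"$tmpdir/b.csv"
merge_csv "$tmpdir/a.csv" "$tmpdir/b.csv" >"$tmpdir/unico.out"
cat "$tmpdir/unico.out"
```

The output goes to a name that does not match `*.csv`, so a second run does not feed the merged file back into itself. Miller, already used in this thread, should give the same result with `mlr --csv cat *.csv` when the files share a header.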
Here is the result (from QGIS 3.6 to QGIS 3.26):
date | version | developer | nroBugFixes | funded |
---|---|---|---|---|
2019-02-22 | QGIS 3.6 | Alessandro Pasotti | 28 | This feature was funded byQGIS.ORG donors and sponsors |
2019-02-22 | QGIS 3.6 | Alexander Bruy | 27 | This feature was funded byQGIS.ORG donors and sponsors |
2019-02-22 | QGIS 3.6 | Even Rouault | 6 | This feature was funded byQGIS.ORG donors and sponsors |
2019-02-22 | QGIS 3.6 | Hugo Mercier | 9 | This feature was funded byQGIS.ORG donors and sponsors |
2019-02-22 | QGIS 3.6 | Julien Cabieces | 9 | This feature was funded byQGIS.ORG donors and sponsors |
2019-02-22 | QGIS 3.6 | Jürgen Fischer | 20 | This feature was funded byQGIS.ORG donors and sponsors |
2019-02-22 | QGIS 3.6 | Loïc Bartoletti | 5 | This feature was funded byQGIS.ORG donors and sponsors |
2019-02-22 | QGIS 3.6 | Martin Dobias | 8 | This feature was funded byQGIS user group Germany |
2019-02-22 | QGIS 3.6 | Nyall Dawson | 20 | This feature was funded byQGIS.ORG donors and sponsors |
2019-02-22 | QGIS 3.6 | Peter Petrik | 8 | This feature was funded byQGIS.ORG donors and sponsors |
2019-02-22 | QGIS 3.6 | Victor Olaya | 10 | This feature was funded byQGIS.ORG donors and sponsors |
2019-06-21 | QGIS 3.8 | Alessandro Pasotti | 33 | This feature was funded byQGIS.ORG donors and sponsors |
2019-06-21 | QGIS 3.8 | Alexander Bruy | 15 | This feature was funded byQGIS.ORG donors and sponsors |
2019-06-21 | QGIS 3.8 | Denis Rouzaud | 1 | This feature was funded byQGIS.ORG donors and sponsors |
2019-06-21 | QGIS 3.8 | Even Rouault | 9 | This feature was funded byQGIS.ORG donors and sponsors |
2019-06-21 | QGIS 3.8 | Loïc Bartoletti | 4 | This feature was funded byQGIS.ORG donors and sponsors |
2019-06-21 | QGIS 3.8 | Peter Petrik | 7 | This feature was funded byQGIS.ORG donors and sponsors |
2019-06-21 | QGIS 3.8 | Victor Olaya | 10 | This feature was funded byQGIS.ORG donors and sponsors |
2019-10-25 | QGIS 3.10 | Alessandro Pasotti | 40 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2019-10-25 | QGIS 3.10 | Alexander Bruy | 19 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2019-10-25 | QGIS 3.10 | Even Rouault | 13 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2019-10-25 | QGIS 3.10 | Matthias Kuhn | 4 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2019-10-25 | QGIS 3.10 | Nyall Dawson | 74 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2019-10-25 | QGIS 3.10 | Paul Blottiere | 5 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2019-10-25 | QGIS 3.10 | Peter Petrik | 8 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2019-10-25 | QGIS 3.10 | Sandro Santilli | 9 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-02-21 | QGIS 3.12 | Alessandro Pasotti | 30 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-02-21 | QGIS 3.12 | Alexander Bruy | 4 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-02-21 | QGIS 3.12 | Bertrand Rix | 9 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-02-21 | QGIS 3.12 | Denis Rouzaud | 7 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-02-21 | QGIS 3.12 | Even Rouault | 9 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-02-21 | QGIS 3.12 | Julien Cabieces | 9 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-02-21 | QGIS 3.12 | Loïc Bartoletti | 11 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-02-21 | QGIS 3.12 | Nyall Dawson | 22 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-02-21 | QGIS 3.12 | Paul Blottiere | 7 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-02-21 | QGIS 3.12 | Sandro Santilli | 5 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-02-21 | QGIS 3.12 | Sebastien Peillet | 7 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-02-21 | QGIS 3.12 | Stephen Knox | 1 | |
2020-06-19 | QGIS 3.14 | Alessandro Pasotti | 31 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-06-19 | QGIS 3.14 | Alexander Bruy | 15 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-06-19 | QGIS 3.14 | Audun Ellertsen | 2 | This feature was funded byKongsberg Digital |
2020-06-19 | QGIS 3.14 | Bertrand Rix | 4 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-06-19 | QGIS 3.14 | Denis Rouzaud | 6 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-06-19 | QGIS 3.14 | Even Rouault | 17 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-06-19 | QGIS 3.14 | Julien Cabieces | 13 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-06-19 | QGIS 3.14 | Loïc Bartoletti | 5 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-06-19 | QGIS 3.14 | Nyall Dawson | 66 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-06-19 | QGIS 3.14 | Paul Blottiere | 8 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-06-19 | QGIS 3.14 | Sebastien Peillet | 6 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-10-23 | QGIS 3.16 | Alessandro Pasotti | 44 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-10-23 | QGIS 3.16 | Denis Rouzaud | 8 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-10-23 | QGIS 3.16 | Even Rouault | 20 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-10-23 | QGIS 3.16 | Julien Cabieces | 23 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-10-23 | QGIS 3.16 | Matthias Kuhn | 4 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-10-23 | QGIS 3.16 | Nyall Dawson | 83 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-10-23 | QGIS 3.16 | Olivier Dalang | 1 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-10-23 | QGIS 3.16 | Paul Blottiere | 11 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2020-10-23 | QGIS 3.16 | Peter Petrik | 48 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2021-02-22 | QGIS 3.18 | Alessandro Pasotti | 23 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2021-02-22 | QGIS 3.18 | Even Rouault | 11 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2021-02-22 | QGIS 3.18 | Julien Cabieces | 9 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2021-02-22 | QGIS 3.18 | Nyall Dawson | 31 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2021-02-22 | QGIS 3.18 | Peter Petrik | 14 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2021-06-21 | QGIS 3.20 | Alessandro Pasotti | 29 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2021-06-21 | QGIS 3.20 | Denis Rouzaud | 9 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2021-06-21 | QGIS 3.20 | Even Rouault | 14 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2021-06-21 | QGIS 3.20 | Julien Cabieces | 8 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2021-06-21 | QGIS 3.20 | Loïc Bartoletti | 7 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2021-06-21 | QGIS 3.20 | Nyall Dawson | 46 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2021-06-21 | QGIS 3.20 | Paul Blottiere | 7 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2021-06-21 | QGIS 3.20 | Peter Petrik | 6 | This feature was funded byQGIS.ORG (through donations and sustaining memberships) |
2021-10-22 | QGIS 3.22 | Alessandro Pasotti | 26 | These bug fixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2021-10-22 | QGIS 3.22 | Denis Rouzaud | 1 | These bug fixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2021-10-22 | QGIS 3.22 | Even Rouault | 15 | These bug fixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2021-10-22 | QGIS 3.22 | Julien Cabieces | 11 | These bug fixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2021-10-22 | QGIS 3.22 | Loïc Bartoletti | 9 | These bug fixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2021-10-22 | QGIS 3.22 | Nyall Dawson | 24 | These bug fixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2021-10-22 | QGIS 3.22 | Peter Petrik | 8 | These bug fixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2021-10-22 | QGIS 3.22 | Sandro Santilli | 10 | These bug fixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2022-02-18 | QGIS 3.24 | Alessandro Pasotti | 27 | These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2022-02-18 | QGIS 3.24 | Alexander Bruy | 21 | These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2022-02-18 | QGIS 3.24 | Damiano Lombardi | 1 | These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2022-02-18 | QGIS 3.24 | Denis Rouzaud | 3 | These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2022-02-18 | QGIS 3.24 | Even Rouault | 8 | These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2022-02-18 | QGIS 3.24 | Matthias Kuhn | 1 | These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2022-02-18 | QGIS 3.24 | Nyall Dawson | 29 | These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2022-02-18 | QGIS 3.24 | Paul Blottiere | 5 | These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2022-02-18 | QGIS 3.24 | Sandro Santilli | 7 | These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2022-06-18 | QGIS 3.26 | Alessandro Pasotti | 18 | These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2022-06-18 | QGIS 3.26 | Alexander Bruy | 11 | These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2022-06-18 | QGIS 3.26 | Even Rouault | 15 | These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2022-06-18 | QGIS 3.26 | Nyall Dawson | 38 | These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships) |
2022-06-18 | QGIS 3.26 | Sandro Santilli | 11 | These bugfixes were funded byQGIS.ORG (through donations and sustaining memberships) |
Statistics (from QGIS 3.6 to QGIS 3.26): 1450 bug fixes in total
name | nroBugFixes | % | nroVersion |
---|---|---|---|
Nyall Dawson | 433 | 29.9% | 10 |
Alessandro Pasotti | 329 | 22.7% | 11 |
Even Rouault | 137 | 9.4% | 11 |
Alexander Bruy | 112 | 7.7% | 7 |
Peter Petrik | 99 | 6.8% | 7 |
Julien Cabieces | 82 | 5.7% | 7 |
Paul Blottiere | 43 | 3.0% | 6 |
Sandro Santilli | 42 | 2.9% | 5 |
Loïc Bartoletti | 41 | 2.8% | 6 |
Denis Rouzaud | 35 | 2.4% | 7 |
Jürgen Fischer | 20 | 1.4% | 1 |
Victor Olaya | 20 | 1.4% | 2 |
Bertrand Rix | 13 | 0.9% | 2 |
Sebastien Peillet | 13 | 0.9% | 2 |
Hugo Mercier | 9 | 0.6% | 1 |
Matthias Kuhn | 9 | 0.6% | 3 |
Martin Dobias | 8 | 0.6% | 1 |
Audun Ellertsen | 2 | 0.1% | 1 |
Damiano Lombardi | 1 | 0.1% | 1 |
Olivier Dalang | 1 | 0.1% | 1 |
Stephen Knox | 1 | 0.1% | 1 |
date | version | number |
---|---|---|
2019-02-22 | QGIS 3.6 | 150 |
2019-06-21 | QGIS 3.8 | 79 |
2019-10-25 | QGIS 3.10 | 172 |
2020-02-21 | QGIS 3.12 | 121 |
2020-06-19 | QGIS 3.14 | 173 |
2020-10-23 | QGIS 3.16 | 242 |
2021-02-22 | QGIS 3.18 | 88 |
2021-06-21 | QGIS 3.20 | 126 |
2021-10-22 | QGIS 3.22 | 104 |
2022-02-18 | QGIS 3.24 | 102 |
2022-06-18 | QGIS 3.26 | 93 |
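The thread does not say how these summary tables were computed. One possible way, sketched here with awk, is to sum the bug fixes per developer and divide by the grand total. The column order (`date,version,developer,nroBugFixes,funded`), the absence of embedded commas, and the function name `summarize` are assumptions.

```bash
#!/bin/bash
# Sum bug fixes per developer and print each developer's share of the total.
# Input: CSV with header "date,version,developer,nroBugFixes,funded"
# (column order assumed; fields must not contain embedded commas).
summarize() {
  awk -F, '
    NR > 1 { sum[$3] += $4; releases[$3]++; total += $4 }
    END {
      for (d in sum)
        printf "%s,%d,%.1f%%,%d\n", d, sum[d], 100 * sum[d] / total, releases[d]
    }' "$1" | sort -t, -k2,2nr
}

# Three rows taken from the QGIS 3.24 and 3.26 figures above ("x" stands in for the funder).
sample="$(mktemp)"
cat >"$sample" <<'EOF'
date,version,developer,nroBugFixes,funded
2022-06-18,QGIS 3.26,Nyall Dawson,38,x
2022-06-18,QGIS 3.26,Alessandro Pasotti,18,x
2022-02-18,QGIS 3.24,Nyall Dawson,29,x
EOF
summarize "$sample"
```

Each output row holds the developer, the total fixes, the percentage, and the number of releases the developer appears in. With Miller, something like `mlr --icsv --opprint stats1 -a sum,count -f nroBugFixes -g developer` should yield the sums and counts, leaving only the percentage to add.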
@aborruso thank you very much for the thorough explanation; everything seems easy when you explain it. I envy you, because these are tools I would really like to be able to use, but it takes a lot of experience and creativity to work out what to look for and what to filter.
Thanks a lot for the time you devoted to this.
@aborruso I can't get the FOR loop to work over a set of links: this script doesn't work, or rather it only extracts the data from the first link.
#!/bin/bash
set -x
set -e
set -u
set -o pipefail
folder="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
LINK="https://www.qgis.org/en/site/forusers/visualchangelog36/index.html#notable-fixes
https://www.qgis.org/en/site/forusers/visualchangelog38/index.html#notable-fixes"
# loop over the web pages
for lista in $LINK;do
# download the page
curl -kL "$lista" >"$folder"/tmp.html
# extract the developer ids
scrape <"$folder"/tmp.html -be '//section[@id="notable-fixes"]/section' | xq -r '.html.body.section[]."@id"' >"$folder"/toto-id.txt
# if [ -f "$folder"/toto.jsonl ]; then
# rm "$folder"/toto.jsonl
# fi
# for each developer, extract the data
while read id; do
versione=$LINK
nome=$(scrape <"$folder"/tmp.html -e '//section[@id="'"$id"'"]/h3/a[1]/text()' | sed -r 's/^.+by *//')
numeroRighe=$(scrape <"$folder"/tmp.html -be '//section[@id="'"$id"'"]/table/tbody/tr' | xq '.html.body.tr|length')
funded=$(scrape <"$folder"/tmp.html -be '//section[@id="'"$id"'"]//p[contains(.,"funded")]' | xq -r '(.html.body.p."#text")+""+(.html.body.p.a."#text")')
echo '{"versione":""'"$versione"'",nome":"'"$nome"'","numeroRighe":"'"$numeroRighe"'","funded":"'"$funded"'"}' >>"$folder"/toto.jsonl
done <"$folder"/toto-id.txt
done
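Note that the script above assigns `versione=$LINK`, that is the whole list, rather than the loop variable `lista`. As a sketch of an alternative way to hold the list, a bash array avoids relying on word splitting of a multi-line string. No download happens in this example, and the names `urls` and `page_name` are hypothetical.

```bash
#!/bin/bash
# Iterate over a list of URLs held in a bash array and derive a
# per-page output file name; no download is performed in this sketch.
urls=(
  "https://www.qgis.org/en/site/forusers/visualchangelog36/index.html"
  "https://www.qgis.org/en/site/forusers/visualchangelog38/index.html"
)

# Name of the directory that contains index.html, e.g. visualchangelog36.
page_name() { basename "$(dirname "$1")"; }

for url in "${urls[@]}"; do
  # Inside the loop use the loop variable ($url), not the whole list.
  echo "$url -> $(page_name "$url").html"
done
```

Writing each page to its own file, instead of reusing a single tmp.html, also makes it easier to inspect what was downloaded for each release.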
@aborruso the loop works now, but I get an error from Miller:
+ mlr --j2c clean-whitespace /mnt/c/Users/pigre/Desktop/featureQGIS/toto.jsonl
mlr: Unable to parse JSON data: Line 1 column 5: Unexpected `h` in object
Script:
#!/bin/bash
set -x
set -e
set -u
set -o pipefail
folder="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
LINK="https://www.qgis.org/en/site/forusers/visualchangelog36/index.html#notable-fixes
https://www.qgis.org/en/site/forusers/visualchangelog38/index.html#notable-fixes"
# loop over the web pages
for lista in $LINK
do
# download the page
curl -kL "$lista" >"$folder"/tmp.html
# extract the developer ids
scrape <"$folder"/tmp.html -be '//section[@id="notable-fixes"]/section' | xq -r '.html.body.section[]."@id"' >"$folder"/toto-id.txt
# for each developer, extract the data
while read id; do
versione=$lista
nome=$(scrape <"$folder"/tmp.html -e '//section[@id="'"$id"'"]/h3/a[1]/text()' | sed -r 's/^.+by *//')
numeroRighe=$(scrape <"$folder"/tmp.html -be '//section[@id="'"$id"'"]/table/tbody/tr' | xq '.html.body.tr|length')
funded=$(scrape <"$folder"/tmp.html -be '//section[@id="'"$id"'"]//p[contains(.,"funded")]' | xq -r '(.html.body.p."#text")+""+(.html.body.p.a."#text")')
echo '{"versione":""'"$versione"'",nome":"'"$nome"'","numeroRighe":"'"$numeroRighe"'","funded":"'"$funded"'"}' >>"$folder"/toto.jsonl
done <"$folder"/toto-id.txt
if [ -f "$folder"/toto-id.txt ]; then
rm "$folder"/toto-id.txt
fi
done
mlr --j2c clean-whitespace "$folder"/toto.jsonl >>"$folder"/toto.csv
@aborruso I found the errors, they are here:
echo '{"versione":""'"$versione"'",nome":"'"$nome"'","numeroRighe":"'"$numeroRighe"'","funded":"'"$funded"'"}' >>"$folder"/toto.jsonl
Some `"` characters are misplaced: there is an extra one after `"versione":` and one is missing before `nome`.
Now it works!!!
echo '{"versione":"'"$versione"'","nome":"'"$nome"'","numeroRighe":"'"$numeroRighe"'","funded":"'"$funded"'"}' >>"$folder"/toto.jsonl
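Hand-assembled JSON like this breaks again as soon as a value contains a double quote or a backslash. A minimal sketch of a safer approach follows, with two hypothetical helpers (`json_escape`, `json_line`) that escape only those two characters and emit every value as a string; the name with quotes in the demo is made up.

```bash
#!/bin/bash
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Handles only these two characters; control characters are out of scope here.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Emit one JSON object per call from key/value pairs (all values as strings).
json_line() {
  local out="" sep=""
  while [ "$#" -ge 2 ]; do
    out="$out$sep\"$(json_escape "$1")\":\"$(json_escape "$2")\""
    sep=","
    shift 2
  done
  printf '{%s}\n' "$out"
}

json_line versione "QGIS 3.6" nome 'Even "E." Rouault' numeroRighe 6
```

Where jq is installed, `jq -nc --arg nome "$nome" '{nome: $nome}'` is another option that delegates all the escaping to jq.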
This script loops over all the web pages and creates a single CSV file:
#!/bin/bash
set -x
set -e
set -u
set -o pipefail
folder="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# list of URLs to process, one per line
LINK="https://www.qgis.org/en/site/forusers/visualchangelog36/index.html
https://www.qgis.org/en/site/forusers/visualchangelog38/index.html"
# loop over the web pages ($LINK is left unquoted on purpose, so it splits on whitespace)
for lista in $LINK
do
  # download the page
  curl -kL "$lista" >"$folder"/tmp.html
  # extract the developer ids
  scrape <"$folder"/tmp.html -be '//section[@id="notable-fixes"]/section' | xq -r '.html.body.section[]."@id"' >"$folder"/toto-id.txt
  # version label from the digits in the URL, e.g. visualchangelog36 -> QGIS 3.6
  version=$(echo "$lista" | sed -e 's/[^0-9]//g' -e 's/^/QGIS /' -e 's/QGIS 3/QGIS 3./')
  # release date: keep only digits and dashes from the "Release date:" line
  data=$(grep 'Release date:' "$folder"/tmp.html | sed -e 's/[^0-9-]//g')
  # for each developer, extract the data
  while IFS= read -r id; do
    developer=$(scrape <"$folder"/tmp.html -e '//section[@id="'"$id"'"]/h3/a[1]/text()' | sed -r 's/^.+by *//')
    nroBugsFixes=$(scrape <"$folder"/tmp.html -be '//section[@id="'"$id"'"]/table/tbody/tr' | xq '.html.body.tr|length')
    funded=$(scrape <"$folder"/tmp.html -be '//section[@id="'"$id"'"]//p[contains(.,"funded")]' | xq -r '(.html.body.p."#text")+""+(.html.body.p.a."#text")')
    echo '{"data":"'"$data"'","version":"'"$version"'","developer":"'"$developer"'","nroBugsFixes":"'"$nroBugsFixes"'","funded":"'"$funded"'"}' >>"$folder"/toto.jsonl
  done <"$folder"/toto-id.txt
  rm -f "$folder"/toto-id.txt
done
# strip tab characters from the file
sed -i 's/\t//g' "$folder"/toto.jsonl
# convert JSON Lines to CSV
mlr --j2c clean-whitespace "$folder"/toto.jsonl >"$folder"/toto.csv
# remove files that are no longer needed (paths anchored to the script folder)
rm -f "$folder"/tmp.html "$folder"/toto.jsonl
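The `version` and `data` values in the script come from plain text substitution, so it helps to see what the sed expressions do on their own. In this sketch the function names are hypothetical and the "Release date:" line is an assumed shape, not copied from the real page.

```bash
#!/bin/bash
# Turn a changelog URL into a version label, as the script's sed steps do:
# keep digits only, prefix "QGIS ", then put a dot after the major version 3.
url_to_version() {
  echo "$1" | sed -e 's/[^0-9]//g' -e 's/^/QGIS /' -e 's/QGIS 3/QGIS 3./'
}

# Keep only digits and dashes from a "Release date:" line.
# The sample line in the demo below is an assumed shape, not copied from the page.
line_to_date() {
  echo "$1" | sed -e 's/[^0-9-]//g'
}

url_to_version "https://www.qgis.org/en/site/forusers/visualchangelog310/index.html"  # QGIS 3.10
line_to_date "<p>Release date: 2019-10-25</p>"                                        # 2019-10-25
```

Both substitutions depend on the input carrying no other digits (or dashes, for the date): a URL with a number elsewhere in its path, or a date line with numeric attributes, would give a wrong value.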
Why hasn't this one been closed, with a recipe?
Hi @aborruso, I don't remember why it is still open, but I don't think I wrote a recipe for it; I will, but I don't know when :-(
Recipe written and published: https://tansignari.opendatasicilia.it/ricette/bash/tabelle_in_pagine_web_estrarre_autore_e_nro_righe/
Thanks a mille(r), @aborruso
In these web pages, in the Notable Fixes section, there are several tables, one below the other, with various rows and columns. Each table is characterized by a number of rows, an author, and whoever funded the bug fixes; an example is shown below.
How can I extract, for each table, the number of rows, the author, and the funder?
Below is an example of the output.