[multiJoin] how to add N attributes to a shapefile

opendatasicilia / tansignari

"T'ansignari e t'appeddiri"

http://tansignari.opendatasicilia.it

Creative Commons Attribution 4.0 International


[multiJoin] how to add N attributes to a shapefile #220

Closed pigreco closed 2 years ago

pigreco commented 2 years ago

Problem

I have 100 shapefiles (shp001, shp002, ..., shp050, ..., shp100) with identical geometry and attribute names (id, name, value).

Goal

Create a single shapefile (unico.shp) with the geometry (not duplicated) and the attributes: id, name, value_shp001, value_shp002, ..., value_shp100

example of a typical input shapefile

example of the final output (for three shapefiles only)

ogr2ogr

https://gdal.org/programs/ogr2ogr.html

to perform a table JOIN between two shapefiles:

ogr2ogr -sql "SELECT t1.id AS id, t1.name AS name, t1.value AS value_shp001, t2.value AS value_shp002 \
FROM shp001 t1 JOIN './shp002.shp'.shp002 t2 \
ON t1.id=t2.id" ./shpOUT.shp ./shp001.shp
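
The two-file JOIN above could in principle be extended by generating the SQL string programmatically. The following is only a sketch: the helper name `build_join_sql` and its parameters are hypothetical, it assumes every shapefile sits in the current directory and is named after its layer, and it has not been tested with ogr2ogr against 100 layers.

```python
def build_join_sql(layers, key="id", keep=("id", "name"), value="value"):
    """Build an OGR SQL statement joining every layer to the first one.

    `layers` is a list of layer names such as ["shp001", "shp002"]; each
    secondary layer is assumed to live in ./<layer>.shp (an assumption).
    """
    first, others = layers[0], layers[1:]
    # columns taken from the first layer, plus its renamed value column
    columns = [f"t1.{c} AS {c}" for c in keep]
    columns.append(f"t1.{value} AS {value}_{first}")
    joins = []
    for n, layer in enumerate(others, start=2):
        alias = f"t{n}"
        # one renamed value column and one JOIN clause per extra layer
        columns.append(f"{alias}.{value} AS {value}_{layer}")
        joins.append(
            f"JOIN './{layer}.shp'.{layer} {alias} ON t1.{key}={alias}.{key}"
        )
    return " ".join(
        [f"SELECT {', '.join(columns)}", f"FROM {first} t1", *joins]
    )


layers = [f"shp{i:03d}" for i in range(1, 4)]
sql = build_join_sql(layers)
print(sql)
```

The resulting string would be passed to `ogr2ogr -sql` in place of the hand-written query, with `./shp001.shp` as the source dataset.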

Question

Which process/approach/logic should be used to achieve what is described in the goal?

Test data

andrea.zip

pigreco commented 2 years ago

One possible approach would be to convert the shapefiles (from shp002 to shp100) into plain CSV tables and merge all the files horizontally, then perform a single table JOIN between the first shapefile, shp001, and the large table resulting from that merge. [taken from: quattrochiacchiereinquattro by Andrea Borruso]

But the obvious question is: how do you do a horizontal merge with many input CSV files?
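
To make the idea of a horizontal merge concrete, here is a minimal standard-library sketch. The function name `merge_horizontally` is hypothetical, the CSV contents are inline sample data standing in for the exported files, and the ids are assumed to be integers present in every table.

```python
import csv
import io


def merge_horizontally(tables, key="id", value="value"):
    """Merge {name: csv_text} tables side by side on `key`.

    Each table contributes one column named <value>_<name>.
    """
    merged = {}  # key value -> row under construction
    for name, text in sorted(tables.items()):
        for row in csv.DictReader(io.StringIO(text)):
            out = merged.setdefault(row[key], {key: row[key]})
            out[f"{value}_{name}"] = row[value]
    # assumption: ids are integers, so sort them numerically
    return [merged[k] for k in sorted(merged, key=int)]


# hypothetical sample data, in place of the CSV exports of the shapefiles
tables = {
    "shp002": "id,value\n1,10\n2,25\n3,150\n",
    "shp003": "id,value\n3,541\n1,254\n2,32\n",
}
rows = merge_horizontally(tables)
for row in rows:
    print(row)
```

Because rows are matched on `id` rather than on position, the input files do not need to be sorted in the same order.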

aborruso commented 2 years ago

@pigreco could you please add an example output? I'd like an overview of the output columns you want and of their names.

Otherwise we risk writing useless code.

Thanks

pigreco commented 2 years ago

@pigreco could you please add an example output?

I've edited the main issue. In short, each field name in the final file must combine the source field name with the name of the file it comes from, e.g. value_shp001

aborruso commented 2 years ago

dear @pigreco, something put together in a hurry

#!/bin/bash

set -x
set -e
set -u
set -o pipefail

# create the CSVs and sort them by ID
for i in *.shp; do
  name=$(basename "$i" .shp)
  ogr2ogr -f CSV -sql 'select id,value from '"$name"'' "$name".csv "$name".shp
  mlr -I --csv sort -n id then cut -f value then rename value,value_"$name" "$name".csv
done

# join the CSVs side by side into a single CSV
paste -d "," shp*.csv > all.csv

# pick the first shapefile and convert it to CSV
primoShape=$(find ./ -iname "*.shp" -type f | head -n 1)
tmp=$(basename "$primoShape" .shp)
ogr2ogr -f CSV  tmp.csv "$tmp".shp

# keep only the id column of that CSV
mlr -I --csv cut -f id then sort -n id tmp.csv

# create the final file
paste -d "," tmp.csv all.csv > finale.csv

it produces this CSV, which you can then join to your geometry

id	value_shp001	value_shp002	value_shp003	value_shp004
1	23	10	254	50
2	34	25	32	41
3	100	150	541	47

pigreco commented 2 years ago

@aborruso impressive, truly impressive.

thanks for the time you dedicated to us :-)

aborruso commented 2 years ago

The key little tool here is the legendary paste, another native Linux utility, which can also concatenate files horizontally.
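
As an illustration of what paste does, here is a small Python sketch (the function name `paste` is just a stand-in, not a real library call). It joins the n-th line of each input purely by position, which is why the script sorts every CSV by id before pasting. The sample columns reuse the values from the output shown earlier in the thread.

```python
from itertools import zip_longest


def paste(*columns, delimiter=","):
    """Join the n-th line of every input, like `paste -d ","`.

    Shorter inputs are padded with empty strings.
    """
    return [delimiter.join(lines) for lines in zip_longest(*columns, fillvalue="")]


# each list stands for the lines of one single-column CSV file
ids = ["id", "1", "2", "3"]
shp001 = ["value_shp001", "23", "34", "100"]
shp002 = ["value_shp002", "10", "25", "150"]

pasted = paste(ids, shp001, shp002)
print("\n".join(pasted))
```

If one file were sorted differently, the values would silently end up on the wrong rows, since no key is checked.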

aborruso commented 2 years ago

Another approach

#!/bin/bash

# merge the shapefiles vertically
ogrmerge.py -overwrite_ds -single -src_layer_field_name layer -o merged.shp shp*.shp

# convert the merged shapefile to CSV
ogr2ogr -f CSV merged.csv merged.shp

# reshape the CSV from long to wide
mlr -I --csv reshape -s layer,value merged.csv

which outputs this CSV

id	nome	shp001	shp002	shp003	shp004
1	ciao andrea	23	10	254	50
2	ciao andrea 2	34	25	32	41
3	ciao andrea 3	100	150	541	47
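
For readers unfamiliar with the reshape step, the sketch below shows the same long-to-wide idea in plain Python. The function name `long_to_wide` is hypothetical, and the input rows are an assumed layout of merged.csv (one row per feature per source layer), reusing values from the table above.

```python
def long_to_wide(rows, key_field="layer", value_field="value"):
    """Spread `value_field` into one column per distinct `key_field` value."""
    wide = {}
    for row in rows:
        # every field other than key/value identifies the output row
        others = tuple(
            (k, v) for k, v in row.items() if k not in (key_field, value_field)
        )
        wide.setdefault(others, dict(others))[row[key_field]] = row[value_field]
    return list(wide.values())


# assumed shape of merged.csv: one row per feature per source layer
long_rows = [
    {"id": "1", "nome": "ciao andrea", "layer": "shp001", "value": "23"},
    {"id": "2", "nome": "ciao andrea 2", "layer": "shp001", "value": "34"},
    {"id": "1", "nome": "ciao andrea", "layer": "shp002", "value": "10"},
    {"id": "2", "nome": "ciao andrea 2", "layer": "shp002", "value": "25"},
]
wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```

Each distinct layer name becomes a column, and rows sharing the same id and nome collapse into one.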

gpirrotta commented 2 years ago

A possible solution in Python

from pathlib import Path

import geopandas as gpd

# collect the shapefiles in a deterministic order
files = sorted(Path("../data/andrea").glob("*.shp"))

# the first shapefile provides the geometry and the base attributes
gdf = gpd.read_file(files[0])
gdf = gdf.rename(columns={"value": f"value_{files[0].stem}"})

for f in files[1:]:
    gdf2 = gpd.read_file(f)
    new_column = f"value_{f.stem}"
    gdf2 = gdf2.rename(columns={"value": new_column})
    # join on id, taking only the renamed value column from each file
    gdf = gdf.merge(gdf2[["id", new_column]], on="id")

🤗

pigreco commented 2 years ago

I'm adding a mixed solution using QGIS and VisiData:

in QGIS, use the Merge Vector Layers (Fondi vettori) algorithm:

delete the path column, which is not needed;
export to CSV;
with VisiData, do a Pivot (that is, reshape the table from long to wide), which would give:

alternatively, to do the Pivot, use the Group Stats plugin:

pigreco commented 2 years ago

recipe published, thanks everyone

https://tansignari.opendatasicilia.it/ricette/riga_comando/multi_join/