pramsey / pgsql-ogr-fdw

PostgreSQL foreign data wrapper for OGR
MIT License

Option to rename duplicated columns #252

Open · robe2 opened this issue 1 year ago

robe2 commented 1 year ago

I haven't looked into how hard this would be. Every once in a while someone has the bad sense to give me a spreadsheet with duplicated column names, for example street, city, street, street, name.

I usually have two options to address this:

1) Open up the file and correct the headers (it's usually a spreadsheet).
2) Instruct OGR to ignore the headers and give the columns dummy names like field1, field2, etc., using something like:

CREATE SERVER svr_xlsx FOREIGN DATA WRAPPER ogr_fdw OPTIONS (datasource 'C:/fdw_data/dupe_columns.xlsx', format 'XLSX', config_options 'OGR_XLSX_HEADERS=DISABLE');

What would be really convenient is an option like rename_dupe_columns that just tacks a number onto the end of each subsequent duplicate column name.
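A minimal sketch of what the proposed rename_dupe_columns behaviour could look like, written in C since ogr_fdw is a C extension. The option name comes from the request above, but the function, its signature and the exact suffix rule (keep the first occurrence, then append 2, 3, ... and skip any suffixed name that already exists) are assumptions for illustration, not ogr_fdw code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if candidate equals any of the first count strings in list. */
static int contains(char *const *list, int count, const char *candidate)
{
    for (int i = 0; i < count; i++)
        if (strcmp(list[i], candidate) == 0)
            return 1;
    return 0;
}

/* Heap copy of s (portable replacement for strdup). */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/*
 * HYPOTHETICAL rename_dupe_columns pass, not part of ogr_fdw.
 * Renames src[0..n-1] into dst[0..n-1]: a name not yet used by an earlier
 * output column is kept; a duplicate gets a numeric suffix starting at 2
 * (street, street2, street3, ...). Generated names also avoid every source
 * name, so the output names are always unique. Each dst[i] is heap-allocated
 * and must be freed by the caller. Returns 0 on success, -1 if malloc fails.
 */
static int rename_dupe_columns(char *const *src, char **dst, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (!contains(dst, i, src[i])) {
            /* First occurrence of this name: keep it unchanged. */
            dst[i] = copy_string(src[i]);
        } else {
            /* Duplicate: try name2, name3, ... until the candidate clashes
             * with neither a source name nor an earlier output name. */
            size_t cap = strlen(src[i]) + 12; /* room for any int suffix */
            dst[i] = malloc(cap);
            if (dst[i] == NULL)
                return -1;
            for (int k = 2;; k++) {
                snprintf(dst[i], cap, "%s%d", src[i], k);
                if (!contains(src, n, dst[i]) && !contains(dst, i, dst[i]))
                    break;
            }
        }
        if (dst[i] == NULL)
            return -1;
    }
    return 0;
}
```

With the example header from the issue, street, city, street, street, name would come out as street, city, street2, street3, name.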

Usually these extra annoying columns aren't even ones I want to load: often users have duplicated a column just to place it beside another one, or they simply got lazy with their header naming.

I'm not sure if that is a change that can be done here, or has to be tackled at the GDAL level. Thoughts?

pramsey commented 1 year ago

Just on the face of it, it doesn't sound at all impossible. We already launder column names; this is just another kind of laundering, one that requires the whole set of inputs rather than one name at a time.
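The difference between per-name laundering and a whole-set pass can be sketched in C. The laundering rule here (lower-case ASCII letters, replace every non-alphanumeric character with an underscore) is an assumed stand-in, not necessarily the rule ogr_fdw uses, and both helper names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Per-column laundering under an ASSUMED rule (not necessarily ogr_fdw's):
 * lower-case letters and turn anything that is not alphanumeric into '_'.
 * It only ever needs the single name it is given, edited in place.
 */
static void launder_name(char *name)
{
    for (char *p = name; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        *p = isalnum(c) ? (char)tolower(c) : '_';
    }
}

/*
 * Set-level check (hypothetical helper): whether two names collide can only
 * be answered by looking at the whole list. Returns the index of the first
 * name that repeats an earlier one, or -1 if all names are unique.
 */
static int first_duplicate(char *const *names, int n)
{
    assert(n >= 0);
    for (int i = 1; i < n; i++)
        for (int j = 0; j < i; j++)
            if (strcmp(names[i], names[j]) == 0)
                return i;
    return -1;
}
```

One consequence, an observation rather than something stated in the thread: laundering can itself create duplicates ("Street Name" and "street_name" both become "street_name" under the assumed rule), so a dedupe pass would presumably have to run after laundering, over the full list of names.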

pramsey commented 9 months ago

Having just spent a little time in the code, I can say this is a bit harder than I thought: I seem to have built in an assumption of unique source names, so that I can map from source->fdw columns using name matching. If two source columns have identical names, that sort of breaks things.
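Why duplicate names break name matching can be illustrated with a small C sketch. It does not reproduce ogr_fdw's actual mapping code; map_by_name and map_by_position are hypothetical helpers that assume source and FDW column names are plain string arrays.

```c
#include <assert.h>
#include <string.h>

/* Index of the FIRST source column named name, or -1 if there is none. */
static int find_source_by_name(char *const *src, int n_src, const char *name)
{
    for (int i = 0; i < n_src; i++)
        if (strcmp(src[i], name) == 0)
            return i;
    return -1;
}

/*
 * HYPOTHETICAL name-based mapping: map[i] = source index for FDW column i.
 * Because the lookup stops at the first match, every FDW column called
 * "street" resolves to the same source column, and later duplicates are
 * never read.
 */
static void map_by_name(char *const *src, int n_src,
                        char *const *fdw, int n_fdw, int *map)
{
    assert(n_src >= 0 && n_fdw >= 0);
    for (int i = 0; i < n_fdw; i++)
        map[i] = find_source_by_name(src, n_src, fdw[i]);
}

/*
 * HYPOTHETICAL position-based alternative: FDW column i maps to source
 * column i, which ASSUMES the foreign table lists columns in source order.
 */
static void map_by_position(int n_fdw, int n_src, int *map)
{
    for (int i = 0; i < n_fdw; i++)
        map[i] = i < n_src ? i : -1;
}
```

Position-based mapping keeps the duplicates apart, but it relies on the foreign table listing every source column in source order, whereas name matching also tolerates tables that declare only some columns or reorder them.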