Closed: msochan closed this 2 months ago
Why `avoiddupes off`? I suppose there's duplicate data in the sets (at least `AX_Bundesland`). BTW, you get better diagnostics (i.e. the values of the conflicting records) with `usecopy off`.
Is `AX_Bundesland` a parameter in the config file?
No, it's a conflicting record that probably appears in each set and will cause an error unless `avoiddupes` is on.
I don't know, actually; I thought that `avoiddupes off` would only raise warnings (and it would be nice to know whether there are duplicates or not), but it's actually throwing errors and stopping the import.
So, if I understood correctly, is it good practice to use `avoiddupes on`? I mean, are there any downsides to using it?
It depends. `avoiddupes` should not be necessary for normal deliveries, which shouldn't contain any duplicates; if they still contain duplicates, there is probably a problem in the data.
In this case each zip is a separate delivery. Importing them into separate schemas should work without error and shouldn't need `avoiddupes`. But you want to import them into one schema, so the objects that reappear in several datasets will produce errors without `avoiddupes` (e.g. AX_Bundesland, AX_KreisRegion; maybe also other objects that intersect at the edges of the deliveries). Hopefully all the data in this case is from the same point in time, because otherwise you might find several versions of the same object in different deliveries (i.e. same `gml_id`, different `beginnt`). That would not violate any constraints, but might produce strange output later.
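To make the two cases concrete, here is a small illustrative Python sketch (not part of the import tool; the `gml_id`s, zip names, and timestamps are made up) that separates harmless cross-delivery duplicates from objects that exist in several versions (same `gml_id`, different `beginnt`):

```python
from collections import defaultdict

# Hypothetical sample: (delivery, gml_id, beginnt) triples as they might
# come out of several ALKIS zips.
records = [
    ("zip_a", "DEXX0001", "2023-01-01T00:00:00Z"),
    ("zip_b", "DEXX0001", "2023-01-01T00:00:00Z"),  # same object in two deliveries
    ("zip_a", "DEXX0002", "2023-01-01T00:00:00Z"),
    ("zip_b", "DEXX0002", "2023-03-01T00:00:00Z"),  # same gml_id, different beginnt
]

deliveries = defaultdict(set)  # gml_id -> deliveries it appears in
versions = defaultdict(set)    # gml_id -> distinct beginnt timestamps

for delivery, gml_id, beginnt in records:
    deliveries[gml_id].add(delivery)
    versions[gml_id].add(beginnt)

# Harmless duplicates: same object, same version, in several deliveries
# (these are what `avoiddupes` is meant to skip).
duplicates = {g for g, d in deliveries.items()
              if len(d) > 1 and len(versions[g]) == 1}

# Problematic: several versions of the same object (passes the constraints,
# but may produce strange output later).
multi_version = {g for g, b in versions.items() if len(b) > 1}
```

Running a check like this over the source files before importing would tell you whether you only have edge duplicates or genuinely diverging versions.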
Thank you for your detailed explanation. However, the problem still exists, even after I changed `avoiddupes` to `on`. The error seems exactly the same.
The data is gid7.1 and I'm using the gid7 branch; is that fine? Maybe I can try `usecopy off` to get better diagnostics?
In addition to my previous comment above, I ran the import again with PG_USE_COPY turned off, so I can share here what the logs look like with `usecopy off`:
`avoiddupes` only works for rows that have already been inserted. If the same record appears in parallel transactions, neither sees the other, so no insert is skipped, and the conflict arises when the second transaction is committed.
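A minimal, database-free Python sketch of that race, assuming snapshot-style visibility (each transaction only sees rows committed before it started; the object id is hypothetical):

```python
committed = set()  # rows visible to new snapshots (the committed state)

class Transaction:
    def __init__(self):
        self.snapshot = set(committed)  # frozen view taken at start
        self.pending = []

    def insert_if_absent(self, gml_id):
        # "avoiddupes" behaviour: skip only rows already *visible*
        if gml_id in self.snapshot:
            return "skipped"
        self.pending.append(gml_id)
        return "inserted"

    def commit(self):
        for gml_id in self.pending:
            if gml_id in committed:  # unique constraint checked at commit
                raise RuntimeError(f"duplicate key: {gml_id}")
            committed.add(gml_id)

# Two parallel import jobs hit the same object before either commits.
t1, t2 = Transaction(), Transaction()
r1 = t1.insert_if_absent("AX_Bundesland#1")  # hypothetical id
r2 = t2.insert_if_absent("AX_Bundesland#1")  # doesn't see t1's pending row

t1.commit()  # first commit succeeds
try:
    t2.commit()  # second commit hits the unique constraint
    outcome = "ok"
except RuntimeError as e:
    outcome = str(e)
```

Both transactions report "inserted", and the duplicate only surfaces as an error when the second one commits, which matches the behaviour seen with parallel jobs.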
So I'm guessing the only solution to avoid this situation is to run with `jobs 1`?
Or is it possible to use parallel jobs together with `skipfailures`?
Hey, I'm running an import of the ALKIS data for Rheinland-Pfalz in parallel, and I'm getting these errors.
Do you know what could be causing them?
Below is the config file I'm using:
And the logs: