socrata / datasync

Desktop / Console application for updating Socrata datasets automatically.
http://socrata.github.io/datasync/
MIT License
80 stars 33 forks source link

EN-49082: rework column-matching #204

Closed rjmac closed 2 years ago

rjmac commented 2 years ago

As a result of the NBE change we wanted to be a little fuzzy on matching CSV/dataset columns, but this turned out to be too eager. Now that the NBE is everywhere, let's only do that if we're dealing with a job file from before that change, where "before that change" is defined as "not having a flag that says it's from after that change".

rjmac commented 2 years ago

I think all of your questions are variations on the same thing, so here's another attempt at explaining it:

The current behavior of datasync is: when a saved job is loaded, pretty much just ignore the current column-mapping and try to generate a new one from the CSV header and the dataset schema. This isn't so great if you've customized the column-mapping, so what we want to do is say "if you've loaded this from a saved configuration and you've saved that configuration after this release happened (in order to avoid breaking existing setups) trust what's in the job instead of trying to be smart about it".

All the rest of this flows from that more or less through accidents of architecture of how the files are saved and loaded, which is kinda weird. So here are the requirements:

Which means we want the new behavior (having isPostNbeification true) every time an IntegrationJob is created except when it's created by the load contstructor. But, that load constructor defers to Jackson to create an intermediate value, and Jackson will use the default constructor to create the IntegrationJob!

So this is what the ThreadLocal is for, and why it's called a hack in the code. It exists and is normally true except when we're going through the codepath inside Jackson where it creates a new instance from the Class<IntegrationJob> that we pass into it on line 143. There it's false, so the default constructor initializes the isPostNbeification to false, and then, if it's actually a new job, seeing the field in the JSON that it's loading will cause Jackson to flip it over to true before returning it to us.