MNSleeper closed this issue 1 year ago
This may be an XY problem: why are you trying to drop the tables in the first place? Your program should be able to update the tables, rather than dropping everything and starting from scratch.
Now, for debugging purposes, it could indeed be useful to "start fresh", but in that case, all the databases should be dropped, and not just the foreign key constraint.
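To make the "drop everything, not just the constraint" point concrete, here is a minimal sketch using Python's `sqlite3` with made-up `agency`/`grant_row` tables (placeholders, not the project's actual schema): a foreign key blocks a parent-first drop, while dropping children before parents goes through cleanly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Hypothetical parent/child tables standing in for the real schema.
con.execute("CREATE TABLE agency (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE grant_row ("
            "id INTEGER PRIMARY KEY, "
            "agency_id INTEGER REFERENCES agency(id))")
con.execute("INSERT INTO agency VALUES (1, 'NSF')")
con.execute("INSERT INTO grant_row VALUES (10, 1)")
con.commit()

# Dropping the parent first fails: SQLite runs an implicit DELETE on agency,
# which would orphan the referencing row in grant_row.
try:
    con.execute("DROP TABLE agency")
except sqlite3.IntegrityError as e:
    print("blocked:", e)

# Dropping children before parents always works.
con.execute("DROP TABLE grant_row")
con.execute("DROP TABLE agency")
```

The same ordering argument applies in MySQL and PostgreSQL; only the exact error differs.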
Right. I just worry that, by overwriting/updating tables, there will be leftover entries from other executions. Is that a legitimate fear?
This is a more complex discussion that we should maybe have in person. What exactly is this software trying to achieve?
- Is it a "one-time process" that takes one (or multiple) databases and outputs one spreadsheet with all the data linked and tidied? E.g., it takes databases from the NSF and the NIH and outputs one single spreadsheet containing the combined information.
- Is it a "spreadsheet calculator" that processes data inside its own database and outputs one spreadsheet with some parameters fixed? E.g., it loads data from the NSF, the NIH, etc. once and for all, asks for user input, and outputs the spreadsheet containing the information relevant to that input.
I believe that, ideally, it should be a piece of software
Then, if a new database is added or if a database is updated, the process should be:
In all of those cases, I don't think that dropping anything is needed, but I may be wrong.
This is a complicated problem, but we can find ways. One possible process (here, I am supposing that an older version of dataA.csv had been loaded into the table DataA in SQL) is:
- check whether the new dataA.csv is different from the version already loaded;
- if it is, observe the differences,
Oh, and there are probably tools that can do that for us; I can investigate if you want.
After reading this, I agree that the internal databases should be kept over multiple executions, and that this is very complicated.
For the purposes of building/testing/improving the matching system, I want to be able to start over until we are confident that the matching methods/algorithms are solid. Once it's at a good spot, keeping it consistent across executions sounds like a good way of going about it.
For now, I can just drop the database to get around the FK constraint errors, right?
Yes, absolutely, you can always "start fresh" and dump everything at the end of each execution.
Then I will start on this.
OK, that's probably a good first approximation: make it work without the "updating" capabilities. It will be slower, but it will be easier to debug / test.
Added a method that drops the db and re-creates it before doing anything else in SQL. Now I can access the matching function.
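Without having seen the actual commit, a drop-and-recreate reset along those lines might look like this (SQLite sketch with placeholder table names; in MySQL the equivalent would be `DROP DATABASE` / `CREATE DATABASE`):

```python
import sqlite3

def reset_database(path: str) -> sqlite3.Connection:
    """Start fresh: drop everything from earlier runs, then rebuild the schema.

    The table names here are placeholders, not the project's real schema.
    """
    con = sqlite3.connect(path)
    con.execute("PRAGMA foreign_keys = ON")
    # Children first, so no foreign key ever points at a table being dropped.
    con.execute("DROP TABLE IF EXISTS grant_row")
    con.execute("DROP TABLE IF EXISTS agency")
    con.execute("CREATE TABLE agency (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE grant_row ("
                "id INTEGER PRIMARY KEY, "
                "agency_id INTEGER REFERENCES agency(id))")
    con.commit()
    return con

con = reset_database(":memory:")  # a file path here would persist between runs
```

Calling this once at startup guarantees every execution sees an empty, freshly built schema, which matches the "start fresh while debugging" plan above.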
I guess we can close this issue now, and potentially open another one to discuss the "let's update instead of dumping everything" issues.
That sounds like a plan.
The current blocker is that foreign keys (from linkage) prevent tables from the previous execution from being dropped and re-created on new executions. Solutions I've found are:
@aubertc , what do you suggest? I'm leaning towards 2, but such a method might have unintended effects I'm not aware of.
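For reference, the two usual workarounds for FK-blocked drops can be sketched side by side. This is an SQLite toy example with placeholder tables; in MySQL the enforcement toggle would be `SET FOREIGN_KEY_CHECKS = 0` instead of the PRAGMA:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE agency (id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE grant_row ("
            "id INTEGER PRIMARY KEY, "
            "agency_id INTEGER REFERENCES agency(id))")
con.execute("INSERT INTO agency VALUES (1)")
con.execute("INSERT INTO grant_row VALUES (10, 1)")
con.commit()  # the PRAGMA below only takes effect outside a transaction

# Workaround 1: temporarily disable enforcement and drop in any order.
con.execute("PRAGMA foreign_keys = OFF")
con.execute("DROP TABLE agency")      # would be rejected with enforcement on
con.execute("DROP TABLE grant_row")
con.execute("PRAGMA foreign_keys = ON")

# Workaround 2 (not shown): keep enforcement on and always drop children
# before parents, or drop and re-create the whole database.
```

Disabling checks is simpler but briefly allows inconsistent states, so it should stay confined to the reset step; dropping in dependency order (or dropping the whole database) never relaxes the constraints.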