stats4sd / Installation-Guides

A set of installation and basic use guides for software and tools used / recommended by the Research Methods Support / Stats4SD team
GNU General Public License v3.0

Farmer trials/cleaning data #79

Closed: chrismclarke closed this 5 years ago

chrismclarke commented 5 years ago

Updated guidance for the Heidi -> OpenRefine -> Heidi workflow. I've opened an issue to potentially split the final import instructions into a separate doc (@dave-mills, do you think it is too long as is?). Also, now that OpenRefine 3.1 is out and stable, we could consider adding instructions to query directly from the database (I've tested this and it works for pulling data from MySQL; there is no function to populate data back in, which I think is fine). Again, let me know what you think.

Note: this branch was cloned from the other open PR, so it contains the changes from both. I'm assuming it should still merge fine whatever happens, but I'm mentioning it in case it's confusing why there are additional changes in this request.

closes #80

dave-mills commented 5 years ago

I also realised that, with relationships defined, the "empty table" trick before re-importing doesn't work (SQL doesn't let you truncate a table that's the primary table in a relationship).

There are 2 options to get around this:

  1. Edit the relationship to use ON DELETE CASCADE, then select all rows and delete them. This will cascade down and also delete the plot_data rows (so that table is also ready to have the cleaned versions re-imported). But it involves learning about ON DELETE CASCADE, which probably isn't suitable here. So...

  2. Define the relationships after cleaning each individual table.

I think cleaning the data as step 3 and then defining relationships as step 4 should be fine. What do you think, @chrismclarke?
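For reference, here is a minimal sketch of what option 1 (ON DELETE CASCADE) does. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, so it does not reproduce the truncate restriction itself (SQLite has no TRUNCATE); it only shows a "delete all rows" on the parent table being blocked without a cascade rule and cascading into plot_data with one. The parent table name `plots`, its columns, and the helper `delete_all_parents` are hypothetical; only plot_data comes from the discussion above.

```python
import sqlite3


def delete_all_parents(cascade: bool) -> tuple[bool, int]:
    """Delete every row from a hypothetical parent table `plots` and return
    (whether the delete succeeded, number of plot_data rows remaining)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical parent table standing in for the "primary table".
    conn.execute("CREATE TABLE plots (id INTEGER PRIMARY KEY, form_id TEXT)")
    on_delete = "ON DELETE CASCADE" if cascade else ""
    conn.execute(
        f"""CREATE TABLE plot_data (
                id INTEGER PRIMARY KEY,
                plot_id INTEGER NOT NULL REFERENCES plots(id) {on_delete},
                value REAL)"""
    )
    conn.executemany("INSERT INTO plots (id, form_id) VALUES (?, ?)",
                     [(1, "form_a"), (2, "form_b")])
    conn.executemany("INSERT INTO plot_data (plot_id, value) VALUES (?, ?)",
                     [(1, 3.5), (1, 4.0), (2, 2.1)])
    try:
        # Equivalent of "select all rows and delete them" on the parent table.
        conn.execute("DELETE FROM plots")
        deleted = True
    except sqlite3.IntegrityError:
        # Without a cascade rule, the child rows in plot_data block the delete.
        deleted = False
    remaining = conn.execute("SELECT COUNT(*) FROM plot_data").fetchone()[0]
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    print(delete_all_parents(cascade=True))   # (True, 0)
    print(delete_all_parents(cascade=False))  # (False, 3)
```

Option 2 avoids this entirely: while no foreign key has been defined yet, plot_data rows cannot block emptying and re-importing the parent table.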

dave-mills commented 5 years ago

(Also need to add a note on deleting the associated records from plot_data. Will do this later this evening!)

dave-mills commented 5 years ago

I agree with your idea of moving to pull data directly from the database - makes the idea of "the database is the source of data for all these tools" more complete. Also means we can delete data in Heidi, where it's easier to find data with specific form_IDs.

chrismclarke commented 5 years ago

Thanks for all this, @dave-mills. I'll try to find the time to look today/tomorrow and we can send it out early next week.

chrismclarke commented 5 years ago

Eurghh, why isn't anything ever just simple! But in my attempts to harmonise:

  1. I've merged all the open PRs because, particularly with the filename changes, the likelihood of changes getting accidentally overwritten was growing too high (in fact I did lose all your additions and had to go back and correct that).

  2. I've added to #76 to suggest we provide additional guidance for removing rows with cascade for related tables. I can imagine this as something people might want to do...

  3. Happy to reorder the steps so that what we want keeps working for now; also happy to go with importing directly from Heidi.

This is now closed, so these comments are mostly just for reference; any further discussion should happen in a separate issue.