stats4sd / Installation-Guides

A set of installation and basic use guides for software and tools used / recommended by the Research Methods Support / Stats4SD team
GNU General Public License v3.0

Why is distinct query required? #81

Closed chrismclarke closed 5 years ago

chrismclarke commented 5 years ago

@dave-mills I'm working on querying data and am confused by the following (and don't know how to explain it):

I'm merging data from hh_info (n1 = 182) and plot_data (n2 = 619). The relationship is many-to-one (many plots per household), so I expect the result to have the same number of rows as plot_data, each populated with the matching hh_info. When I run the query using an inner join I get far too many rows (close to n1 * n2 but not quite, possibly an off-by-one or approximation error). If I SELECT DISTINCT I get the correct number.


Is it correct to specify DISTINCT, or should I be encouraging a left/right join instead? (If so, why is an inner join the default that dbForge populates?)

chrismclarke commented 5 years ago

As a bonus, I went looking through the site for info on database joins (it's been very useful up to now for quick refreshers!), but realised there is no section on them. I'm guessing this would make for a useful, quick concept page? What do you think?

chrismclarke commented 5 years ago

Thanks. For reference: because dbForge had the preference_data table in view, the query was also selecting data from there, which is where the extra rows came from.
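
For anyone landing here later, the effect can be reproduced with a small sketch using Python's built-in sqlite3 module. The column names (hh_id, plot_id, pref_id and so on), the sample rows, and the way preference_data is joined are all assumptions made for illustration; the actual schema and the query dbForge generated are not shown in this thread.

```python
import sqlite3

# Hypothetical, tiny stand-ins for the tables named in the thread.
# Column names and data are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hh_info (hh_id INTEGER PRIMARY KEY, village TEXT);
    CREATE TABLE plot_data (plot_id INTEGER PRIMARY KEY, hh_id INTEGER, crop TEXT);
    CREATE TABLE preference_data (pref_id INTEGER PRIMARY KEY, hh_id INTEGER, choice TEXT);

    INSERT INTO hh_info VALUES (1, 'A'), (2, 'B');
    INSERT INTO plot_data VALUES (10, 1, 'maize'), (11, 1, 'beans'), (12, 2, 'rice');
    INSERT INTO preference_data VALUES
        (100, 1, 'x'), (101, 1, 'y'), (102, 2, 'z'), (103, 2, 'w');
""")


def count_rows(sql):
    """Run a query and return how many rows it produces."""
    return len(conn.execute(sql).fetchall())


# Intended many-to-one join: one output row per plot.
two_table = count_rows("""
    SELECT p.plot_id, p.crop, h.village
    FROM plot_data AS p
    INNER JOIN hh_info AS h ON h.hh_id = p.hh_id
""")

# Same select list, but a third table is (assumed to be) joined in as well.
# Each plot row is now repeated once per preference row of its household.
inflated = count_rows("""
    SELECT p.plot_id, p.crop, h.village
    FROM plot_data AS p
    INNER JOIN hh_info AS h ON h.hh_id = p.hh_id
    INNER JOIN preference_data AS pr ON pr.hh_id = h.hh_id
""")

# DISTINCT collapses the repeated rows, which hides the extra join.
deduped = count_rows("""
    SELECT DISTINCT p.plot_id, p.crop, h.village
    FROM plot_data AS p
    INNER JOIN hh_info AS h ON h.hh_id = p.hh_id
    INNER JOIN preference_data AS pr ON pr.hh_id = h.hh_id
""")

print(two_table, inflated, deduped)  # prints: 3 6 3
```

In this sketch DISTINCT gives the expected count only because the selected columns happen to identify each plot uniquely. Dropping the unneeded table from the query is probably the safer fix, since DISTINCT would also collapse any genuinely duplicated rows.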