Open xavierpena opened 8 years ago
I think you are missing another condition on the where clause as it appears you are summing different nutrition categories together.
Use this query and see which additional condition you may need:
SELECT
ndata.nutr_val,ndef.nutr_no, ndef.nutrdesc
FROM
nutrient_data ndata
JOIN
nutrient_definition ndef ON ndata.nutr_no = ndef.nutr_no
JOIN
food_desc fdesc ON fdesc.ndb_no = ndata.ndb_no
JOIN
food_grp_desc fgrpdesc ON fgrpdesc.fdgrp_cd = fdesc.fdgrp_cd
WHERE
long_desc = 'Cheese, blue' and units_text = 'g';
Thank you for your answer.
As you can see, I know little about nutrition. I am just a developer trying to make sense of the data.
different nutrition categories
I re-examined the tables and I don't know how to find these different nutrition categories. The only plausible hypothesis I could come up with is:
I don't know if you could confirm if that makes sense or not.
Yeah, I think there is some duplication.
For example:
SELECT ndata.nutr_val,ndef.nutr_no, ndef.nutrdesc FROM nutrient_data ndata JOIN nutrient_definition ndef ON ndata.nutr_no = ndef.nutr_no JOIN food_desc fdesc ON fdesc.ndb_no = ndata.ndb_no JOIN food_grp_desc fgrpdesc ON fgrpdesc.fdgrp_cd = fdesc.fdgrp_cd WHERE long_desc = 'Cheese, blue' and units_text = 'g' and (ndef.nutrdesc like '%lipid%' or ndef.nutrdesc like '%fat%') ;
+----------+---------+------------------------------------+ | nutr_val | nutr_no | nutrdesc | +----------+---------+------------------------------------+ | 28.74000 | 204 | Total lipid (fat) | | 18.66900 | 606 | Fatty acids, total saturated | | 7.77800 | 645 | Fatty acids, total monounsaturated | | 0.80000 | 646 | Fatty acids, total polyunsaturated | +----------+---------+------------------------------------+
You have total fat and then a breakdown of 3 types of fats.
See what happens when you exclude on the word 'Total'
Yeah, I think there is some duplication.
OK, that makes sense.
See what happens when you exclude on the word 'Total'
The results are:
ndef.nutrdesc NOT like '%total%'
sum(nutr_val) = 120.52000
and:
ndef.nutrdesc like '%total%'
sum(nutr_val) = 56.48700
anyway, in your example you had:
1 "super-group": Total lipid (fat)
3 "sub-groups":
...all of them containing the word "total", so it seems difficult to make a breakdown of nutrients and their sub-nutrients through naming conventions.
I think the data makes sense regardless, it is normal that you need to be an expert on nutrients to fully grasp the the meaning of the data. It's just good to know how it is organized: the conclusion is that nutrients and "its sub-nutrients" are mixed in the same table.
Would there be any benefit to using: ndef.nutrdesc NOT like 'Total%'
or do you get similar issues?
With ndef.nutrdesc NOT like 'Total%'
, the result is sum(nutr_val) = 148.26700
Hi and thanks for this fantastic compilation of data.
I made a tutorial about how to set up the data (while I was doing it for myself, I thought it would be a good idea to share it): http://localcode.wikidot.com/playing-with-a-food-database
Maybe github is not the right place to ask about how the underlying data works, but using your database I made the sum of
nutr_value
for a single food description (in this case: "blue cheese") focusing only in units that were in grams. As I understand it, the values are "per 100gr of food", so I expected a sum lower than 100. I got 177 instead. The error is more likely to be because of my interpretation of the data rather than an error in the database, but I wanted to ask in case you or anyone else understood what was going on here.Note: to run this query you need to first transform
units
(inblob
in the original database) tounit_text
(asvarchar
). As explained in my blog post, it's done like so: