wanghaisheng closed 5 months ago
Thanks for pointing out this issue! We can reproduce it. I'll check it out and get back to you soon.
Hi, I guess the following figure explains most of the points:
This is an actual item with parent_asin=B007I8S9ZK in the Health_and_Household domain, with main_category='Video Games' and categories=['Health & Household', 'Vision', 'Reading Glasses']. [link]
We assign items to domains mainly by the first entry of the categories attribute. Only when the categories attribute is None do we fall back to main_category to decide which domain the item belongs to.
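For readers who want to check this rule against their own copy of the metadata, here is a minimal sketch of it. The helper name assign_domain is hypothetical and is not part of any released script. Treating an empty categories list like None is an assumption, and the sketch returns the raw category string (for example 'Health & Household') without guessing how it maps to a file name such as Health_and_Household.

```python
from typing import Optional


def assign_domain(item: dict) -> Optional[str]:
    """Hypothetical helper mirroring the rule described above.

    Use the first entry of `categories`; fall back to `main_category`
    only when `categories` is missing.
    """
    categories = item.get("categories")
    # Assumption: an empty list is treated the same way as None.
    if categories is not None and len(categories) > 0:
        return categories[0]
    return item.get("main_category")


# The item discussed in this thread.
example = {
    "parent_asin": "B007I8S9ZK",
    "main_category": "Video Games",
    "categories": ["Health & Household", "Vision", "Reading Glasses"],
}
print(assign_domain(example))
```

Under this rule the example item lands in the Health & Household domain even though its main_category is 'Video Games', which matches what is seen in the released file.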
In this case, since we also do not know how Amazon sets the main_category and categories of an item, we keep both unchanged in the released dataset.
@hyp1231 It seems this logic is embedded in the collection script. Does this dataset release any data collection scripts?
For now, we do not have plans to release data collection scripts.
I want to filter items whose title or description contains 'fda'. In the result I got (fda.csv), as you can see, although the input file is meta_Health_and_Household, the main_category values come from many categories at the same level as Health and Household.
I cannot understand this.