Closed: tfmorris closed this issue 1 year ago
Hi, @jimchamp! Our team (@yujiezh9 and I) are students in a software engineering course. We both have experience in web application development and have successfully built and run this project locally. Could we be assigned to this task? Thank you!
@AGoodName244 this is a data issue that I don't quite have an approach for right now, but will be addressed as we revamp our import pipeline this year. Maybe you'd be interested in #7755, which has a solution outlined? If so, comment on that issue and somebody will assign you.
Thank you for your response and suggestion. We will certainly take a look at issue #7755. Before your reply, we conducted some research into the data issue and found some potential clues, although we haven't yet solved it. (Based on development version)
@AGoodName244, the code in /core/vendors.py is unrelated to our importer code. If I recall correctly, vendors.py fetches price information for book pages, and is used to create new editions, if needed, when people visit an /isbn/{isbn} page.
The publish dates were parsed correctly from the source import data in each of the cases that you outlined:

| Record | Title | Source Data |
|---|---|---|
| OL45991226M | Northern Fishes | JSON |
| OL45868829M | Northern fishes, with special reference to the upper Mississippi valley | JSON |
Thank you so much for the clarification on the issue, and apologies for any confusion caused. After examining the JSON files, we have some questions about the data, which appears to contain discrepancies or mismatches. For the JSON file of

| Record | Title | Source Data |
|---|---|---|
| OL45868829M | Northern fishes, with special reference to the upper Mississippi valley | JSON |
The book "Managerial Epidemiology" (ASIN: 076373165X) showed a Publication date of 20050101, while the website displays it as May 1, 2005. Additionally, I appreciate your suggestion to focus on issue #7755, and we will certainly look into it. We would be happy to continue our contribution. Thank you again for your support.
For these cases, can we add logic to the BWB importer (and explicitly only that importer) to ignore 01-01 and import just the year?
If the years are wrong, then the recourse we have is for humans/librarians to fix them.
If someone wants to help, the relevant code will be in
@mekarpeles Surely that isn’t scalable. We don’t have that many human contributors. We know those -01-01 dates are nearly always bogus: what publisher works New Year’s Day? Just fix ImportBot.
I would not object to importing years only for all cases. 01-01 is just an upstream-enforced date we are importing from elsewhere. Even when a book has an exact date specified (French publishers frequently do this), the correct date is not what we import. Even if the date is not 01-01, the imported exact dates never match the actual dates specified in the books. Mass-market paperbacks frequently do specify a year and a month. Amazon imports of these are frequently a month off (or sometimes a year off, if it happens to be Dec/Jan). I suspect many of these seemingly arbitrary dates have to do with either the date the item was added to Amazon or the date it went on sale. Neither of these is relevant to us, as we are not aiming to be a mirror of Amazon.
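For anyone picking this up, here is a minimal sketch of the "import just the year when the date is 01-01" idea discussed above. It is not Open Library's actual importer code: the function name `drop_placeholder_jan_first` is hypothetical, and it assumes dates arrive as `YYYYMMDD` or `YYYY-MM-DD` strings (as in the `20050101` example). Other date formats are passed through untouched.

```python
import re

# Hypothetical helper for illustration only; not part of the real importer.
# Assumes upstream dates look like "20050101" or "2005-01-01".
_JAN_FIRST = re.compile(r"(\d{4})-?01-?01")


def drop_placeholder_jan_first(publish_date: str) -> str:
    """Return only the year when the date is January 1st.

    Any other value (a different day, a bare year, or an unrecognized
    format) is returned unchanged.
    """
    match = _JAN_FIRST.fullmatch(publish_date.strip())
    return match.group(1) if match else publish_date


if __name__ == "__main__":
    for raw in ("20050101", "2005-01-01", "2005-05-01", "2005"):
        print(raw, "->", drop_placeholder_jan_first(raw))
```

Whether this would be applied only to BWB records or to every import source is a decision for the caller; the sketch deliberately leaves that check out.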
Discussed with @judec -- likely an upstream problem with dates coming in as 01-01.
Both of these seem like useful places to investigate (BWB monthly imports and promise item imports):
The number of editions published on January 1st has skyrocketed recently and the apparent cause is either BetterWorldBooks (BWB) metadata or a bug in the BWB importer.
Evidence / Screenshot (if possible)
Relevant url?
Steps to Reproduce
Proposal & Constraints
Stop importing bad data from BetterWorldBooks (BWB)
Stakeholders
@mekarpeles @hornc