Open SaravgiYash opened 3 years ago
I would like to work on this issue.
Can I start?
I would like to work on this issue.
Okay, but it would be better to take one issue at a time; after filing a PR for that issue, you can start working on this one.
OK, thank you.
@Yashs911 Can you please help me solve this issue? I'm new to Open Library, so I could not find the file I should change 😁
@Bhavna777 Actually, I don't know the root cause, so I don't know where we should start. Based on https://github.com/internetarchive/openlibrary-librarians/issues/1 and some other issues linked to it, I suggest that we hide publication years >= 2021 for the time being.
Added to librarians repo for manual correction. https://github.com/internetarchive/openlibrary-librarians/issues/53
@seabelis Actually, this issue is not limited to https://openlibrary.org/search?q=mark&mode=everything&sort=new; many books on OL have the wrong publication year, so I was wondering whether it would be possible to hide publication years > 2021.
But it will create problems in the upcoming years.
By 2021 I meant the current year; we can use a current-year function instead of a hard-coded value.
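To make the "current year" idea concrete, here is a minimal sketch of such a check. The function names `is_future_year` and `displayable_year` are hypothetical and not taken from the Open Library codebase; the only assumption is that the publication year is already available as an integer.

```python
from datetime import date
from typing import Optional


def is_future_year(publish_year: int, today: Optional[date] = None) -> bool:
    """Return True if publish_year is later than the current calendar year.

    `today` can be injected for testing; by default the system date is used,
    so the cutoff moves forward on its own each year.
    """
    current_year = (today or date.today()).year
    return publish_year > current_year


def displayable_year(publish_year: Optional[int],
                     today: Optional[date] = None) -> Optional[int]:
    """Hypothetical display helper: hide missing or future years, keep the rest."""
    if publish_year is None or is_future_year(publish_year, today):
        return None
    return publish_year
```

Because the cutoff is computed at call time, nothing needs to be updated when the year rolls over, which addresses the concern about upcoming years. Note that this only hides the bad value; it does not fix the underlying record.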
I'm not the person to decide, but I'd prefer to delete the incorrect data than to hide it.
@scottbarnes can you confirm whether this can be closed now re: 9999?
I'm re-purposing this issue to clean up works that have future dates.
Editions with future dates can be found with https://openlibrary.org/query.json?type=/type/edition&publish_date~=9999*&limit=1000 or with first_publish_year:[2025 TO *] in solr, e.g.:
https://openlibrary.org/search?q=first_publish_year%3A%5B2025+TO+*%5D&mode=everything&sort=new
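For anyone scripting this, the solr range query can be built programmatically rather than hard-coding 2025. The sketch below only constructs the URL and performs no request; the helper names are hypothetical, and defaulting to "next year" as the first future year is an assumption, not something decided in this issue.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode

SEARCH_URL = "https://openlibrary.org/search.json"


def future_dates_query(first_future_year: int) -> str:
    """Build the solr range query for works first published in or after a year."""
    return f"first_publish_year:[{first_future_year} TO *]"


def future_dates_url(first_future_year: Optional[int] = None) -> str:
    """Return a search.json URL listing works with future publish years.

    Assumption: when no year is given, "future" means next calendar year onward.
    """
    if first_future_year is None:
        first_future_year = date.today().year + 1
    # safe="*" keeps the open-ended range marker readable in the URL.
    params = urlencode({"q": future_dates_query(first_future_year)}, safe="*")
    return f"{SEARCH_URL}?{params}"
```

Calling `future_dates_url(2025)` reproduces the search.json form of the query quoted in this issue.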
Proposal
It may be helpful to keep a record of items we've so modified in case we later want to go back and, for example, reimport them or otherwise modify them further, and this way it will be easy to identify the ones from which we've removed publish_date.
@hornc notes that he is planning on removing all the 9999 dates in a bulk process. I believe this would tackle the bulk of the problem; let us see...
There are about 5,868 editions with publish year 9999, and another 15,707 with publish years after 2025 but not 9999. Flipping through them it's unclear why exactly they have these weird dates and whether they should be deleted :confused: I think fixing the 9999 set is a good first stab. Would you be able to keep a list of the editions your script edits, and upload it to the issue? We might want to do further investigation on these editions later, and having a way to find them would be useful!
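One way to keep that list is to have the cleanup step return an audit log alongside the cleaned records. This is a sketch only, not the actual bulk script: it assumes each edition is a dict with a "key" and an optional string "publish_date", and the function name `strip_placeholder_dates` is hypothetical.

```python
import json
from typing import Dict, Iterable, List, Tuple


def strip_placeholder_dates(
    editions: Iterable[Dict], placeholder: str = "9999"
) -> Tuple[List[Dict], List[Dict]]:
    """Remove publish_date values containing the placeholder year.

    Returns (cleaned_editions, audit_log). Each audit entry records the
    edition key and the value that was removed, so the affected editions
    can be found again later for reimport or further investigation.
    """
    cleaned, audit_log = [], []
    for edition in editions:
        publish_date = edition.get("publish_date", "")
        if placeholder in publish_date:
            audit_log.append(
                {"key": edition["key"], "removed_publish_date": publish_date}
            )
            # Copy without publish_date rather than mutating the input.
            edition = {k: v for k, v in edition.items() if k != "publish_date"}
        cleaned.append(edition)
    return cleaned, audit_log


def audit_log_as_jsonl(audit_log: List[Dict]) -> str:
    """Serialize the log as JSON Lines, convenient for attaching to an issue."""
    return "\n".join(json.dumps(entry, sort_keys=True) for entry in audit_log)
```

The substring match on "9999" is deliberately crude; a real script would probably want to match the same pattern used to select the editions in the first place.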
One cause of the 9999 problem relates to MARC imports and the existing issue: #2711. I started cleanup and noticed a number of 9999 dates originate from Harvard MARC records where the 9999 is in the 008 field, but there is a correct publication date (often) in 260$c
https://openlibrary.org/books/OL45340001M/%CA%BBAlimi_aman_jo_Islami_manshur?m=history
and
I'll see if there is a way to easily add the correct dates as I go, and look at patching the MARC import hole.
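To illustrate the 008 versus 260$c fallback described above, here is a rough sketch. It is not the logic from the linked PR or from the Open Library importer; it assumes the raw 008 field and the 260$c subfield are already available as strings, and all function names are hypothetical. In MARC 21, Date 1 occupies character positions 07-10 of the 008 field.

```python
import re
from typing import Optional

PLACEHOLDER_YEAR = "9999"
# A four-digit year from 1000 to 2099, not embedded in a longer digit run.
YEAR_RE = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")


def year_from_008(field_008: str) -> Optional[int]:
    """Read Date 1 (positions 07-10) from a MARC 008 field.

    Returns None for the 9999 placeholder or partially unknown dates
    such as "19uu".
    """
    date1 = field_008[7:11]
    if len(date1) == 4 and date1.isdigit() and date1 != PLACEHOLDER_YEAR:
        return int(date1)
    return None


def year_from_260c(subfield_c: str) -> Optional[int]:
    """Extract the first plausible year from a 260$c value like 'c1987.'."""
    match = YEAR_RE.search(subfield_c)
    return int(match.group(1)) if match else None


def pick_publish_year(field_008: str,
                      subfield_c: Optional[str] = None) -> Optional[int]:
    """Prefer the 008 date; fall back to 260$c when 008 is unusable."""
    year = year_from_008(field_008)
    if year is None and subfield_c:
        year = year_from_260c(subfield_c)
    return year
```

With this shape, a record whose 008 carries 9999 but whose 260$c reads "c1987." would resolve to 1987, and a record with no usable date in either place would resolve to no date at all rather than 9999.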
See PR: #8448
@mekarpeles I believe all the 9999 dates have been removed from Open Library.
A lot of the remaining future dates are simply spam: e.g. https://openlibrary.org/search?q=first_publish_year%3A%5B2025+TO+*%5D+Customer+Service+number&mode=everything&sort=new
and
And there are other variations
Evidence / Screenshot (if possible)
Many works have the wrong year of publication (e.g. 9999, 2049, 2040).
See: https://openlibrary.org/search?q=publish_year%3A%5B2025+TO+*%5D
Relevant url?
https://openlibrary.org/search?q=mark&mode=everything&sort=new
https://openlibrary.org/works/OL21132031W/Classical_Music_Picture_Book?edition=
https://openlibrary.org/works/OL21486637W/Making_Sense_of_Politics?edition=
Details
Proposal
Use first_publish_year:[2025 TO *] in solr, e.g. https://openlibrary.org/search.json?q=first_publish_year%3A%5B2025+TO+*%5D, to find future dates.
Stakeholders