Open saroup opened 5 years ago
I'm having the same issue except it's returning Tuesday of the previous week:
>>> parse('now').strftime('%a %Y-%m-%d')
'Mon 2019-10-21'
>>> parse('tuesday').strftime('%a %Y-%m-%d')
'Tue 2019-10-15'
>>> parse('next tuesday').strftime('%a %Y-%m-%d')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'strftime'
Hi, I am Gargi Vyas. I am a GSoC 2020 candidate and would like to work on this bug.
That'd be awesome, Gargi. Thanks!
@Gallaecio, @noviluni I have looked through the code and I think I understand the gist of it at this point. What would be the recommended way to tackle this? I would appreciate some suggestions to get started. A separate function in FreshnessDateDataParser, maybe?
@aditya-hari Go ahead and propose an approach in a pull request. It’s easier to discuss over code :slightly_smiling_face:
@Gallaecio I haven't really come up with anything concrete in code yet, so I can't open a pull request.
Things like 'next tuesday' aren't identified with any locale, so some changes have to be made in the locale info to sort that out. I am not entirely sure how to do that, though.
I thought about just changing the date_string to something standard like "in x days/months" but that will obviously only work for English if implemented in that way.
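To make that idea concrete, here is a rough, English-only sketch of rewriting 'next <weekday>' into the "in N days" form before the string is handed to the parser. The helper name `rewrite_next_weekday` and its regex are made up for illustration and are not part of dateparser. It also assumes 'next tuesday' means the first Tuesday strictly after today, which is only one possible reading.

```python
import re
from datetime import date

WEEKDAYS = ['monday', 'tuesday', 'wednesday', 'thursday',
            'friday', 'saturday', 'sunday']

# Hypothetical pre-processing helper, not part of dateparser; English only.
_NEXT_RE = re.compile(r'\bnext\s+(' + '|'.join(WEEKDAYS) + r')\b', re.IGNORECASE)


def rewrite_next_weekday(date_string, today=None):
    """Replace 'next <weekday>' with 'in N days', counted from `today`."""
    today = today or date.today()

    def _replace(match):
        target = WEEKDAYS.index(match.group(1).lower())
        # Days until the first such weekday strictly after today (1..7).
        delta = (target - today.weekday() - 1) % 7 + 1
        return 'in {} day{}'.format(delta, '' if delta == 1 else 's')

    return _NEXT_RE.sub(_replace, date_string)


# 2019-10-21 was a Monday, as in the report at the top of this issue.
print(rewrite_next_weekday('next tuesday', today=date(2019, 10, 21)))
```

Strings without a 'next <weekday>' phrase pass through unchanged, so this could sit in front of the existing parser, but as noted it does nothing for other languages.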
@aditya-hari I suggest you start from FreshnessDateDataParser.parse, go through what the code does keeping the target strings in mind (e.g. “Next Tuesday”), and make the required changes as you go. I see for example that “ago” and “in” are hardcoded in some parts, I guess you will need to add “next” there.
You could add a test for “Next Tuesday”, extend FreshnessDateDataParser.parse as needed until it is parsed successfully, and then make sure no other tests are broken after your changes.
@Gallaecio
Sorry it is taking me this long. I have something sort of working and will hopefully open a PR soon. However, the way I am doing this won't be able to handle the "after 15 days" situation mentioned in #635.
Not a problem. It’s OK to just fix “Next
There are a lot of time-related translations available in the unicode-cldr xml or json files that could definitely be used to augment dateparser.py so that it handles all sorts of variations like 'Next Tuesday'. Of course, I'd also like to see something that covers 'Next Weekend' or 'on the weekend'... but it doesn't look like that's been defined yet.
Anyway, what would it take to pull in the cldr datefields for each language and incorporate them?
https://github.com/unicode-cldr/cldr-dates-full/blob/master/main/ru/dateFields.json
Okay, just saw this entry from 2 years ago:
cldr_language_data | move data directory | 2 years ago
So, is it just that the cldr_language_data needs updating to include more variations of 'next'?
My apologies... it seems there is a script to do just this already in the code:
Is the current dictionary up to date or is it just that the existing code isn't calling things like 'next' that already exist in the code?
In freshness_date_parser, I think we need to add something from calendar to get the right day of the week?
td = relativedelta(**kwargs)
For 'next' + day of week, do the relativedelta arguments need to add a day first and then check the calendar for the next matching weekday?
>>> import calendar, datetime
>>> from dateutil import relativedelta as rld
>>> today = datetime.datetime.now()  # happens to be a Friday
>>> today + rld.relativedelta(weekday=calendar.FRIDAY)  # gives today, instead of next Friday
datetime.datetime(2020, 3, 20, 8, 55, 7, 615746)
So, we have to add a day to today, then look for the next Friday:
>>> today + rld.relativedelta(days=+1)
datetime.datetime(2020, 3, 21, 8, 55, 7, 615746)
>>> today = today + rld.relativedelta(days=+1)
>>> today + rld.relativedelta(weekday=calendar.TUESDAY)  # Next Tuesday
datetime.datetime(2020, 3, 24, 8, 55, 7, 615746)
>>> today + rld.relativedelta(weekday=calendar.FRIDAY)  # Next Friday
datetime.datetime(2020, 3, 27, 8, 55, 7, 615746)
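The add-a-day-then-roll-forward idea can also be written as one small function. This is only a sketch using the standard library's `timedelta` in place of `relativedelta`, and `next_weekday` is a name invented here, not something that exists in dateparser.

```python
import calendar
from datetime import datetime, timedelta


def next_weekday(base, weekday):
    """Return the first datetime strictly after `base` that falls on `weekday`.

    `weekday` follows the calendar module's numbering (calendar.MONDAY == 0).
    """
    # Step past `base` first, so that a matching `base` is never returned.
    shifted = base + timedelta(days=1)
    # Then roll forward 0..6 days to reach the requested weekday.
    days_ahead = (weekday - shifted.weekday()) % 7
    return shifted + timedelta(days=days_ahead)


friday = datetime(2020, 3, 20, 8, 55, 7)  # the Friday from the session above
print(next_weekday(friday, calendar.FRIDAY))   # 2020-03-27 08:55:07
print(next_weekday(friday, calendar.TUESDAY))  # 2020-03-24 08:55:07
```

Whether 'next Friday' said on a Friday should mean seven days out is a judgment call; this sketch simply follows the reading used in the session above.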
Thanks, after reading through that link, it seems that is about extending linguistic terms beyond what is provided by the CLDR json files. It looks to me like the json files from CLDR were last imported into dateparser in 2018, and they seem to have far fewer options for relative terms (in English as well as in all other languages) than what CLDR currently provides. Updating them might help in fixing the 'next weekday' issue...
Although, supplementing that data with 'weekend' would definitely fall under extending the terms, as the files don't seem to cover terms like 'weekend' or perhaps even 'fortnight', as used by Aussies, etc...
Ahh wait, now I see that this script looks at the CLDR but only chooses a subset of the available relative terms to transition to dateparser.
https://github.com/scrapinghub/dateparser/blob/master/scripts/get_cldr_data.py
So we might want to extend that subset as needed by changing the download script, and re-running.
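As a rough illustration of what extending the subset could pull in, the sketch below collects the weekday phrases from a structure shaped like a CLDR dateFields.json file. The sample data is hand-written, the nesting (`main` / locale / `dates` / `fields`) and the `relative-type-N` keys reflect my understanding of the CLDR JSON layout and should be checked against the real files, and `weekday_relative_terms` is a hypothetical helper that does not come from get_cldr_data.py.

```python
# Hand-written sample, assumed to mirror the layout of CLDR's dateFields.json.
SAMPLE = {
    "main": {"en": {"dates": {"fields": {
        "tue": {
            "relative-type--1": "last Tuesday",
            "relative-type-0": "this Tuesday",
            "relative-type-1": "next Tuesday",
        },
        "week": {
            "relative-type--1": "last week",
            "relative-type-0": "this week",
            "relative-type-1": "next week",
        },
    }}}}
}

WEEKDAY_FIELDS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")
PREFIX = "relative-type-"


def weekday_relative_terms(cldr, locale):
    """Map each lowercased weekday phrase to (field, offset), e.g. ('tue', 1)."""
    fields = cldr["main"][locale]["dates"]["fields"]
    terms = {}
    for field in WEEKDAY_FIELDS:
        for key, phrase in fields.get(field, {}).items():
            if key.startswith(PREFIX):
                # "relative-type--1" -> -1, "relative-type-1" -> 1
                terms[phrase.lower()] = (field, int(key[len(PREFIX):]))
    return terms


print(weekday_relative_terms(SAMPLE, "en"))
```

The resulting mapping would still need to be merged into dateparser's own language data format, which this sketch does not attempt.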
Hi everyone, thanks for looking at this issue. There have been no updates for more than a year. Is there any workaround we could use?
My workaround is to load both dateparser and parsedatetime and use the latter when the former fails. :)
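A minimal sketch of that fallback pattern, kept library-agnostic so it runs on its own: `parse_with_fallback`, `primary` and `secondary` are invented names, and the two stand-in parsers only mimic the behaviour described in this thread (the first returns None for 'next tuesday').

```python
from datetime import datetime


def parse_with_fallback(text, parsers):
    """Try each parser in order and return the first non-None result."""
    for parser in parsers:
        try:
            result = parser(text)
        except Exception:
            # Treat a crashing parser the same as one that found nothing.
            result = None
        if result is not None:
            return result
    return None


# Stand-ins for the real libraries, for illustration only.
def primary(text):
    return None  # mimics a parser that gives up on 'next tuesday'


def secondary(text):
    return datetime(2019, 10, 22) if text == 'next tuesday' else None


print(parse_with_fallback('next tuesday', [primary, secondary]))
```

In real use the first entry would presumably be `dateparser.parse`, which returns None on failure as the traceback at the top of this issue shows, and the second a small wrapper around parsedatetime that returns None when nothing was recognised.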
This is my workaround:
import re
from datetime import datetime, timedelta

# `time` is the input string; `daystr` is a helper defined elsewhere (not shown).
days_long = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']
for day in days_long:
    print('trying to find:', day)
    if day in time:
        print('found', day)
        delta = 1
        # Step forward one day at a time until the weekday name matches.
        while day not in (datetime.now() + timedelta(days=delta)).strftime('%A').lower():
            delta += 1
            print('delta:', delta)
            if delta > 14:  # just to make sure
                raise RuntimeError('no matching weekday found within 14 days')
        if re.findall(r'\d|noon|midnight', time):
            date = (datetime.now() + timedelta(days=delta)).strftime('%Y-%m-%d')
        else:
            date = daystr(datetime.now() + timedelta(days=delta))
        print('date:', date)
        time = time.replace(day, date).replace('next', '')
        print('time:', time)
        break  # only first match
else:
    print('not found')
It's janky, but it works.
It parses 'Tuesday' to the date of the Tuesday of the current week, but when the input is 'next Tuesday' it returns None.